Padding Free Bank Conflict Resolution for CUDA-Based Matrix Transpose Algorithm

Ayaz ul Hassan Khan Khan; Mayez Al-Mouhamed; Allam Fatayer; Anas Almousa; Abdulrahman Baqais; Mohammed Assayony

doi:10.2991/ijndc.2014.2.3.2

Volume 2, Issue 3, August 2014, Pages 124 - 134

Corresponding Author

Ayaz ul Hassan Khan Khan

Available Online 1 August 2014.

DOI: 10.2991/ijndc.2014.2.3.2
Keywords: Bank conflict free, coalesced memory access, CUDA, GPU, matrix transpose, linear algebra solvers, solving systems of linear equations.
Abstract: Advances in Graphics Processing Unit (GPU) technology and the introduction of the CUDA programming model facilitate the development of new solutions for sparse and dense linear algebra solvers. Matrix transpose is an important linear algebra procedure with a deep impact on various computational science and engineering applications. Several factors hinder the expected performance of large matrix transposes on GPU devices. The performance degradation stems from memory access patterns, namely non-coalesced access to global memory and bank conflicts in the shared memory of the GPU's streaming multiprocessors. In this paper, two matrix transpose algorithms are proposed that address both issues by ensuring coalesced global memory access and conflict-free shared memory bank access. The proposed algorithms have execution times comparable to the NVIDIA SDK bank-conflict-free matrix transpose implementation. Their main advantage is that they eliminate bank conflicts while allocating shared memory exactly equal to the tile size (T x T) of the problem space, whereas, to the best of our knowledge, previously published approaches need to allocate a padded tile of T x (T+1). We have also applied the proposed transpose algorithm to the recursive Gaussian implementation in the NVIDIA SDK and achieved about a 6% improvement in performance.
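The abstract contrasts a padded T x (T+1) shared-memory tile with an unpadded T x T tile, but it does not spell out the two proposed algorithms. As a rough illustration of why padding is commonly used, and of how an unpadded layout can still avoid conflicts, the host-side C++ sketch below counts how many threads of a warp land on the same bank under three layouts. The bank count (32), the tile size (T = 32), the names `Layout`, `wordOffset`, `worstColumnConflict`, `worstRowConflict`, and `wordsAllocated`, and in particular the rotated-row ("skewed") layout are assumptions made for this example; the skewed layout is one well-known padding-free scheme and is not claimed to be the authors' method.

```cpp
#include <algorithm>
#include <cassert>

// Assumptions for illustration only (not taken from the paper):
// 32 shared-memory banks of one 4-byte word each, and a 32 x 32 tile.
constexpr int kBanks = 32;
constexpr int kTile = 32;

enum class Layout { Naive, Padded, Skewed };

// Word offset in shared memory of logical tile element (row, col).
int wordOffset(Layout layout, int row, int col) {
    switch (layout) {
        case Layout::Naive:  return row * kTile + col;                  // T x T, no padding
        case Layout::Padded: return row * (kTile + 1) + col;            // T x (T+1), one pad word per row
        case Layout::Skewed: return row * kTile + (col + row) % kTile;  // T x T, row r rotated by r
    }
    return -1;  // unreachable for valid layouts
}

// Number of words the tile occupies in shared memory.
int wordsAllocated(Layout layout) {
    return layout == Layout::Padded ? kTile * (kTile + 1) : kTile * kTile;
}

// Worst case, over all columns, of how many threads hit one bank when
// thread t of a warp accesses logical element (t, col). 1 = conflict free.
int worstColumnConflict(Layout layout) {
    int worst = 0;
    for (int col = 0; col < kTile; ++col) {
        int hits[kBanks] = {};
        for (int t = 0; t < kTile; ++t) {
            ++hits[wordOffset(layout, t, col) % kBanks];
        }
        worst = std::max(worst, *std::max_element(hits, hits + kBanks));
    }
    return worst;
}

// Same measurement when thread t accesses logical element (row, t),
// i.e. the row-wise phase of a tiled transpose.
int worstRowConflict(Layout layout) {
    int worst = 0;
    for (int row = 0; row < kTile; ++row) {
        int hits[kBanks] = {};
        for (int t = 0; t < kTile; ++t) {
            ++hits[wordOffset(layout, row, t) % kBanks];
        }
        worst = std::max(worst, *std::max_element(hits, hits + kBanks));
    }
    return worst;
}
```

Under these assumptions, `worstColumnConflict` returns 32 for `Layout::Naive` (every thread hits the same bank) and 1 for both `Layout::Padded` and `Layout::Skewed`, while `wordsAllocated` is 1056 words for the padded tile and 1024 for the two unpadded ones. The skewed layout trades the extra column for a modular index computation on every access.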
Copyright: © 2017, the Authors. Published by Atlantis Press.
Open Access: This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Journal: International Journal of Networked and Distributed Computing
Volume-Issue: 2 - 3
Pages: 124 - 134
Publication Date: 2014/08/01
ISSN (Online): 2211-7946
ISSN (Print): 2211-7938

Cite this article

TY  - JOUR
AU  - Ayaz ul Hassan Khan Khan
AU  - Mayez Al-Mouhamed
AU  - Allam Fatayer
AU  - Anas Almousa
AU  - Abdulrahman Baqais
AU  - Mohammed Assayony
PY  - 2014
DA  - 2014/08/01
TI  - Padding Free Bank Conflict Resolution for CUDA-Based Matrix Transpose Algorithm
JO  - International Journal of Networked and Distributed Computing
SP  - 124
EP  - 134
VL  - 2
IS  - 3
SN  - 2211-7946
UR  - https://doi.org/10.2991/ijndc.2014.2.3.2
DO  - 10.2991/ijndc.2014.2.3.2
ID  - Khan2014
ER  -