Paralleled Fast Search and Find of Density Peaks Clustering Algorithm on GPUs with CUDA
- DOI
- 10.2991/ijndc.2016.4.3.4
- Keywords
- Clustering; FSFDP; CUDA; Shared memory; Stream; GPU clusters.
- Abstract
Fast Search and Find of Density Peaks (FSFDP) is a recently proposed clustering algorithm that has been used successfully in many applications. However, it performs poorly on large datasets because computing the distance matrix and the potentials is time-consuming. In this paper, we propose a GPU-accelerated FSFDP implemented with CUDA to improve its performance. The thread/block models and the shared memory usage are designed specifically to maximize utilization of the GPU's hardware resources, and a merge accumulation algorithm based on the odd and even positions of an array is also introduced. Experimental results show that our parallel implementation of FSFDP achieves speedups of 4.39X for the distance matrix calculation and 15.75X for the potentials calculation, compared to the serial program on a single CPU core. Higher speedups can be expected for larger datasets until the device limits are reached. In addition, the CUDA stream mechanism is employed, and extra time savings are obtained by hiding the memory latency of multiple kernels in a two-way stream schedule. Finally, we evaluate our GPU-based implementation on a GPU cluster of 9 nodes; compared to a single GPU node, the program achieves a further 7.55X speedup.
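For orientation, the steps named in the abstract can be sketched serially on the CPU. The sketch below is not the authors' CUDA code. It assumes the standard density-peaks definitions (a cutoff-kernel local density and the distance to the nearest denser point), which may differ from what the paper calls "potentials", and it shows only one plausible reading of the odd/even merge accumulation, namely a pairwise tree reduction. All function names and the cutoff parameter `d_c` are illustrative.

```python
import math


def distance_matrix(points):
    """Pairwise Euclidean distances: the O(n^2) step the paper moves to the GPU."""
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            dist[i][j] = dist[j][i] = d  # the matrix is symmetric
    return dist


def local_density(dist, d_c):
    """Assumed definition: rho_i counts the other points closer than d_c."""
    n = len(dist)
    return [
        sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
        for i in range(n)
    ]


def delta_to_denser(dist, rho):
    """Assumed definition: delta_i is the distance to the nearest denser point.

    Points with no denser neighbour get their largest distance instead.
    """
    n = len(dist)
    delta = []
    for i in range(n):
        denser = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(denser) if denser else max(dist[i]))
    return delta


def odd_even_sum(values):
    """Pairwise accumulation (hypothetical reading of the merge step).

    In each round the element at even position 2k absorbs its odd
    neighbour at 2k+1, so the array roughly halves until one value is left.
    """
    buf = list(values)
    while len(buf) > 1:
        merged = [buf[k] + buf[k + 1] for k in range(0, len(buf) - 1, 2)]
        if len(buf) % 2:  # an unpaired last element is carried over
            merged.append(buf[-1])
        buf = merged
    return buf[0] if buf else 0


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
    dist = distance_matrix(pts)
    rho = local_density(dist, d_c=1.5)
    print(rho, delta_to_denser(dist, rho), odd_even_sum(rho))
```

The additions inside one round of `odd_even_sum` are independent of each other, which is why this style of accumulation parallelizes well: each pair could be handled by its own thread, with a synchronization between rounds.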
- Copyright
- © 2017, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY - JOUR
AU - Mi Li
AU - Jie Huang
AU - Jingpeng Wang
PY - 2016
DA - 2016/07/01
TI - Paralleled Fast Search and Find of Density Peaks Clustering Algorithm on GPUs with CUDA
JO - International Journal of Networked and Distributed Computing
SP - 173
EP - 181
VL - 4
IS - 3
SN - 2211-7946
UR - https://doi.org/10.2991/ijndc.2016.4.3.4
DO - 10.2991/ijndc.2016.4.3.4
ID - Li2016
ER -