Affinity Propagation Clustering Algorithm based on Spark Platform
- DOI
- 10.2991/wartia-16.2016.107
- Keywords
- Affinity propagation, Resilient Distributed Datasets, Spark, Large-scale dataset.
- Abstract
With the explosive growth of data, processing large-scale, complex datasets has become challenging. Many clustering algorithms have been proposed, among them the Affinity Propagation (AP) clustering algorithm, which takes the similarities between pairs of data points as its input. Compared with existing clustering algorithms, AP is fast and efficient on large datasets. However, as data volumes keep growing, the time efficiency of AP can no longer meet practical demands. This paper therefore proposes an AP clustering algorithm based on the Spark platform (Spark-AP). First, the dataset is partitioned into several Resilient Distributed Datasets (RDDs) according to a partitioning strategy, and the exemplars of each RDD are selected. These exemplars are then merged and clustered again with AP, which yields a set of high-quality exemplars after convergence. Experiments show that Spark-AP performs better in terms of both processing scale and processing time.
- Copyright
- © 2016, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
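The two-stage procedure described in the abstract (AP on each partition to pick local exemplars, then AP again on the merged exemplars) can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: Python lists stand in for RDD partitions (in PySpark the first stage would roughly correspond to `rdd.mapPartitions(...)` followed by `collect()`), and the negative squared Euclidean similarity, the median preference, damping of 0.5, contiguous partitioning, and the helper names `affinity_propagation` and `spark_ap_sketch` are all assumptions, since the abstract does not specify them.

```python
from statistics import median


def neg_sq_dist(p, q):
    """Assumed similarity: negative squared Euclidean distance."""
    return -sum((x - y) ** 2 for x, y in zip(p, q))


def affinity_propagation(points, damping=0.5, max_iter=200, conv_iter=15):
    """Plain AP message passing; returns the indices of the exemplars."""
    n = len(points)
    if n < 2:
        return list(range(n))
    s = [[neg_sq_dist(points[i], points[k]) for k in range(n)] for i in range(n)]
    # Assumption: preference s(k, k) = median of off-diagonal similarities.
    pref = median(s[i][k] for i in range(n) for k in range(n) if i != k)
    for k in range(n):
        s[k][k] = pref
    r = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    a = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    exemplars, stable = [], 0
    for _ in range(max_iter):
        # r(i,k) <- s(i,k) - max_{k' != k} [a(i,k') + s(i,k')], damped.
        for i in range(n):
            v = [a[i][k] + s[i][k] for k in range(n)]
            best = max(range(n), key=v.__getitem__)
            second = max(v[k] for k in range(n) if k != best)
            for k in range(n):
                other = second if k == best else v[best]
                r[i][k] = damping * r[i][k] + (1 - damping) * (s[i][k] - other)
        # a(k,k) <- sum_{i' != k} max(0, r(i',k))
        # a(i,k) <- min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k))), damped.
        for k in range(n):
            pos = sum(max(0.0, r[ip][k]) for ip in range(n) if ip != k)
            for i in range(n):
                if i == k:
                    new = pos
                else:
                    new = min(0.0, r[k][k] + pos - max(0.0, r[i][k]))
                a[i][k] = damping * a[i][k] + (1 - damping) * new
        # Point k is an exemplar when r(k,k) + a(k,k) > 0; stop once the
        # non-empty exemplar set has stayed unchanged for conv_iter iterations.
        current = [k for k in range(n) if r[k][k] + a[k][k] > 0]
        stable = stable + 1 if current and current == exemplars else 0
        exemplars = current
        if stable >= conv_iter:
            break
    if not exemplars:  # fallback: the point with the strongest self-evidence
        exemplars = [max(range(n), key=lambda k: r[k][k] + a[k][k])]
    return exemplars


def spark_ap_sketch(points, num_partitions):
    """Hypothetical two-stage scheme; lists simulate RDD partitions."""
    # Assumption: contiguous slicing as the partitioning strategy.
    size = -(-len(points) // num_partitions)  # ceiling division
    partitions = [points[j:j + size] for j in range(0, len(points), size)]
    # Stage 1: local exemplars of each partition, merged into one list.
    merged = []
    for part in partitions:
        merged.extend(part[k] for k in affinity_propagation(part))
    # Stage 2: AP on the merged exemplars gives the final exemplars.
    return [merged[k] for k in affinity_propagation(merged)]


# Usage: two well-separated groups, repeated with small shifts in 3 partitions.
base = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
data = base + [(x + 0.5, y) for x, y in base] + [(x - 0.5, y) for x, y in base]
print(spark_ap_sketch(data, 3))
```

With m partitions of roughly n/m points each, every local similarity matrix has about (n/m)^2 entries instead of n^2, which is one plausible reason for partitioning before running AP; the second stage only sees the local exemplars. How closely the merged result matches single-machine AP depends on the partitioning strategy, which the abstract leaves unspecified.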
Cite this article
Lijia Zhang and Lianglun Cheng. Affinity Propagation Clustering Algorithm based on Spark Platform. In Proceedings of the 2016 2nd Workshop on Advanced Research and Technology in Industry Applications, pp. 530-533. Atlantis Press, May 2016. ISSN 2352-5401. DOI: 10.2991/wartia-16.2016.107. https://doi.org/10.2991/wartia-16.2016.107