Proceedings of the 2016 2nd Workshop on Advanced Research and Technology in Industry Applications

Affinity Propagation Clustering Algorithm based on Spark Platform

Authors
Lijia Zhang, Lianglun Cheng
Corresponding Author
Lijia Zhang
Available Online May 2016.
DOI
10.2991/wartia-16.2016.107
Keywords
Affinity propagation, Resilient Distributed Datasets, Spark, Large scale dataset.
Abstract

With the explosive growth of data, processing large-scale, complex datasets has become a challenge. Many clustering algorithms have been proposed, among them the Affinity Propagation (AP) clustering algorithm, which takes the similarities between pairs of data points as its input. Compared with existing clustering algorithms, AP is fast and efficient on large datasets. However, as data volumes continue to grow, the running time of AP is no longer satisfactory. This paper therefore proposes an AP clustering algorithm based on the Spark platform (Spark-AP). First, the dataset is partitioned into several Resilient Distributed Datasets (RDDs) according to a partitioning strategy, and the exemplars of each RDD are selected. The exemplars are then merged and used as input to a further round of AP clustering, which yields a set of high-quality exemplars after convergence. Experiments show that Spark-AP performs better in both the scale of data it can process and its processing time.
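The two-stage scheme described in the abstract can be sketched as follows. This is a minimal single-machine sketch under stated assumptions, not the authors' implementation: Spark partitions are emulated with plain Python lists; the partitioning strategy (contiguous chunks) is hypothetical because the abstract does not specify it; similarity is assumed to be negative squared Euclidean distance, with each point's preference set to the median similarity (a common default for AP); and the names `affinity_propagation` and `spark_ap_sketch` are illustrative. On Spark, the per-partition step would roughly correspond to applying the local AP function via PySpark's `rdd.mapPartitions(...)` and collecting the local exemplars on the driver.

```python
import math
import statistics


def _sq_dist(p, q):
    # Squared Euclidean distance between two equal-length point tuples.
    return sum((a - b) ** 2 for a, b in zip(p, q))


def affinity_propagation(points, damping=0.5, max_iter=200, convergence_iter=15):
    """Return sorted indices of the exemplars chosen by plain AP on `points`."""
    n = len(points)
    if n <= 1:
        return list(range(n))
    # Assumed similarity s(i,k) = -||x_i - x_k||^2; preference s(k,k) = median.
    S = [[-_sq_dist(points[i], points[k]) for k in range(n)] for i in range(n)]
    pref = statistics.median([S[i][k] for i in range(n) for k in range(n) if i != k])
    for k in range(n):
        S[k][k] = pref
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i,k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i,k)
    last, stable = None, 0
    for _ in range(max_iter):
        # r(i,k) <- s(i,k) - max_{k' != k} [a(i,k') + s(i,k')], damped
        for i in range(n):
            v = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=v.__getitem__)
            first = v[best]
            second = max(v[k] for k in range(n) if k != best)
            for k in range(n):
                competitor = second if k == best else first
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - competitor)
        # a(k,k) <- sum_{i' != k} max(0, r(i',k))
        # a(i,k) <- min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k))), damped
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
        # Point k is an exemplar when a(k,k) + r(k,k) > 0.
        exemplars = [k for k in range(n) if A[k][k] + R[k][k] > 0]
        stable = stable + 1 if exemplars == last else 0
        last = exemplars
        if exemplars and stable >= convergence_iter:
            break
    if not last:
        # Fallback if no exemplar emerged: keep the single strongest candidate.
        last = [max(range(n), key=lambda k: A[k][k] + R[k][k])]
    return last


def spark_ap_sketch(points, num_partitions):
    """Two-stage AP: local AP per partition, then AP over the merged exemplars."""
    if not points:
        return []
    # Hypothetical partitioning: contiguous chunks stand in for RDD partitions.
    size = math.ceil(len(points) / num_partitions)
    partitions = [points[j:j + size] for j in range(0, len(points), size)]
    # Stage 1: select local exemplars in each partition.
    merged = []
    for part in partitions:
        merged.extend(part[i] for i in affinity_propagation(part))
    # Stage 2: run AP again on the merged local exemplars to get the final set.
    return [merged[i] for i in affinity_propagation(merged)]


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
           (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
    print(spark_ap_sketch(pts, num_partitions=2))
```

In this sketch the second stage only builds a similarity matrix over the local exemplars rather than over all n points, which is a plausible source of the speed-up on large datasets; the trade-off is that the final exemplars can depend on how the data was partitioned.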

Copyright
© 2016, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Volume Title
Proceedings of the 2016 2nd Workshop on Advanced Research and Technology in Industry Applications
Series
Advances in Engineering Research
Publication Date
May 2016
ISBN
978-94-6252-195-7
ISSN
2352-5401

Cite this article

TY  - CONF
AU  - Lijia Zhang
AU  - Lianglun Cheng
PY  - 2016/05
DA  - 2016/05
TI  - Affinity Propagation Clustering Algorithm based on Spark Platform
BT  - Proceedings of the 2016 2nd Workshop on Advanced Research and Technology in Industry Applications
PB  - Atlantis Press
SP  - 530
EP  - 533
SN  - 2352-5401
UR  - https://doi.org/10.2991/wartia-16.2016.107
DO  - 10.2991/wartia-16.2016.107
ID  - Zhang2016/05
ER  -