Research on Parallelized Stream Data Micro Clustering Algorithm
- DOI
- 10.2991/ameii-15.2015.116
- Keywords
- clustering; stream data; distributed algorithm; MapReduce; Micro-clustering
- Abstract
Analysis and mining of stream data has been an active research topic in recent years. To improve clustering efficiency, this paper proposes PSDMC, a Parallelized Stream Data Micro Clustering algorithm based on MapReduce, for the micro-clustering phase of the CluStream algorithm. PSDMC uses a series of containers to store real-time stream data according to arrival time. Each map node produces real-time local micro-clusters once per unit of time (for example, every second). The reduce node combines these local micro-clusters into real-time global micro-clusters using DBSCAN and the micro-clustering method of CluStream. The global micro-clusters are then used to update the local micro-clusters on every map node and to create snapshots that are stored in the Pyramidal Time Frame. Analysis shows that the efficiency of PSDMC increases nearly linearly with the number of map nodes while clustering accuracy is preserved.
- Copyright
- © 2015, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
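The map and reduce steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `MicroCluster`, `absorb`, `map_step`, and `reduce_step` are hypothetical, the `radius` threshold is an assumed parameter, and a simple nearest-centroid merge stands in for both DBSCAN and CluStream's boundary test. Timestamps, the feedback of global micro-clusters to the map nodes, and the Pyramidal Time Frame are omitted.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Sequence


@dataclass
class MicroCluster:
    """Additive summary of a group of points: count, linear sum, squared sum."""
    n: int
    ls: List[float]
    ss: List[float]

    @classmethod
    def from_point(cls, p: Sequence[float]) -> "MicroCluster":
        return cls(1, list(p), [x * x for x in p])

    def centroid(self) -> List[float]:
        return [s / self.n for s in self.ls]

    def merge(self, other: "MicroCluster") -> None:
        # Summaries are additive, so merging needs no access to raw points.
        self.n += other.n
        self.ls = [a + b for a, b in zip(self.ls, other.ls)]
        self.ss = [a + b for a, b in zip(self.ss, other.ss)]


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def absorb(clusters: List[MicroCluster], item: MicroCluster, radius: float) -> None:
    """Merge item into the nearest cluster within radius, else keep it as new."""
    best, best_d = None, radius
    for c in clusters:
        d = distance(c.centroid(), item.centroid())
        if d <= best_d:
            best, best_d = c, d
    if best is None:
        clusters.append(item)
    else:
        best.merge(item)


def map_step(points, radius: float) -> List[MicroCluster]:
    """One map node: summarize the points of one time unit as local micro-clusters."""
    local: List[MicroCluster] = []
    for p in points:
        absorb(local, MicroCluster.from_point(p), radius)
    return local


def reduce_step(local_sets, radius: float) -> List[MicroCluster]:
    """Reduce node: combine local micro-clusters from all map nodes into global ones."""
    merged: List[MicroCluster] = []
    for local in local_sets:
        for mc in local:
            # Copy so the map nodes' local summaries are left untouched.
            absorb(merged, MicroCluster(mc.n, list(mc.ls), list(mc.ss)), radius)
    return merged


# Two hypothetical map nodes, each holding the points of one time unit.
node_a = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
node_b = [(0.0, 0.1), (5.1, 5.0)]
global_mcs = reduce_step([map_step(node_a, 1.0), map_step(node_b, 1.0)], 1.0)
print(sorted(mc.n for mc in global_mcs))  # [2, 3]
```

Because each micro-cluster is an additive summary, the reduce step works only on the compact summaries sent by the map nodes rather than on raw stream points, which is what makes the per-node work independent in this sketch.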
Cite this article
TY - CONF
AU - Ke Ma
AU - Lingjuan Li
AU - Yimu Ji
AU - Shengmei Luo
AU - Tao Wen
PY - 2015/04
DA - 2015/04
TI - Research on Parallelized Stream Data Micro Clustering Algorithm
BT - Proceedings of the International Conference on Advances in Mechanical Engineering and Industrial Informatics
PB - Atlantis Press
SP - 629
EP - 634
SN - 2352-5401
UR - https://doi.org/10.2991/ameii-15.2015.116
DO - 10.2991/ameii-15.2015.116
ID - Ma2015/04
ER -