A Parallel Clustering Method Study Based on MapReduce
- DOI
- 10.2991/ccis-13.2013.96
- Keywords
- Clustering; Information bottleneck theory; MapReduce; Multidimensional Scaling; Twister
- Abstract
Clustering is one of the most important tasks in data mining. Its goal is to determine the intrinsic grouping in a set of unlabeled data, and it has been widely applied in many areas. Many clustering methods have been studied, such as k-means, the Fisher clustering method, and the Kohonen neural network. In many of these areas, data sets keep growing in scale, and classical clustering methods become impractical when faced with big data. The study of clustering methods for large-scale data is therefore an important task. MapReduce is regarded as one of the most efficient models for data-intensive problems. In this paper, a parallel clustering method based on MapReduce is studied. The research makes two main contributions. First, it determines the initial cluster centers objectively. Second, it takes information loss as the distance metric between two samples. The efficiency of the method is illustrated with a practical DNA clustering problem.
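The abstract does not give the formula for the information-loss distance. As a rough illustration only, the sketch below assumes the merge cost commonly used in agglomerative information bottleneck clustering: the weighted Jensen-Shannon divergence between the feature distributions of two samples. The function names `kl_divergence` and `information_loss`, the weight parameters, and the toy distributions are hypothetical and are not taken from the paper.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits.

    Terms with p_i == 0 contribute nothing, by the usual convention.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def information_loss(p_i, p_j, w_i=0.5, w_j=0.5):
    """Assumed information loss incurred by merging two samples.

    p_i, p_j: conditional feature distributions of the two samples.
    w_i, w_j: positive prior weights of the two samples (assumption).
    Returns (w_i + w_j) times the weighted Jensen-Shannon divergence.
    """
    total = w_i + w_j
    pi_i, pi_j = w_i / total, w_j / total
    # Distribution of the merged sample: weighted mixture of the two.
    merged = [pi_i * a + pi_j * b for a, b in zip(p_i, p_j)]
    js = pi_i * kl_divergence(p_i, merged) + pi_j * kl_divergence(p_j, merged)
    return total * js


if __name__ == "__main__":
    same = information_loss([0.5, 0.5], [0.5, 0.5])
    disjoint = information_loss([1.0, 0.0], [0.0, 1.0])
    print(same, disjoint)
```

Under this assumption, identical distributions lose no information when merged, while distributions with disjoint support give the largest loss, so the quantity can play the role of a distance when assigning samples to centers in a map step.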
- Copyright
- © 2013, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY - CONF
AU - Zhanquan Sun
PY - 2013/11
DA - 2013/11
TI - A Parallel Clustering Method Study Based on MapReduce
BT - Proceedings of the 1st International Workshop on Cloud Computing and Information Security
PB - Atlantis Press
SP - 416
EP - 419
SN - 1951-6851
UR - https://doi.org/10.2991/ccis-13.2013.96
DO - 10.2991/ccis-13.2013.96
ID - Sun2013/11
ER -