A Parallel Clustering Method Study Based on MapReduce

Zhanquan Sun

doi:10.2991/ccis-13.2013.96

Corresponding Author

Zhanquan Sun

Available Online November 2013.

DOI: 10.2991/ccis-13.2013.96
Keywords: Clustering; Information bottleneck theory; MapReduce; Multidimensional Scaling; Twister
Abstract: Clustering is one of the most important tasks in data mining. Its goal is to determine the intrinsic grouping in a set of unlabeled data, and it has been applied in many areas. Many clustering methods have been studied, such as k-means, the Fisher clustering method, and the Kohonen neural network. In many fields, data sets keep growing in scale, and classical clustering methods become impractical when faced with big data. Clustering methods for large-scale data are therefore an important research topic. MapReduce is regarded as the most efficient model for data-intensive problems. This paper studies a parallel clustering method based on MapReduce. Its main contributions are as follows. First, it determines the initial centers objectively. Second, it uses information loss as the distance metric between two samples. The efficiency of the method is illustrated with a practical DNA clustering problem.
Copyright: © 2013, the Authors. Published by Atlantis Press.
Open Access: This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
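
The abstract names information loss as the distance between two samples but does not give the formula. As a rough illustration only, the sketch below uses the merge cost commonly associated with agglomerative information bottleneck clustering: the combined weight of the two samples times the weighted Jensen-Shannon divergence of their conditional distributions. This choice of formula, and the names `kl_divergence` and `information_loss`, are assumptions made for illustration and are not taken from the paper.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits; terms where p[k] == 0 contribute nothing."""
    return sum(pk * math.log2(pk / qk) for pk, qk in zip(p, q) if pk > 0)


def information_loss(weight_i, dist_i, weight_j, dist_j):
    """Assumed merge cost: (w_i + w_j) * weighted Jensen-Shannon divergence.

    weight_i, weight_j: positive prior probabilities of the two samples.
    dist_i, dist_j: their conditional feature distributions (each sums to 1).
    """
    total = weight_i + weight_j
    pi_i, pi_j = weight_i / total, weight_j / total
    # Distribution of the merged cluster: weighted mixture of the two inputs.
    merged = [pi_i * a + pi_j * b for a, b in zip(dist_i, dist_j)]
    js = (pi_i * kl_divergence(dist_i, merged)
          + pi_j * kl_divergence(dist_j, merged))
    return total * js


if __name__ == "__main__":
    same = information_loss(0.5, [0.5, 0.5], 0.5, [0.5, 0.5])
    disjoint = information_loss(0.5, [1.0, 0.0], 0.5, [0.0, 1.0])
    print(same, disjoint)
```

Under this assumed definition, two samples with identical distributions have zero distance, and the distance grows as their distributions diverge, so merging the closest pair loses the least information.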


Volume Title: Proceedings of the 1st International Workshop on Cloud Computing and Information Security
Series: Advances in Intelligent Systems Research
Publication Date: November 2013
ISBN: 978-90-78677-88-8
ISSN: 1951-6851

Cite this article

TY  - CONF
AU  - Zhanquan Sun
PY  - 2013/11
DA  - 2013/11
TI  - A Parallel Clustering Method Study Based on MapReduce
BT  - Proceedings of the 1st International Workshop on Cloud Computing and Information Security
PB  - Atlantis Press
SP  - 416
EP  - 419
SN  - 1951-6851
UR  - https://doi.org/10.2991/ccis-13.2013.96
DO  - 10.2991/ccis-13.2013.96
ID  - Sun2013/11
ER  -
