Cloud Computing K-Means Text Clustering Filtering Algorithm based on Hadoop
- DOI
- 10.2991/icmmita-16.2016.278
- Keywords
- clustering; k-means; text; cloud computing; big data; filtering
- Abstract
Partitioning and hierarchical methods are the most widely used families of clustering algorithms. Because k-means is sensitive to the initial cluster centers and tends to converge to a local optimum, this paper presents an improved clustering algorithm based on particle swarm optimization. The algorithm determines the number of clusters and the initial cluster centers dynamically by combining the method of reference [1] with that of reference [2], and it optimizes the normalization of the sample set, the weight adjustment of the particle swarm, the computation of the dissimilarity matrix, and the colony fitness variance. The initial cluster centers are chosen using density and the max-min distance, in order to overcome the sensitivity of k-means to its initial values and its tendency toward local optima. The colony fitness variance is introduced after normalizing the dimension attributes of the sample set, which yields a further optimized hybrid algorithm. According to the test results, the algorithm achieves higher accuracy and stronger convergence.
- Copyright
- © 2017, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
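The abstract describes choosing the initial cluster centers from density and the max-min distance but gives no formulas, so the following is only a minimal sketch of that general idea, not the paper's algorithm. It assumes min-max normalization, uses the total distance to all other points as a stand-in for density, and stops adding centers once the max-min distance drops below a fraction `theta` of the first gap. The function names, the `theta` parameter, and the plain Lloyd-style `kmeans` loop are all illustrative assumptions; the particle swarm stage and the Hadoop distribution are left out.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def normalize(points):
    """Min-max scale every dimension to [0, 1] (assumed normalization)."""
    dims = list(zip(*points))
    lows = [min(d) for d in dims]
    spans = [(max(d) - min(d)) or 1.0 for d in dims]  # avoid division by zero
    return [tuple((v - lo) / s for v, lo, s in zip(p, lows, spans))
            for p in points]


def select_seeds(points, theta=0.5):
    """Pick initial centers; the number of centers is decided by the data.

    theta is a hypothetical threshold: a candidate is accepted only while
    its distance to the nearest chosen seed exceeds theta * (first gap).
    """
    n = len(points)
    # Dissimilarity matrix of pairwise distances.
    dist = [[euclidean(p, q) for q in points] for p in points]
    # Density proxy (assumption): smallest total distance = densest point.
    first = min(range(n), key=lambda i: sum(dist[i]))
    seeds = [first]
    nearest = dist[first][:]  # distance of each point to its closest seed
    reference = None
    while len(seeds) < n:
        cand = max(range(n), key=lambda i: nearest[i])
        gap = nearest[cand]
        if gap == 0:
            break  # every remaining point coincides with a seed
        if reference is None:
            reference = gap  # distance between the first two seeds
        elif gap <= theta * reference:
            break  # remaining points are already close to some seed
        seeds.append(cand)
        nearest = [min(a, b) for a, b in zip(nearest, dist[cand])]
    return [points[i] for i in seeds]


def kmeans(points, centers, max_iter=100):
    """Plain Lloyd iterations starting from the given centers."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        labels = [min(range(len(centers)),
                      key=lambda j: euclidean(p, centers[j]))
                  for p in points]
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centers.append(old)  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return centers, labels
```

A caller would normalize the feature vectors, pass them to `select_seeds`, and hand the result to `kmeans`. Because the seeds are spread out deterministically, repeated runs on the same data start from the same centers, which is the property the abstract attributes to this kind of initialization.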
Cite this article
TY - CONF
AU - Suyu Huang
PY - 2017/01
DA - 2017/01
TI - Cloud Computing K-Means Text Clustering Filtering Algorithm based on Hadoop
BT - Proceedings of the 2016 4th International Conference on Machinery, Materials and Information Technology Applications
PB - Atlantis Press
SP - 1209
EP - 1214
SN - 2352-538X
UR - https://doi.org/10.2991/icmmita-16.2016.278
DO - 10.2991/icmmita-16.2016.278
ID - Huang2017/01
ER -