Improving Suffix Tree Clustering Algorithm for Web Documents
- DOI
- 10.2991/lemcs-15.2015.310
- Keywords
- Web document clustering; Suffix tree; Suffix tree clustering; Vector space model; Pearson correlation coefficient
- Abstract
Web document clustering helps users quickly locate the information they need among the results returned by search engines. Based on the characteristics of the suffix tree structure and the flaws in the similarity calculation used for cluster merging in the STC algorithm, this paper proposes an improved suffix tree clustering method. The method combines the vector space model with the Pearson correlation coefficient: it calculates the relevance of clusters from the document vectors of all clusters, then uses these relevance vectors and the correlations between them to compute the similarity for cluster merging, thereby improving the document clustering process. Analysis of the experimental results shows that the method outperforms the original STC algorithm on Web document clustering.
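As a rough illustration of the idea of correlation-based cluster merging, the following minimal Python sketch compares two base clusters by the Pearson correlation coefficient of their document vectors. It is not the authors' implementation. The binary membership vectors, the function names `pearson`, `membership_vector`, and `should_merge`, and the threshold of 0.5 are all assumptions made for this example; the abstract does not specify how the vectors are weighted or what threshold is used.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric vectors."""
    if len(x) != len(y) or not x:
        raise ValueError("vectors must be non-empty and of equal length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A constant vector has no defined correlation; treat it as uncorrelated.
        return 0.0
    return cov / sqrt(var_x * var_y)


def membership_vector(cluster_docs, n_docs):
    """Assumed representation: 1.0 if document i belongs to the cluster, else 0.0."""
    return [1.0 if i in cluster_docs else 0.0 for i in range(n_docs)]


def should_merge(cluster_a, cluster_b, n_docs, threshold=0.5):
    """Hypothetical merge rule: merge when the correlation reaches the threshold."""
    r = pearson(membership_vector(cluster_a, n_docs),
                membership_vector(cluster_b, n_docs))
    return r >= threshold


if __name__ == "__main__":
    # Six documents (ids 0..5); clusters are sets of document ids.
    print(should_merge({0, 1, 2}, {0, 1, 2, 3}, n_docs=6))  # heavily overlapping
    print(should_merge({0, 1}, {4, 5}, n_docs=6))           # disjoint
```

Unlike the fixed overlap ratio used for merging in the original STC algorithm, a correlation measure also takes into account the documents that neither cluster contains, which is one plausible reason for preferring it; the paper itself should be consulted for the actual formulation.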
- Copyright
- © 2015, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY  - CONF
AU  - Yan Zhuang
AU  - Youguang Chen
PY  - 2015/07
DA  - 2015/07
TI  - Improving Suffix Tree Clustering Algorithm for Web Documents
BT  - Proceedings of the International Conference on Logistics, Engineering, Management and Computer Science
PB  - Atlantis Press
SP  - 1557
EP  - 1561
SN  - 1951-6851
UR  - https://doi.org/10.2991/lemcs-15.2015.310
DO  - 10.2991/lemcs-15.2015.310
ID  - Zhuang2015/07
ER  -