Improving Suffix Tree Clustering Algorithm for Web Documents

Yan Zhuang; Youguang Chen

doi:10.2991/lemcs-15.2015.310



Authors

Yan Zhuang, Youguang Chen

Corresponding Author

Yan Zhuang

Available Online July 2015.

DOI: 10.2991/lemcs-15.2015.310
Keywords: Web document clustering; Suffix tree; Suffix tree clustering; Space vector model; Pearson correlation coefficient
Abstract: Web document clustering helps users quickly locate the information they need among the results returned by search engines. Considering the characteristics of the suffix tree structure and the flaws in the similarity calculation used for cluster merging in the STC algorithm, this paper proposes an improved suffix tree clustering method. The method combines the vector space model with the Pearson correlation coefficient: it calculates the relevance of clusters based on the document vectors of all clusters, and then uses the relevance vectors of the clusters and the correlations between them to calculate the similarity for cluster merging, improving the document clustering process. Analysis of the experimental results shows that the method outperforms the original STC algorithm on Web document clustering.
Copyright: © 2015, the Authors. Published by Atlantis Press.
Open Access: This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
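
The abstract contrasts the overlap-based merging test of the original STC algorithm with a similarity built on vectors and the Pearson correlation coefficient. The sketch below illustrates that contrast only; it is not the authors' implementation, whose details the abstract does not give. It assumes each cluster is represented as a binary membership vector over the document set, and the names `cluster_vector`, `overlap_similar`, and the sample documents are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("vectors must be non-empty and of equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Assumption: a constant vector is treated as uncorrelated.
        return 0.0
    return cov / sqrt(var_x * var_y)


def cluster_vector(cluster_docs, all_docs):
    """Assumed representation: 1.0 if the document is in the cluster, else 0.0."""
    return [1.0 if doc in cluster_docs else 0.0 for doc in all_docs]


def overlap_similar(a, b, threshold=0.5):
    """Original STC merging test: the shared documents must exceed
    the threshold fraction of both base clusters."""
    common = len(a & b)
    return common / len(a) > threshold and common / len(b) > threshold


# Hypothetical base clusters over six documents.
all_docs = ["d1", "d2", "d3", "d4", "d5", "d6"]
c1 = {"d1", "d2", "d3"}
c2 = {"d2", "d3", "d4"}
c3 = {"d5", "d6"}

r12 = pearson(cluster_vector(c1, all_docs), cluster_vector(c2, all_docs))
r13 = pearson(cluster_vector(c1, all_docs), cluster_vector(c3, all_docs))

print(f"overlap c1/c2: {overlap_similar(c1, c2)}, pearson: {r12:.3f}")
print(f"overlap c1/c3: {overlap_similar(c1, c3)}, pearson: {r13:.3f}")
```

The overlap test gives only a yes/no answer per pair, while the correlation yields a graded score (positive for the overlapping pair, negative for the disjoint pair here) that a merging step could compare against a chosen threshold.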


Volume Title: Proceedings of the International Conference on Logistics, Engineering, Management and Computer Science
Series: Advances in Intelligent Systems Research
Publication Date: July 2015
ISBN: 978-94-6252-102-5
ISSN: 1951-6851

Cite this article


TY  - CONF
AU  - Yan Zhuang
AU  - Youguang Chen
PY  - 2015/07
DA  - 2015/07
TI  - Improving Suffix Tree Clustering Algorithm for Web Documents
BT  - Proceedings of the International Conference on Logistics, Engineering, Management and Computer Science
PB  - Atlantis Press
SP  - 1557
EP  - 1561
SN  - 1951-6851
UR  - https://doi.org/10.2991/lemcs-15.2015.310
DO  - 10.2991/lemcs-15.2015.310
ID  - Zhuang2015/07
ER  -
