Proceedings of the 2013 the International Conference on Education Technology and Information System (ICETIS 2013)

A Semi-Supervised Text Clustering Algorithm with Word Distribution Weights

Authors
Ping Zhou, Jiayin Wei, Yongbin Qin
Corresponding Author
Ping Zhou
Available Online June 2013.
DOI
10.2991/icetis-13.2013.235
Keywords
Text Clustering; Semi-supervised Clustering; LDA; Word Distribution
Abstract

Semi-supervised text clustering, a research branch of text clustering, aims to use limited a priori knowledge to guide the otherwise unsupervised clustering process and help users obtain better clustering results. Because labeled data are difficult, expensive, and time-consuming to obtain, the available supervised information must be used effectively if it is to improve clustering performance significantly. This paper proposes a semi-supervised LDA text clustering algorithm based on word distribution weights (WWDLDA). By introducing word distribution coefficients obtained from labeled data, the LDA model can be applied to semi-supervised clustering. Throughout the clustering process, these coefficients adjust the word distribution and thereby change the clustering results. Experimental results on real data sets show that the proposed semi-supervised text clustering algorithm obtains better clustering results than constrained mixmnl, where mixmnl denotes the multinomial model-based EM algorithm.
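The abstract does not specify how WWDLDA computes its word distribution coefficients or where they enter the LDA inference, so the Python sketch below is a hypothetical illustration, not the authors' method. It assumes (1) that the coefficient for topic k and word w is the ratio of w's Laplace-smoothed probability in the labeled documents of class k to its smoothed probability over all labeled documents, and (2) that this coefficient multiplies the standard collapsed Gibbs sampling conditional for LDA at every step. The function names `word_weights` and `weighted_lda_cluster` and the defaults for `alpha`, `beta`, `iters`, and `smoothing` are illustrative choices.

```python
import random
from collections import Counter


def word_weights(labeled_docs, labels, n_topics, vocab_size, smoothing=1.0):
    """Hypothetical coefficients: weights[k][w] = P(w | class k) / P(w | all
    labeled docs), both estimated with additive (Laplace) smoothing."""
    class_counts = [Counter() for _ in range(n_topics)]
    total = Counter()
    for doc, k in zip(labeled_docs, labels):
        class_counts[k].update(doc)
        total.update(doc)
    total_size = sum(total.values())
    weights = []
    for k in range(n_topics):
        class_size = sum(class_counts[k].values())
        row = []
        for w in range(vocab_size):
            p_class = (class_counts[k][w] + smoothing) / (class_size + smoothing * vocab_size)
            p_all = (total[w] + smoothing) / (total_size + smoothing * vocab_size)
            row.append(p_class / p_all)  # > 1 if w is typical of class k
        weights.append(row)
    return weights


def weighted_lda_cluster(docs, weights, n_topics, vocab_size,
                         alpha=0.1, beta=0.1, iters=300, seed=0):
    """Collapsed Gibbs sampling for LDA in which each topic's conditional is
    multiplied by weights[topic][word]; returns one cluster id per document."""
    rng = random.Random(seed)
    ndk = [[0] * n_topics for _ in docs]               # document-topic counts
    nkw = [[0] * vocab_size for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics                                # tokens per topic
    z = []                                             # topic of every token
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)  # random initial topic
            z_d.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(z_d)
    topics = range(n_topics)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current token from all counts.
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Standard LDA conditional, times the assumed word weight.
                probs = [(ndk[d][t] + alpha)
                         * (nkw[t][w] + beta) / (nk[t] + vocab_size * beta)
                         * weights[t][w] for t in topics]
                k = rng.choices(topics, weights=probs)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    # Each document joins the cluster of its most frequent topic.
    return [max(topics, key=lambda t: ndk[d][t]) for d in range(len(docs))]


if __name__ == "__main__":
    # Toy vocabulary of 6 word ids: class 0 uses 0-2, class 1 uses 3-5.
    W = word_weights([[0, 1, 2, 0], [3, 4, 5, 3]], [0, 1], n_topics=2, vocab_size=6)
    unlabeled = [[0, 1, 2, 2, 1, 0], [3, 4, 5, 5, 4, 3], [1, 0, 2, 0], [4, 3, 5, 3]]
    print(weighted_lda_cluster(unlabeled, W, n_topics=2, vocab_size=6))
```

Multiplying the sampling conditional is only one plausible reading of "coefficients always adjust the word distribution"; an alternative would be to rescale the estimated topic-word distributions after each sweep. Either way, words that are typical of a labeled class pull tokens, and hence documents, toward the matching topic.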

Copyright
© 2013, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Volume Title
Proceedings of the 2013 the International Conference on Education Technology and Information System (ICETIS 2013)
Series
Advances in Intelligent Systems Research
Publication Date
June 2013
ISBN
978-90-78677-76-5
ISSN
1951-6851

Cite this article

TY  - CONF
AU  - Ping Zhou
AU  - Jiayin Wei
AU  - Yongbin Qin
PY  - 2013/06
DA  - 2013/06
TI  - A Semi-Supervised Text Clustering Algorithm with Word Distribution Weights
BT  - Proceedings of the 2013 the International Conference on Education Technology and Information System (ICETIS 2013)
PB  - Atlantis Press
SP  - 1029
EP  - 1033
SN  - 1951-6851
UR  - https://doi.org/10.2991/icetis-13.2013.235
DO  - 10.2991/icetis-13.2013.235
ID  - Zhou2013/06
ER  -