A Semi-Supervised Text Clustering Algorithm with Word Distribution Weights
- DOI
- 10.2991/icetis-13.2013.235
- Keywords
- Text Clustering; Semi-supervised Clustering; LDA; Word Distribution
- Abstract
Semi-supervised text clustering, a branch of text clustering research, uses limited prior knowledge to guide an otherwise unsupervised clustering process and help users obtain better clustering results. Because labeled data are difficult, expensive, and time-consuming to obtain, it is important to use the available supervised information effectively so that it significantly improves clustering performance. This paper proposes a semi-supervised LDA text clustering algorithm based on word distribution weights (WWDLDA). By introducing word distribution coefficients obtained from labeled data, the LDA model can be applied to semi-supervised clustering. During clustering, these coefficients continually adjust the word distribution and thereby change the clustering results. Experimental results on real data sets show that the proposed algorithm achieves better clustering results than constrained mixmnl, where mixmnl denotes the multinomial model-based EM algorithm.
- Copyright
- © 2013, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
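The abstract does not give WWDLDA's update rules, so the following is only a minimal sketch under stated assumptions of how word distribution weights learned from labeled data could steer LDA-based clustering. It uses a collapsed Gibbs sampler with one topic per cluster and multiplies each topic's word probability by a per-cluster word weight. The weighting formula (a smoothed ratio of a word's frequency in one cluster's labeled documents to its frequency across all labeled documents), keeping labeled documents fixed to their cluster, the function names `word_weights` and `weighted_lda_cluster`, and the hyperparameters `alpha`, `beta`, `iters`, and `seed` are all hypothetical choices, not the authors' method.

```python
import random
from collections import Counter


def word_weights(docs, labels, K, V, smooth=1.0):
    """Hypothetical weighting (not from the paper): how much more frequent word v
    is in cluster k's labeled documents than in all labeled documents (smoothed ratio)."""
    per_cluster = [Counter() for _ in range(K)]
    for d, k in labels.items():
        per_cluster[k].update(docs[d])
    totals = [sum(c.values()) for c in per_cluster]
    overall = Counter()
    for c in per_cluster:
        overall.update(c)
    n_all = sum(overall.values())
    weights = []
    for k in range(K):
        row = []
        for v in range(V):
            p_k = (per_cluster[k][v] + smooth) / (totals[k] + smooth * V)
            p_all = (overall[v] + smooth) / (n_all + smooth * V)
            row.append(p_k / p_all)
        weights.append(row)
    return weights


def weighted_lda_cluster(docs, labels, K, V, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (one topic per cluster) whose topic-word
    term is scaled by the hypothetical word weights; returns one cluster per document."""
    rng = random.Random(seed)
    w = word_weights(docs, labels, K, V)
    n_dk = [[0] * K for _ in docs]    # tokens of document d assigned to topic k
    n_kv = [[0] * V for _ in range(K)]  # occurrences of word v assigned to topic k
    n_k = [0] * K                     # total tokens assigned to topic k
    z = []
    for d, doc in enumerate(docs):
        zd = []
        for v in doc:
            # Labeled documents start (and stay) in their given cluster.
            k = labels[d] if d in labels else rng.randrange(K)
            zd.append(k)
            n_dk[d][k] += 1
            n_kv[k][v] += 1
            n_k[k] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            if d in labels:
                continue  # assumption: supervision is enforced by fixing labeled docs
            for i, v in enumerate(doc):
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kv[k][v] -= 1
                n_k[k] -= 1
                # Standard LDA conditional, with the word term multiplied by w[j][v].
                probs = [
                    (n_dk[d][j] + alpha) * w[j][v] * (n_kv[j][v] + beta) / (n_k[j] + V * beta)
                    for j in range(K)
                ]
                k = rng.choices(range(K), weights=probs)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kv[k][v] += 1
                n_k[k] += 1
    # Each document goes to the topic holding most of its tokens.
    return [max(range(K), key=lambda j: n_dk[d][j]) for d in range(len(docs))]
```

For example, with toy documents encoded as word ids, `weighted_lda_cluster([[0, 1, 2, 0], [3, 4, 5, 3], [0, 1, 1, 2], [4, 5, 3, 5]], {0: 0, 1: 1}, K=2, V=6)` returns one cluster id per document. The labeled documents 0 and 1 keep their clusters. Because the sampler is stochastic, the assignments of unlabeled documents can vary with `seed` and `iters`.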
Cite this article
TY - CONF
AU - Ping Zhou
AU - Jiayin Wei
AU - Yongbin Qin
PY - 2013/06
DA - 2013/06
TI - A Semi-Supervised Text Clustering Algorithm with Word Distribution Weights
BT - Proceedings of the 2013 the International Conference on Education Technology and Information System (ICETIS 2013)
PB - Atlantis Press
SP - 1029
EP - 1033
SN - 1951-6851
UR - https://doi.org/10.2991/icetis-13.2013.235
DO - 10.2991/icetis-13.2013.235
ID - Zhou2013/06
ER -