The Research of Chinese Short-text Classification Based on Domain Keyword Set Extension and HowNet
- DOI
- 10.2991/icca-16.2016.57
- Keywords
- Short-text classification, Keyword set, LDA, Feature extension, HowNet
- Abstract
To extend the features of short texts and improve short-text classification performance, this paper builds a domain keyword set for each class of the training set at two feature granularities, keyword and latent topic, by extracting the class's high-frequency words and topic core words. It then derives the topic probability distribution of each test text using an LDA model; when the probability of a topic exceeds a certain threshold, the keywords of that topic are added to the test text. Finally, the semantic similarity between the test text and the domain keyword set of each category is calculated using HowNet. Experimental results show that the proposed method can effectively improve short-text classification performance.
- Copyright
- © 2016, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
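The pipeline summarized in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the topic distribution is assumed to have been inferred already by an LDA model, `toy_word_sim` is a hypothetical stand-in for a HowNet word similarity measure, and both the averaging rule in `text_to_set_similarity` and the "pick the most similar category" rule in `classify` are assumptions, since the abstract does not state how the similarity is aggregated or how the final label is chosen. All names, keyword sets, and scores are invented for the example.

```python
from typing import Callable, Dict, List, Set

WordSim = Callable[[str, str], float]


def extend_with_topic_keywords(
    words: List[str],
    topic_probs: Dict[int, float],
    topic_keywords: Dict[int, List[str]],
    threshold: float,
) -> List[str]:
    """Append the keywords of every topic whose probability exceeds the threshold."""
    extended = list(words)
    for topic, prob in topic_probs.items():
        if prob > threshold:
            for keyword in topic_keywords.get(topic, []):
                if keyword not in extended:
                    extended.append(keyword)
    return extended


def text_to_set_similarity(
    words: List[str], keyword_set: Set[str], word_sim: WordSim
) -> float:
    """Assumed aggregation: average, over the text's words, of the best match in the set."""
    if not words or not keyword_set:
        return 0.0
    best_matches = [max(word_sim(w, k) for k in keyword_set) for w in words]
    return sum(best_matches) / len(words)


def classify(
    words: List[str], domain_keywords: Dict[str, Set[str]], word_sim: WordSim
) -> str:
    """Assumed decision rule: return the category whose keyword set is most similar."""
    return max(
        domain_keywords,
        key=lambda c: text_to_set_similarity(words, domain_keywords[c], word_sim),
    )


# Hypothetical similarity table standing in for HowNet (values are made up).
TOY_SIM = {
    frozenset(("goal", "match")): 0.6,
    frozenset(("coach", "team")): 0.7,
}


def toy_word_sim(a: str, b: str) -> float:
    """Placeholder word similarity: 1.0 for identical words, table lookup otherwise."""
    if a == b:
        return 1.0
    return TOY_SIM.get(frozenset((a, b)), 0.0)


# Invented example data: topic keywords and per-category domain keyword sets.
topic_keywords = {0: ["match", "team"], 1: ["stock", "market"]}
domain_keywords = {
    "sports": {"match", "team", "player"},
    "finance": {"stock", "market", "bank"},
}

test_words = ["goal", "coach"]
topic_probs = {0: 0.8, 1: 0.2}  # assumed output of an LDA model for this text

extended = extend_with_topic_keywords(test_words, topic_probs, topic_keywords, 0.5)
label = classify(extended, domain_keywords, toy_word_sim)
print(extended, label)
```

In this toy run only topic 0 exceeds the threshold of 0.5, so only its keywords are added to the test text before the similarity to each category's keyword set is computed.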
Cite this article
TY  - CONF
AU  - Xiangdong Li
AU  - Fan Gao
AU  - Cong Ding
PY  - 2016/01
DA  - 2016/01
TI  - The Research of Chinese Short-text Classification Based on Domain Keyword Set Extension and HowNet
BT  - Proceedings of the 2016 International Conference on Intelligent Control and Computer Application
PB  - Atlantis Press
SP  - 244
EP  - 247
SN  - 2352-538X
UR  - https://doi.org/10.2991/icca-16.2016.57
DO  - 10.2991/icca-16.2016.57
ID  - Li2016/01
ER  -