Short Text Topic Discovery Based on BTM Topic Model
- DOI
- 10.2991/msmi-19.2019.19
- Keywords
- BTM, JS distance, Single-Pass clustering, Short text topic discovery.
- Abstract
With the continued growth of online social platforms, research on hot-topic discovery for short text data, such as Weibo posts, instant messages, and news comments, remains neither broad nor deep enough. Moreover, short text data sets are noisy, sparse, and irregular in form, so traditional topic discovery techniques perform poorly on them. To address these characteristics, this paper uses a short text topic discovery method based on the BTM (Bi-term Topic Model) topic model. First, the preprocessed short texts are modeled with BTM, which suits the language characteristics of short text, to obtain the topic probability distribution of each text. Then JS distance is used as the text similarity measure and combined with an improved Single-pass clustering algorithm to find the hot topics in the short text data set. Comparison experiments show that modeling short texts with BTM and clustering them with the improved Single-pass algorithm improves clustering efficiency, effectively addresses the data sparsity problem of short texts, and markedly improves the quality of topic discovery.
- Copyright
- © 2019, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
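The clustering step described in the abstract can be illustrated with a minimal Python sketch. It assumes that each short text has already been reduced to a topic probability distribution (for example by a BTM implementation, which is not shown here). It uses the Jensen-Shannon divergence with base-2 logarithms as the dissimilarity and a plain Single-pass procedure; the paper's specific improvements to Single-pass are not described in the abstract and are not reproduced. The function names, the running-mean centroid update, and the `threshold` value are illustrative assumptions.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two distributions, in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def single_pass(topic_dists, threshold=0.2):
    """Plain Single-pass clustering over topic distributions.

    Each text is compared with the centroid of every existing cluster.
    It joins the closest cluster if the divergence is within `threshold`
    (an assumed value), otherwise it starts a new cluster.
    """
    clusters = []  # each cluster: {"centroid": [...], "members": [indices]}
    for idx, dist in enumerate(topic_dists):
        best, best_d = None, float("inf")
        for cluster in clusters:
            d = js_divergence(dist, cluster["centroid"])
            if d < best_d:
                best, best_d = cluster, d
        if best is not None and best_d <= threshold:
            # Update the centroid as the running mean of its members.
            n = len(best["members"])
            best["centroid"] = [
                (c * n + x) / (n + 1) for c, x in zip(best["centroid"], dist)
            ]
            best["members"].append(idx)
        else:
            clusters.append({"centroid": list(dist), "members": [idx]})
    return clusters


# Hypothetical topic distributions for four short texts over three topics.
example = [
    [0.90, 0.05, 0.05],
    [0.85, 0.10, 0.05],
    [0.05, 0.05, 0.90],
    [0.10, 0.05, 0.85],
]
for cluster in single_pass(example):
    print(cluster["members"])
```

Because Single-pass processes texts in arrival order and never revisits earlier assignments, the result depends on both the input order and the threshold, so both would need tuning on real data.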
Cite this article
TY - CONF
AU - Wei-Dong Zhu
AU - Wen-Gan Zhou
PY - 2019/06
DA - 2019/06
TI - Short Text Topic Discovery Based on BTM Topic Model
BT - Proceedings of the 6th International Conference on Management Science and Management Innovation (MSMI 2019)
PB - Atlantis Press
SP - 100
EP - 106
SN - 2352-5428
UR - https://doi.org/10.2991/msmi-19.2019.19
DO - 10.2991/msmi-19.2019.19
ID - Zhu2019/06
ER -