Topic Model over Short Texts Incorporating Word Embedding
- DOI
- 10.2991/aeecs-18.2018.34
- Keywords
- Short texts, Topic model, Word embedding, Text mining
- Abstract
The data sparsity of short texts makes it difficult to discover document-level word co-occurrence patterns, which is why conventional topic models such as LDA suffer a large performance degradation on short texts. As a by-product of learning a neural probabilistic language model, word embeddings capture semantic similarity between words well. In this paper, we propose a new model called promotion-BTM, which promotes the probability that words that are similar under the word embeddings belong to the same topic. It also distinguishes the two words of a biterm into a topical word and a general word, and promotes only the semantically similar words of the topical word. Extensive experiments on real-world datasets show that our model outperforms the baseline model BTM on all evaluation metrics.
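The paper gives the exact sampling equations; as a rough illustration of the promotion idea summarized in the abstract, the sketch below implements one plausible reading: when a biterm is assigned to a topic, the embedding neighbors of its topical word also receive a fractional count boost, in the spirit of a generalized Pólya urn. All names and values here (`sim_threshold`, `promote_weight`, the relevance rule for picking the topical word) are illustrative assumptions, not the authors' published algorithm.

```python
# Hedged sketch of an embedding-based "promotion" step for a biterm topic model.
# Toy random embeddings stand in for trained word vectors (e.g. word2vec).
import numpy as np

rng = np.random.default_rng(0)

V, K, D = 50, 4, 10              # vocabulary size, topics, embedding dim (toy)
embeddings = rng.normal(size=(V, D))
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)

# Precompute each word's "semantically similar" neighbors by cosine similarity.
sim = embeddings @ embeddings.T
sim_threshold = 0.5              # assumed cutoff for similarity
similar_words = [np.flatnonzero((sim[w] > sim_threshold) & (np.arange(V) != w))
                 for w in range(V)]

n_zw = np.zeros((K, V))          # (fractional) topic-word counts
n_z = np.zeros(K)                # biterm counts per topic
beta = 0.01                      # Dirichlet smoothing
promote_weight = 0.3             # fractional boost for neighbors (assumed)

def relevance(w, z):
    """P(w | z) under current counts; the more relevant word is 'topical'."""
    return (n_zw[z, w] + beta) / (n_zw[z].sum() + V * beta)

def update(w1, w2, z, delta):
    """Add (delta=+1) or remove (delta=-1) a biterm from topic z, and
    promote the embedding neighbors of its topical word by a fraction."""
    n_zw[z, [w1, w2]] += delta
    n_z[z] += delta
    topical = w1 if relevance(w1, z) >= relevance(w2, z) else w2
    n_zw[z, similar_words[topical]] += delta * promote_weight

# Example: assign one biterm to a random topic and apply the promotion step.
z = rng.integers(K)
update(3, 17, z, +1.0)
print(n_zw[z].sum(), n_z[z])
```

Inside a full Gibbs sampler, `update(..., -1)` and `update(..., +1)` would bracket each re-sampling of a biterm's topic, so the promoted fractional counts stay consistent with the assignments.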
- Copyright
- © 2018, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY - CONF
AU - Kai Yu
AU - Yiming Zhang
AU - Xu Wang
PY - 2018/03
DA - 2018/03
TI - Topic Model over Short Texts Incorporating Word Embedding
BT - Proceedings of the 2018 2nd International Conference on Advances in Energy, Environment and Chemical Science (AEECS 2018)
PB - Atlantis Press
SP - 194
EP - 200
SN - 2352-5401
UR - https://doi.org/10.2991/aeecs-18.2018.34
DO - 10.2991/aeecs-18.2018.34
ID - Yu2018/03
ER -