Text Similarity Computing Based on LDA Topic Model and Word Co-occurrence

Minglai Shao; Liangxi Qin

doi:10.2991/sekeie-14.2014.47

Corresponding Author

Minglai Shao

Available Online March 2014.

DOI: 10.2991/sekeie-14.2014.47
Keywords: Topic model; LDA (Latent Dirichlet Allocation); JS (Jensen-Shannon) distance; word co-occurrence; similarity
Abstract: The LDA (Latent Dirichlet Allocation) topic model has been widely applied to text clustering because it reduces dimensionality efficiently. The prevalent method is to model the text set with an LDA topic model, perform inference by Gibbs sampling, and compute text similarity with the JS (Jensen-Shannon) distance. However, the JS distance cannot distinguish semantic associations among text topics. To address this defect, a new text similarity algorithm based on a hidden topic model and word co-occurrence analysis is introduced. Experiments are carried out to verify the clustering effect of the improved algorithm. The results show that the method effectively improves both the text similarity computation and text clustering accuracy.
Copyright: © 2014, the Authors. Published by Atlantis Press.
Open Access: This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
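
The baseline described in the abstract, comparing documents by the JS distance between their LDA topic distributions, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the topic vectors are made-up examples rather than output from an LDA model, and the base-2 logarithm is an assumption chosen so that the divergence falls in [0, 1].

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two probability vectors.

    With base-2 logarithms the value lies in [0, 1]: 0 for identical
    distributions, 1 for distributions with disjoint support.
    """
    # The mixture m is positive wherever p or q is, so KL is well defined.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical document-topic distributions over three topics.
doc_a = [1.0, 0.0, 0.0]  # entirely topic 0
doc_b = [0.0, 1.0, 0.0]  # entirely topic 1
doc_c = [0.0, 0.0, 1.0]  # entirely topic 2

print(js_divergence(doc_a, doc_a))
print(js_divergence(doc_a, doc_b))
print(js_divergence(doc_a, doc_c))
```

In this sketch `doc_a` is equally far from `doc_b` and `doc_c`, even if topic 0 happened to be semantically closer to topic 1 than to topic 2. The divergence sees only the probability mass per topic, which is the limitation the abstract attributes to the JS distance.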

Volume Title: Proceedings of the 2nd International Conference on Software Engineering, Knowledge Engineering and Information Engineering (SEKEIE 2014)
Series: Advances in Intelligent Systems Research
Publication Date: March 2014
ISBN: 978-94-62520-25-7
ISSN: 1951-6851

Cite this article

TY  - CONF
AU  - Minglai Shao
AU  - Liangxi Qin
PY  - 2014/03
DA  - 2014/03
TI  - Text Similarity Computing Based on LDA Topic Model and Word Co-occurrence
BT  - Proceedings of the 2nd International Conference on Software Engineering, Knowledge Engineering and Information Engineering (SEKEIE 2014)
PB  - Atlantis Press
SP  - 199
EP  - 203
SN  - 1951-6851
UR  - https://doi.org/10.2991/sekeie-14.2014.47
DO  - 10.2991/sekeie-14.2014.47
ID  - Shao2014/03
ER  -