Similarity calculation based on Mongolian news corpus
- DOI
- 10.2991/amcce-18.2018.31
- Keywords
- Similarity, Mongolian, Vector Space Model.
- Abstract
Similarity calculation is an important part of new event detection, and computing text similarity effectively can remove redundant information and improve the efficiency of user queries. This paper studies the calculation of similarity between Mongolian news texts. Because the Mongolian news corpus is non-standard, it must be preprocessed so that the later stages can run efficiently. Preprocessing consists of code conversion, text proofreading, stop-word removal, and suffix removal. The news texts are then mapped to vectors with a vector space model, and the similarity between the vectors is calculated with the cosine formula. Finally, precision, recall, and F-measure are used as the evaluation criteria for the experimental results. The results show that the experimental method performs better than the manual approach.
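The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the term-weighting scheme, so raw term counts are assumed here, and the function names, the stop-word set, and the placeholder tokens are all hypothetical. Code conversion, proofreading, and suffix removal are assumed to have been applied before tokens reach this code.

```python
import math
from collections import Counter

# Hypothetical stop-word set; a real system would load a Mongolian list.
STOP_WORDS = {"the", "of"}


def to_vector(tokens):
    """Map a tokenized document to a sparse term-count vector (assumed weighting)."""
    return Counter(t for t in tokens if t not in STOP_WORDS)


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def precision_recall_f(predicted, relevant):
    """Precision, recall, and F-measure (balanced F1) for two sets of items."""
    predicted, relevant = set(predicted), set(relevant)
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Placeholder tokens standing in for preprocessed news texts.
doc_a = to_vector(["flood", "hits", "the", "city"])
doc_b = to_vector(["flood", "hits", "the", "village"])
print(cosine_similarity(doc_a, doc_b))
```

In practice, pairs whose cosine score exceeds a chosen threshold would be treated as similar, and `precision_recall_f` would compare those predicted pairs against manually labeled ones. The threshold itself is not given in the abstract.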
- Copyright
- © 2018, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY - CONF
AU - Yaowen Gao
AU - Feilong Bao
AU - Guanglai Gao
PY - 2018/05
DA - 2018/05
TI - Similarity calculation based on Mongolian news corpus
BT - Proceedings of the 2018 3rd International Conference on Automation, Mechanical Control and Computational Engineering (AMCCE 2018)
PB - Atlantis Press
SP - 176
EP - 181
SN - 2352-5401
UR - https://doi.org/10.2991/amcce-18.2018.31
DO - 10.2991/amcce-18.2018.31
ID - Gao2018/05
ER -