Research on N-Gram–Based Mongolian Information Retrieval Unit
- DOI
- 10.2991/icectt-15.2015.84
- Keywords
- Mongolian information retrieval; n-gram form; retrieval unit; structured query; language model
- Abstract
To improve the efficiency of Mongolian information retrieval, this work studies n-gram-based retrieval units under selected information retrieval models, taking the characteristics of the Mongolian language into account. The retrieval models considered are the Vector Space Model and the Language Model. Good-Turing smoothing and Jelinek-Mercer (JM) smoothing are the smoothing methods employed. For each n-gram length (n from 2 to 5), four steps are carried out: establishment of the corpus, query retrieval, retrieval, and evaluation. Recall and precision are then compared across the n-gram lengths to identify the most suitable retrieval unit. The results show that the 4-gram (n = 4) is the most suitable retrieval unit for a Mongolian information retrieval system.
- Copyright
- © 2015, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
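The abstract describes indexing by n-grams, scoring with a JM-smoothed language model, and comparing recall and precision. The Python sketch below illustrates that pipeline in miniature. It is not the authors' implementation: the function names, the choice to cut n-grams inside whitespace-separated words, the interpolation weight `lam = 0.5`, and the toy Latin-letter documents are all assumptions made for illustration, and Good-Turing smoothing and the Vector Space Model are omitted.

```python
import math
from collections import Counter


def char_ngrams(text, n):
    """Split each whitespace-separated word into overlapping character n-grams.

    Assumption: words no longer than n are kept whole as a single unit.
    """
    grams = []
    for word in text.split():
        if len(word) <= n:
            grams.append(word)
        else:
            grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


def jm_log_likelihood(query_grams, doc_counts, coll_counts, lam=0.5):
    """Query log-likelihood under Jelinek-Mercer smoothing.

    p(g | d) = (1 - lam) * p_ml(g | d) + lam * p_ml(g | collection)
    """
    doc_len = sum(doc_counts.values())
    coll_len = sum(coll_counts.values())
    score = 0.0
    for g in query_grams:
        p_doc = doc_counts[g] / doc_len if doc_len else 0.0
        p_coll = coll_counts[g] / coll_len if coll_len else 0.0
        p = (1 - lam) * p_doc + lam * p_coll
        if p == 0.0:
            # The n-gram never occurs in the collection at all.
            return float("-inf")
        score += math.log(p)
    return score


def rank(query, docs, n, lam=0.5):
    """Return document ids ordered from best to worst score."""
    doc_counts = {d: Counter(char_ngrams(t, n)) for d, t in docs.items()}
    coll_counts = Counter()
    for counts in doc_counts.values():
        coll_counts.update(counts)
    q = char_ngrams(query, n)
    scores = {d: jm_log_likelihood(q, c, coll_counts, lam)
              for d, c in doc_counts.items()}
    return sorted(scores, key=scores.get, reverse=True)


def precision_recall(retrieved, relevant):
    """Set-based precision and recall for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical toy collection; real experiments would use a Mongolian corpus.
toy_docs = {"d1": "abcdef ghij", "d2": "klmnop qrst"}
ranking = rank("abcd", toy_docs, n=4)
```

Repeating `rank` and `precision_recall` for n = 2, 3, 4, 5 over a judged query set would give the kind of per-n comparison the abstract refers to.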
Cite this article
TY - CONF
AU - Junying Yue
AU - Guanglai Gao
AU - Min Lin
PY - 2015/11
DA - 2015/11
TI - Research on N-Gram–Based Mongolian Information Retrieval Unit
BT - Proceedings of the 2015 International Conference on Electromechanical Control Technology and Transportation
PB - Atlantis Press
SP - 439
EP - 445
SN - 2352-5401
UR - https://doi.org/10.2991/icectt-15.2015.84
DO - 10.2991/icectt-15.2015.84
ID - Yue2015/11
ER -