Statistics and Analysis of Mongolian Syllables Based on Network Corpus
- DOI
- 10.2991/caai-18.2018.37
- Keywords
- Mongolian syllable; n-gram model; spell check; statistics and analysis; network corpus
- Abstract
This article compiles a large-scale Mongolian text corpus from CCTV and other news websites and conducts a statistical analysis of the Mongolian syllables in that corpus. Using an n-gram model, the analysis shows the probability with which different Mongolian syllables co-occur. The data also indicate four main causes of misspelling in Mongolian: monosyllabic errors, misuse of spaces, improper use of control characters, and polyphonic words that share the same written form.
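The kind of syllable co-occurrence statistic described above can be illustrated with a bigram (2-gram) model. The following is a minimal sketch, not the authors' implementation: it assumes words have already been split into syllables, the function name `bigram_probabilities` is hypothetical, and the Latin-transliterated syllables are illustrative placeholders rather than data from the paper's corpus.

```python
from collections import Counter


def bigram_probabilities(words):
    """Estimate P(next syllable | current syllable) by maximum likelihood.

    `words` is an iterable of words, each given as a list of syllables.
    Only adjacent syllables inside the same word are counted as a pair.
    """
    context = Counter()  # how often a syllable is followed by another one
    bigram = Counter()   # how often a specific (current, next) pair occurs
    for syllables in words:
        for cur, nxt in zip(syllables, syllables[1:]):
            context[cur] += 1
            bigram[(cur, nxt)] += 1
    return {pair: n / context[pair[0]] for pair, n in bigram.items()}


# Assumption: toy, pre-syllabified input; not taken from the paper.
corpus = [["mon", "gol"], ["mon", "gol"], ["mon", "go"], ["gol", "mon"]]
probs = bigram_probabilities(corpus)

for (cur, nxt), p in sorted(probs.items()):
    print(f"P({nxt} | {cur}) = {p:.3f}")
```

A spell checker built on such counts could flag a syllable pair whose estimated probability is zero or very low as a candidate error, although the paper's abstract does not state how its statistics are applied.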
- Copyright
- © 2018, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY - CONF
AU - Zhuyuan Cai
AU - Monghjaya
PY - 2018/08
DA - 2018/08
TI - Statistics and Analysis of Mongolian Syllables Based on Network Corpus
BT - Proceedings of the 2018 3rd International Conference on Control, Automation and Artificial Intelligence (CAAI 2018)
PB - Atlantis Press
SP - 159
EP - 161
SN - 2589-4919
UR - https://doi.org/10.2991/caai-18.2018.37
DO - 10.2991/caai-18.2018.37
ID - Cai2018/08
ER -