Study of Tibetan Text Classification based on fastText
- DOI
- 10.2991/iccia-19.2019.58
- Keywords
- text classification, Tibetan text, fastText.
- Abstract
Tibetan text classification is an important research topic in Tibetan information processing. In this paper, we apply the fastText text classification tool and fastText pre-trained word vectors to Tibetan text classification. In the experiment, for a Tibetan corpus segmented at Tibetan syllable points, we represent every word in each document with its fastText pre-trained word vector and then average all the word vectors in the document. The average vector (docvec) represents the document and is fed into an SVM classifier. The results show that the model outperforms the traditional Tibetan text classification method, improving the F-measure by 10%.
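The averaging step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `split_syllables` and `doc_vector`, the toy 3-dimensional vectors, and the choice to skip syllables without a vector are all assumptions made for illustration. Real pre-trained fastText vectors would be loaded from a file instead of the hand-written dictionary used here.

```python
TSHEG = "\u0f0b"  # Tibetan syllable point (tsheg), used here as the segmentation mark


def split_syllables(text):
    """Split Tibetan text at each tsheg and drop empty pieces."""
    return [s for s in text.split(TSHEG) if s.strip()]


def doc_vector(syllables, embeddings, dim):
    """Average the vectors of all syllables that have an embedding.

    Assumption: syllables missing from `embeddings` are skipped, and a
    document with no known syllable maps to the zero vector.
    """
    known = [embeddings[s] for s in syllables if s in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


# Toy vectors invented for this example; they are NOT real fastText vectors.
toy_embeddings = {
    "བོད": [1.0, 0.0, 2.0],
    "ཡིག": [3.0, 2.0, 0.0],
}

# Build a short document from the known syllables, each followed by a tsheg.
text = TSHEG.join(toy_embeddings) + TSHEG

syllables = split_syllables(text)
docvec = doc_vector(syllables, toy_embeddings, dim=3)
print(docvec)
```

Each document vector produced this way is a fixed-length feature vector, so a collection of them, together with class labels, could be passed to an SVM implementation such as scikit-learn's `sklearn.svm.SVC`. The kernel and hyperparameters are not stated in the abstract, so they are left open here.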
- Copyright
- © 2019, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY - CONF
AU - Wei Ma
AU - Hongzhi Yu
AU - Jing Ma
PY - 2019/07
DA - 2019/07
TI - Study of Tibetan Text Classification based on fastText
BT - Proceedings of the 3rd International Conference on Computer Engineering, Information Science & Application Technology (ICCIA 2019)
PB - Atlantis Press
SP - 374
EP - 380
SN - 2352-538X
UR - https://doi.org/10.2991/iccia-19.2019.58
DO - 10.2991/iccia-19.2019.58
ID - Ma2019/07
ER -