Key Research of Pre-processing on Mongolian-Chinese Neural Machine Translation
- DOI
- 10.2991/aiie-16.2016.1
- Keywords
- Mongolian-Chinese translation; neural machine translation; pre-processing; attention-based
- Abstract
Neural machine translation has recently achieved promising results on large-scale corpora, but there has been little research on small-scale corpora such as Mongolian. Mongolian is an agglutinative language, while Chinese is written in a logographic script, so both languages require pre-processing before training a machine translation system. In this paper, we build an attention-based neural machine translation system for the CWMT2009 Mongolian-to-Chinese translation task. We apply four different pre-processing approaches to Mongolian and Chinese: segmenting Chinese into characters, separating Mongolian stems from their suffixes, handling the Mongolian case suffixes, and converting Mongolian into Latin script. We carry out extensive experiments to evaluate these approaches and achieve a best BLEU score of 29.56, which is 1.82 BLEU points higher than the baseline trained on the original Mongolian and standard Chinese word segmentation.
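Of the four pre-processing approaches, segmenting Chinese into characters is the most self-contained. The sketch below is a minimal, hypothetical illustration of that step: it splits runs of CJK characters into space-separated single characters while leaving other tokens intact. The paper does not specify its actual segmentation tooling, so function names and the Unicode range check here are assumptions for illustration only.

```python
# Hypothetical sketch of character-level Chinese segmentation for NMT
# pre-processing; the paper's actual tooling is not specified.

def segment_chinese_to_chars(sentence: str) -> str:
    """Split tokens containing CJK characters into single characters."""
    tokens = []
    for token in sentence.split():
        # Check the CJK Unified Ideographs block (U+4E00..U+9FFF).
        if any('\u4e00' <= ch <= '\u9fff' for ch in token):
            tokens.extend(list(token))
        else:
            tokens.append(token)
    return ' '.join(tokens)

print(segment_chinese_to_chars("机器 翻译"))  # → "机 器 翻 译"
```

Character segmentation of this kind shrinks the Chinese vocabulary dramatically, which is one common motivation for it on small corpora; the paper's reported gains combine this with the Mongolian-side transformations.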
- Copyright
- © 2016, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY  - CONF
AU  - Jian Du
AU  - Hongxu Hou
AU  - Jing Wu
AU  - Zhipeng Shen
AU  - Jinting Li
AU  - Hongbin Wang
PY  - 2016/11
DA  - 2016/11
TI  - Key Research of Pre-processing on Mongolian-Chinese Neural Machine Translation
BT  - Proceedings of the 2016 2nd International Conference on Artificial Intelligence and Industrial Engineering (AIIE 2016)
PB  - Atlantis Press
SP  - 1
EP  - 6
SN  - 1951-6851
UR  - https://doi.org/10.2991/aiie-16.2016.1
DO  - 10.2991/aiie-16.2016.1
ID  - Du2016/11
ER  -