A Telephone Speech Corpus of China’s Minority languages for Automatic Language Identification
- DOI
- 10.2991/wiet-13.2013.47
- Keywords
- Language identification; Telephone speech; Corpus; Minority languages
- Abstract
Research in language identification requires corpora of multilingual speech data to capture the distinguishing information within and across languages. In the past few decades, many statistical approaches to language identification have been developed based on two common public-domain corpora, which consist of telephone speech from about 26 languages and dialects. However, China’s minority languages have not been used as target languages in published papers to date. In our work, we select nine typical minority languages of China, together with Mandarin, to construct our telephone speech corpus. The minority languages are Naxi, Miao, Bai, Dai, Yi, Zhuang, Uygur, Mongolian and Tibetan. Each minority language represents its minority nationality. The corpus can be used to study, develop, evaluate and compare algorithms for identifying minority languages. Moreover, it will encourage linguistic researchers to pay more attention to the long history and splendid culture of our national minorities.
- Copyright
- © 2013, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY - CONF
AU - Xiuhua Zeng
AU - Jian Yang
AU - Libo Zuo
AU - Yonghua Xu
PY - 2013/12
DA - 2013/12
TI - A Telephone Speech Corpus of China’s Minority languages for Automatic Language Identification
BT - Proceedings of the AASRI Winter International Conference on Engineering and Technology (AASRI-WIET 2013)
PB - Atlantis Press
SP - 198
EP - 201
SN - 1951-6851
UR - https://doi.org/10.2991/wiet-13.2013.47
DO - 10.2991/wiet-13.2013.47
ID - Zeng2013/12
ER -