Proceedings of the 2024 3rd International Conference on Artificial Intelligence, Internet and Digital Economy (ICAID 2024)

A study on speech recognition of Tibetan Amdo based on whisper

Authors
Like Ma¹, Guanyu Li¹,*, Runyu Zhe¹
¹ Key Laboratory of Linguistic and Cultural Computing, Ministry of Education (Northwest Minzu University), Lanzhou, China
*Corresponding author. Email: xxlgy@xbmu.edu.cn
Corresponding Author
Guanyu Li
Available Online 31 August 2024.
DOI
10.2991/978-94-6463-490-7_40
Keywords
speech recognition; whisper; fine-tuning; Amdo Tibetan
Abstract

Amdo Tibetan has a small speaker population, and its speech data are difficult to collect, so achieving high speech recognition accuracy for it remains difficult. Whisper, a general-purpose speech recognition model developed by OpenAI, approaches human-level accuracy and robustness by training on very large datasets. In this study, a brief period of fine-tuning on the available Amdo corpus markedly improved Whisper's recognition performance. Before fine-tuning, the model could not recognize Tibetan; after fine-tuning, the Whisper-base version reached a character error rate (CER) of 23.84%, and the Whisper-medium version reduced the CER further to 9.31%. These findings highlight Whisper's substantial potential for recognizing low-resource languages and its adaptability to specific tasks through fine-tuning. They indicate that, despite limited data resources, targeted fine-tuning enables Whisper to achieve strong recognition results for languages such as Amdo Tibetan.
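For readers unfamiliar with the metric reported above, the following is a minimal sketch of how a character error rate is commonly computed: the Levenshtein (edit) distance between the reference and the hypothesis, divided by the reference length. The abstract does not state the exact scoring procedure, so the function names `edit_distance` and `cer`, and the choice to treat each Unicode code point as one character, are assumptions made for illustration. For Tibetan script in particular, the unit of comparison (code point, stacked glyph, or syllable) changes the resulting number.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    # prev[j] holds the distance between the first i-1 items of ref
    # and the first j items of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of r
                curr[j - 1] + 1,     # insertion of h
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance divided by reference length.

    Assumption: one Unicode code point counts as one character.
    """
    ref = list(reference)
    hyp = list(hypothesis)
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Placeholder strings, not data from the study.
print(cer("abcd", "abxd"))  # one substitution over four characters → 0.25
```

Under this definition, a CER of 9.31% corresponds to roughly nine character-level edits per hundred reference characters. The value can exceed 100% when the hypothesis contains many insertions.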

Copyright
© 2024 The Author(s)
Open Access
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

Volume Title
Proceedings of the 2024 3rd International Conference on Artificial Intelligence, Internet and Digital Economy (ICAID 2024)
Series
Atlantis Highlights in Intelligent Systems
Publication Date
31 August 2024
ISBN
978-94-6463-490-7
ISSN
2589-4919

Cite this article

TY  - CONF
AU  - Like Ma
AU  - Guanyu Li
AU  - Runyu Zhe
PY  - 2024
DA  - 2024/08/31
TI  - A study on speech recognition of Tibetan Amdo based on whisper
BT  - Proceedings of the 2024 3rd International Conference on Artificial Intelligence, Internet and Digital Economy (ICAID 2024)
PB  - Atlantis Press
SP  - 367
EP  - 373
SN  - 2589-4919
UR  - https://doi.org/10.2991/978-94-6463-490-7_40
DO  - 10.2991/978-94-6463-490-7_40
ID  - Ma2024
ER  -