A study on speech recognition of Tibetan Amdo based on whisper
- DOI
- 10.2991/978-94-6463-490-7_40
- Keywords
- speech recognition; whisper; fine-tuning; Amdo Tibetan
- Abstract
For languages such as Amdo Tibetan, which have a small speaker population and for which data are hard to collect, high-accuracy speech recognition remains difficult. Whisper, a general-purpose speech recognition model developed by OpenAI, approaches human-level accuracy and robustness by training on very large datasets. In this study, Whisper was fine-tuned on the available Amdo corpus, and its recognition performance improved markedly after only a brief period of fine-tuning. The pretrained model was initially unable to recognize Tibetan; after fine-tuning, Whisper-base reached a character error rate (CER) of 23.84%, and Whisper-medium reached a CER of 9.31%. These findings highlight Whisper's substantial potential for recognizing low-resource languages and show that the model can be adapted to specific tasks through fine-tuning. The study confirms that, despite limited data resources, targeted fine-tuning enables Whisper to achieve strong recognition results in languages such as Amdo Tibetan.
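The abstract reports CER without stating how it was computed. As a minimal sketch, assuming the common definition (character-level edit distance divided by the number of reference characters, pooled over the test set), the metric could be computed as below. The function names `edit_distance` and `cer` are illustrative, and counting one Unicode code point as one character is an assumption; the paper may segment Tibetan text differently.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, one Unicode code point per unit."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference characters into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference character missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis character)
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def cer(references: list[str], hypotheses: list[str]) -> float:
    """Corpus-level CER: total edits divided by total reference characters."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return total_edits / total_chars
```

Under this definition, a CER of 9.31% corresponds to roughly 9 character edits per 100 reference characters. Because Tibetan syllables often stack several code points, a syllable-level or grapheme-level count would give different figures than this code-point-level sketch.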
- Copyright
- © 2024 The Author(s)
- Open Access
- Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
Cite this article
TY  - CONF
AU  - Like Ma
AU  - Guanyu Li
AU  - Runyu Zhe
PY  - 2024
DA  - 2024/08/31
TI  - A study on speech recognition of Tibetan Amdo based on whisper
BT  - Proceedings of the 2024 3rd International Conference on Artificial Intelligence, Internet and Digital Economy (ICAID 2024)
PB  - Atlantis Press
SP  - 367
EP  - 373
SN  - 2589-4919
UR  - https://doi.org/10.2991/978-94-6463-490-7_40
DO  - 10.2991/978-94-6463-490-7_40
ID  - Ma2024
ER  -