A study on speech recognition of Tibetan Amdo based on whisper

Like Ma; Guanyu Li; Runyu Zhe

doi:10.2991/978-94-6463-490-7_40

Authors

Like Ma¹, Guanyu Li¹,*, Runyu Zhe¹

¹Key Laboratory of Linguistic and Cultural Computing Ministry of Education (Northwest Minzu University), Lanzhou, China

*Corresponding author. Email: xxlgy@xbmu.edu.cn

Available Online 31 August 2024.

DOI: 10.2991/978-94-6463-490-7_40
Keywords: speech recognition; whisper; fine-tuning; Amdo Tibetan
Abstract: Amdo Tibetan has a small speaker population, and speech data for it are difficult to collect, so achieving high speech recognition accuracy remains a considerable challenge. Whisper, a general-purpose speech recognition model developed by OpenAI, approaches human-level accuracy and robustness by training on very large datasets. In this study, Whisper was fine-tuned on the available Amdo corpus, and its recognition capability improved markedly after only a brief period of fine-tuning. The pretrained model was initially unable to recognize Tibetan; after fine-tuning, the Whisper-base version reached a character error rate (CER) of 23.84%, and the Whisper-medium version reached a CER of 9.31%. These findings highlight Whisper's substantial potential for recognizing low-resource languages and its adaptability to specific tasks through fine-tuning. The study shows that, despite limited data, targeted fine-tuning enables Whisper to achieve strong recognition results for languages such as Amdo Tibetan.
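For readers unfamiliar with the metric, CER is conventionally the character-level edit distance between the reference transcript and the recognizer output, divided by the reference length. The sketch below illustrates that standard definition only; it is not the authors' evaluation code. The function names are hypothetical, and treating each Unicode code point as one "character" is an assumption, since the abstract does not state which unit the paper counts for Tibetan script.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted per Unicode code point."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of r
                curr[j - 1] + 1,     # insertion of h
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One substitution against a four-character reference.
    print(cer("abcd", "abxd"))  # 0.25
```

Under this definition, the reported 9.31% corresponds to roughly nine character edits per hundred reference characters. CER can exceed 100% when the output contains many insertions.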
Copyright: © 2024 The Author(s)
Open Access: This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

Volume Title: Proceedings of the 2024 3rd International Conference on Artificial Intelligence, Internet and Digital Economy (ICAID 2024)
Series: Atlantis Highlights in Intelligent Systems
Publication Date: 31 August 2024
ISBN: 978-94-6463-490-7
ISSN: 2589-4919

Cite this article

TY  - CONF
AU  - Like Ma
AU  - Guanyu Li
AU  - Runyu Zhe
PY  - 2024
DA  - 2024/08/31
TI  - A study on speech recognition of Tibetan Amdo based on whisper
BT  - Proceedings of the 2024 3rd International Conference on Artificial Intelligence, Internet and Digital Economy (ICAID 2024)
PB  - Atlantis Press
SP  - 367
EP  - 373
SN  - 2589-4919
UR  - https://doi.org/10.2991/978-94-6463-490-7_40
DO  - 10.2991/978-94-6463-490-7_40
ID  - Ma2024
ER  -