Proceedings of the International Conference on Mathematical Sciences and Statistics 2022 (ICMSS 2022)

Autonomous Language Processing and Text Mining by Data Analytics for Business Solutions

Authors
Voon Hee Wong¹, Wei Lun Tan¹,*, Jia Li Kor¹, Xiao Ven Wan¹
¹ Department of Mathematical and Actuarial Sciences, Lee Kong Chian Faculty of Engineering and Science, Universiti Tunku Abdul Rahman, 43000, Kajang, Malaysia
*Corresponding author. Email: tanwl@utar.edu.my
Corresponding Author
Wei Lun Tan
Available Online 12 December 2022.
DOI
10.2991/978-94-6463-014-5_9
Keywords
Speech analytics solution; Word recognition rate; Support vector machine; Naïve Bayes algorithms
Abstract

A speech analytics solution is a technology that enables a company to discover customer patterns and insights by analyzing relevant data, such as recorded audio files or phone conversations. The accuracy of speech recognition, or speech-to-text transcription, has long been a challenge. This paper presents a text classification model for call transcriptions based on their context, and aims to improve the accuracy of the Google Speech API for the Malay language. In this study, the accuracy of speech-to-text transcription is measured by word recognition rate and an accuracy scale. Time-cut-point and audio speed are investigated to determine whether they affect transcription accuracy, and the results obtained from different time-cut-point and audio speed settings are compared to identify the best combination. Furthermore, the pre-processed text data is used to train text classification models using the Support Vector Machine and Naïve Bayes algorithms. Two approaches to improving the Google Speech API have been studied. The first approach applies speech adaptation, a feature provided by Google. However, accuracy dropped when 250 words were added to the speech adaptation, or when the audio speed was lowered, because the word error rate increased in both cases. In the second approach, removing speech adaptation and lowering audio speed simultaneously decreased the word error rate, and hence accuracy increased. For text classification, the Support Vector Machine achieved a better accuracy score than the Naïve Bayes algorithms. Overall, a short time-cut-point with a normal-speed audio file had a positive impact on the Google speech-to-text API, and the Support Vector Machine was the more suitable classification model.
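The abstract reports accuracy in terms of word recognition rate and word error rate but does not give the formulas. The sketch below shows one common way to compute them. It assumes the standard definition of word error rate as word-level edit distance (substitutions, deletions and insertions) divided by the number of reference words, and assumes word recognition rate is its complement. The function names and the Malay sample sentence are illustrative only and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: this is the conventional WER definition; the paper's exact
    formula is not stated in the abstract.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def word_recognition_rate(reference: str, hypothesis: str) -> float:
    """Assumed here to be 1 - WER; can be negative if there are many insertions."""
    return 1.0 - word_error_rate(reference, hypothesis)


if __name__ == "__main__":
    # Hypothetical human transcript versus a hypothetical API transcript.
    human = "saya mahu semak baki akaun"
    machine = "saya mahu semak baki"
    print(f"WER = {word_error_rate(human, machine):.2f}")
    print(f"WRR = {word_recognition_rate(human, machine):.2f}")
```

Under these assumptions, the effect of a given time-cut-point or audio speed setting could be compared by computing the rate for each setting against the same human reference transcript.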

Copyright
© 2023 The Author(s)
Open Access
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.


Volume Title
Proceedings of the International Conference on Mathematical Sciences and Statistics 2022 (ICMSS 2022)
Series
Advances in Computer Science Research
Publication Date
12 December 2022
ISBN
978-94-6463-014-5
ISSN
2352-538X

Cite this article

TY  - CONF
AU  - Voon Hee Wong
AU  - Wei Lun Tan
AU  - Jia Li Kor
AU  - Xiao Ven Wan
PY  - 2022
DA  - 2022/12/12
TI  - Autonomous Language Processing and Text Mining by Data Analytics for Business Solutions
BT  - Proceedings of the International Conference on Mathematical Sciences and Statistics 2022 (ICMSS 2022)
PB  - Atlantis Press
SP  - 85
EP  - 93
SN  - 2352-538X
UR  - https://doi.org/10.2991/978-94-6463-014-5_9
DO  - 10.2991/978-94-6463-014-5_9
ID  - Wong2022
ER  -