Autonomous Language Processing and Text Mining by Data Analytics for Business Solutions
- DOI
- 10.2991/978-94-6463-014-5_9
- Keywords
- Speech analytics solution; Word recognition rate; Support vector machine; Naïve Bayes algorithms
- Abstract
A speech analytics solution is a technology that enables a company to discover customer patterns and insights by analyzing relevant data, such as recorded audio files or phone conversations. The accuracy of speech recognition, or speech-to-text transcription, has long been a challenge. This paper presents a text classification model for call transcriptions based on their context, and aims to improve the accuracy of the Google Speech API for the Malay language. In this study, the accuracy of speech-to-text transcription is measured by word recognition rate and an accuracy scale. Time-cut-point and audio speed are investigated to determine whether they affect transcription accuracy, and the results obtained from different time-cut-point and audio speed settings are compared to identify the best combination. Furthermore, the pre-processed text data is used to train the text classification model using Support Vector Machine and Naive Bayes algorithms. Two approaches to improving the Google Speech API are studied. The first approach applies speech adaptation, a function provided by Google. However, accuracy dropped when 250 words were added to the speech adaptation, or when the audio speed was lowered, because the word error rate increased in both cases. In the second approach, removing speech adaptation and lowering the audio speed simultaneously decreased the word error rate, and hence increased accuracy. For text classification, Support Vector Machine achieved a better accuracy score than Naive Bayes. Overall, a short time-cut-point with a normal-speed audio file had a positive impact on the Google speech-to-text API, and Support Vector Machine was the more suitable classification model.
- Copyright
- © 2023 The Author(s)
- Open Access
- Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
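The abstract measures transcription accuracy by word recognition rate and word error rate but does not define either. The following is a minimal sketch of how such scores are commonly computed, not the authors' implementation. It assumes the standard word-level edit-distance definition of word error rate and assumes that word recognition rate is its complement; the function names and the sample Malay sentence are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: the standard WER definition
    (substitutions + deletions + insertions) / reference length.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def word_recognition_rate(reference: str, hypothesis: str) -> float:
    """Assumed here to be 1 - WER; the paper may define it differently."""
    return 1.0 - word_error_rate(reference, hypothesis)


if __name__ == "__main__":
    # Hypothetical human transcript versus a machine transcript that
    # dropped the final word.
    reference = "saya mahu semak baki akaun"
    hypothesis = "saya mahu semak baki"
    print(word_error_rate(reference, hypothesis))
    print(word_recognition_rate(reference, hypothesis))
```

In this example one of five reference words is missing, so the word error rate is 1/5. Under this definition, comparing settings such as time-cut-point or audio speed amounts to scoring each setting's transcript against the same human reference.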
Cite this article
TY - CONF
AU - Voon Hee Wong
AU - Wei Lun Tan
AU - Jia Li Kor
AU - Xiao Ven Wan
PY - 2022
DA - 2022/12/12
TI - Autonomous Language Processing and Text Mining by Data Analytics for Business Solutions
BT - Proceedings of the International Conference on Mathematical Sciences and Statistics 2022 (ICMSS 2022)
PB - Atlantis Press
SP - 85
EP - 93
SN - 2352-538X
UR - https://doi.org/10.2991/978-94-6463-014-5_9
DO - 10.2991/978-94-6463-014-5_9
ID - Wong2022
ER -