Autonomous Language Processing and Text Mining by Data Analytics for Business Solutions

Voon Hee Wong; Wei Lun Tan; Jia Li Kor; Xiao Ven Wan

doi:10.2991/978-94-6463-014-5_9

Authors

Voon Hee Wong¹, Wei Lun Tan¹,*, Jia Li Kor¹, Xiao Ven Wan¹

¹Department of Mathematical and Actuarial Sciences, Lee Kong Chian Faculty of Engineering and Science, Universiti Tunku Abdul Rahman, 43000, Kajang, Malaysia

*Corresponding author. Email: tanwl@utar.edu.my

Available Online 12 December 2022.

Keywords: Speech analytics solution; Word recognition rate; Support vector machine; Naïve Bayes algorithms
Abstract: A speech analytics solution is a technology that enables a company to discover customers' patterns and insights by analyzing relevant data, such as recorded audio files or phone conversations. The accuracy of speech recognition, or speech-to-text transcription, remains a persistent challenge. This paper aims to present a text classification model that categorizes call transcriptions by context, and to improve the accuracy of the Google Speech API for the Malay language. In this study, the accuracy of speech-to-text transcription is measured by the word recognition rate and an accuracy scale. Two factors, the time-cut-point and the audio speed, are investigated to determine whether they affect transcription accuracy, and the results obtained under different time-cut-point and audio speed settings are compared to identify the best combination. Furthermore, the pre-processed text data is used to train text classification models with the Support Vector Machine and Naive Bayes algorithms. Two approaches to improving the Google Speech API have been studied. The first approach applies speech adaptation, a feature provided by Google. However, accuracy dropped when 250 words were added to the speech adaptation, or when the audio speed was lowered, because the word error rate increased in both cases. In the second approach, removing speech adaptation while lowering the audio speed decreased the word error rate, and hence increased the accuracy. In summary, the Support Vector Machine achieved a better text classification accuracy score than the Naive Bayes algorithm. Overall, a short time-cut-point with the audio file at normal speed had a positive impact on the accuracy of the Google Speech-to-Text API, and the Support Vector Machine proved the more suitable algorithm for the classification model.
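
The abstract measures transcription accuracy with a word recognition rate and reports changes in the word error rate, but it does not give the formulas. The sketch below assumes the common definition WER = (S + D + I) / N, where S, D and I are the word substitutions, deletions and insertions in a word-level Levenshtein alignment between a reference transcript and the API output, and N is the number of reference words. Treating the word recognition rate as 1 - WER is an assumption made for illustration, as are the function names and the Malay example sentence.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Uses a word-level Levenshtein (edit distance) table.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # dist[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or match when cost == 0)
            )
    return dist[len(ref)][len(hyp)] / len(ref)


def word_recognition_rate(reference: str, hypothesis: str) -> float:
    """Assumed definition for this sketch: WRR = 1 - WER (negative if insertions dominate)."""
    return 1.0 - word_error_rate(reference, hypothesis)
```

For a hypothetical five-word reference "saya mahu semak baki akaun" transcribed as "saya nak semak baki akaun", one substitution gives a WER of 1/5 and, under the assumed definition, a word recognition rate of 0.8.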
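
The abstract does not define the time-cut-point. One plausible reading, assumed here, is the segment length at which a long call recording is cut before each piece is sent for transcription. Under that assumption, the standard-library `wave` module is enough to split a WAV recording into fixed-length chunks; the function name `split_wav` and the fixed-length cutting rule are hypothetical, not the authors' procedure.

```python
import io
import wave


def split_wav(data: bytes, cut_seconds: float) -> list[bytes]:
    """Split a WAV file into consecutive chunks of at most `cut_seconds` each.

    Hypothetical reading of "time-cut-point": a fixed segment length in seconds.
    Each chunk is returned as a complete, standalone WAV file (header + frames).
    """
    if cut_seconds <= 0:
        raise ValueError("cut_seconds must be positive")
    chunks = []
    with wave.open(io.BytesIO(data), "rb") as src:
        params = src.getparams()
        frames_per_chunk = max(1, int(params.framerate * cut_seconds))
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:  # end of audio data
                break
            buf = io.BytesIO()
            with wave.open(buf, "wb") as dst:
                dst.setparams(params)  # frame count in the header is corrected on write/close
                dst.writeframes(frames)
            chunks.append(buf.getvalue())
    return chunks
```

Shorter cut points yield more, shorter requests; a 2.5-second recording cut at 1.0 second, for instance, produces three chunks of 1.0, 1.0 and 0.5 seconds. Each chunk could then be transcribed separately and the texts concatenated.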
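
Speech adaptation in Google Cloud Speech-to-Text is supplied per request as a list of phrase hints. As a sketch, assuming the v1 REST `speech:recognize` method, the request body below sets `languageCode` to `ms-MY` (Malay, Malaysia) and, when phrases are given, passes them through the `speechContexts` field; leaving `phrases` empty corresponds to transcribing without speech adaptation. The helper name, encoding, sample rate and example phrases are assumptions; the abstract does not say which API version or client library the authors used.

```python
import base64


def build_recognize_request(audio: bytes, phrases=None, sample_rate_hz: int = 16000) -> dict:
    """Build a v1 `speech:recognize` JSON body (hypothetical helper).

    `phrases` is an optional speech-adaptation phrase list; omit it to run without adaptation.
    """
    config = {
        "encoding": "LINEAR16",          # assumed: uncompressed 16-bit PCM audio
        "sampleRateHertz": sample_rate_hz,
        "languageCode": "ms-MY",         # Malay (Malaysia)
    }
    if phrases:
        config["speechContexts"] = [{"phrases": list(phrases)}]
    return {
        "config": config,
        "audio": {"content": base64.b64encode(audio).decode("ascii")},  # inline audio as base64
    }
```

The returned dictionary can be serialized with `json.dumps` and sent to the v1 `speech:recognize` method with suitable credentials. Comparing runs with and without a phrase list mirrors the paper's two approaches.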
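
The abstract compares Support Vector Machine and Naive Bayes classifiers trained on pre-processed transcripts but gives no implementation details. The dependency-free sketch below is an illustration, not the authors' code: a multinomial Naive Bayes with add-one (Laplace) smoothing, and a one-vs-rest linear SVM trained with the Pegasos subgradient method (hinge loss with L2 regularization, no intercept term for brevity), both on bag-of-words counts. The Malay sentences, the class names `billing` and `technical`, and the hyperparameters `lam`, `epochs` and `seed` are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    return text.lower().split()


class MultinomialNB:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {w for d in docs for w in tokenize(d)}
        self.log_prior, self.log_lik = {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            counts = Counter(w for d in class_docs for w in tokenize(d))
            total = sum(counts.values()) + len(self.vocab)
            self.log_lik[c] = {w: math.log((counts[w] + 1) / total) for w in self.vocab}
        return self

    def predict(self, doc):
        words = [w for w in tokenize(doc) if w in self.vocab]  # unseen words are ignored
        return max(self.classes,
                   key=lambda c: self.log_prior[c] + sum(self.log_lik[c][w] for w in words))


class LinearSVM:
    """One-vs-rest linear SVM trained with Pegasos (stochastic subgradient on hinge loss + L2)."""

    def __init__(self, lam=0.01, epochs=50, seed=0):
        self.lam, self.epochs, self.seed = lam, epochs, seed

    def fit(self, docs, labels):
        rng = random.Random(self.seed)
        xs = [Counter(tokenize(d)) for d in docs]  # bag-of-words count vectors
        self.classes = sorted(set(labels))
        self.weights = {}
        for c in self.classes:
            ys = [1 if y == c else -1 for y in labels]  # current class vs the rest
            w = defaultdict(float)
            t = 0
            for _ in range(self.epochs):
                for i in rng.sample(range(len(xs)), len(xs)):  # one shuffled pass
                    t += 1
                    eta = 1.0 / (self.lam * t)  # Pegasos step size
                    margin = ys[i] * sum(w[f] * v for f, v in xs[i].items())
                    for f in w:
                        w[f] *= 1 - eta * self.lam  # shrink step from the L2 term
                    if margin < 1:  # hinge loss is active: move toward the example
                        for f, v in xs[i].items():
                            w[f] += eta * ys[i] * v
            self.weights[c] = dict(w)
        return self

    def predict(self, doc):
        x = Counter(tokenize(doc))
        return max(self.classes,
                   key=lambda c: sum(self.weights[c].get(f, 0.0) * v for f, v in x.items()))


def accuracy(model, docs, labels):
    return sum(model.predict(d) == y for d, y in zip(docs, labels)) / len(labels)


# Hypothetical toy call transcripts (Malay) and context labels.
train_docs = [
    "bil saya terlalu tinggi bulan ini", "saya mahu bayar bil", "caj bil tidak betul",
    "internet saya sangat perlahan", "talian internet terputus", "modem tidak berfungsi",
]
train_labels = ["billing"] * 3 + ["technical"] * 3
nb = MultinomialNB().fit(train_docs, train_labels)
svm = LinearSVM().fit(train_docs, train_labels)
```

Both models then label a new transcript such as "internet perlahan" as `technical`. To compare them as the paper does, `accuracy` would be evaluated on a held-out test set rather than on the training data; in practice a library implementation such as scikit-learn's would usually replace these hand-written classes.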
Copyright: © 2023 The Author(s)
Open Access: Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

Volume Title: Proceedings of the International Conference on Mathematical Sciences and Statistics 2022 (ICMSS 2022)
Series: Advances in Computer Science Research
Publication Date: 12 December 2022
ISBN: 978-94-6463-014-5
ISSN: 2352-538X

Cite this article

TY  - CONF
AU  - Voon Hee Wong
AU  - Wei Lun Tan
AU  - Jia Li Kor
AU  - Xiao Ven Wan
PY  - 2022
DA  - 2022/12/12
TI  - Autonomous Language Processing and Text Mining by Data Analytics for Business Solutions
BT  - Proceedings of the International Conference on Mathematical Sciences and Statistics 2022 (ICMSS 2022)
PB  - Atlantis Press
SP  - 85
EP  - 93
SN  - 2352-538X
UR  - https://doi.org/10.2991/978-94-6463-014-5_9
DO  - 10.2991/978-94-6463-014-5_9
ID  - Wong2022
ER  -