A Multi-Feature Approach with Data Augmentation for Speech Emotion Recognition using Deep Learning

M. Asha Priyadarshini; B. Lakshmi Satwika Bai; N. V. Nagendra Reddy; K. Nagendra Babu; K. Pratap

doi:10.2991/978-94-6463-471-6_80

Authors

M. Asha Priyadarshini¹, B. Lakshmi Satwika Bai², N. V. Nagendra Reddy²*, K. Nagendra Babu², K. Pratap²

¹Associate Professor, Department of CSE, Vignan’s Lara Institute of Technology & Science, Vadlamudi, Guntur, Andhra Pradesh, India

²UG Final Year, Vignan’s Lara Institute of Technology & Science, Vadlamudi, Guntur, Andhra Pradesh, India

*Corresponding author. Email: vsnreddy65@gmail.com

Available Online 30 July 2024.

Keywords: Convolutional Neural Networks; CNN; Deep Learning; MFCC; Mel-Frequency Cepstral Coefficients; Zero-Crossing Rate; ZCR; Root Mean Square Energy; RMS; RAVDESS; Crema-D; TESS; SAVEE
Abstract: This research project explores building a speech emotion recognition system using Convolutional Neural Networks (CNNs). We leverage multiple datasets, including RAVDESS, Crema-D, TESS, and SAVEE, which contain audio recordings labeled with emotions such as happy, sad, and angry. After converting each dataset's emotion codes into human-readable labels, we examine the emotional distribution of the combined data. To prepare the data for the CNN, we extract Mel-Frequency Cepstral Coefficients (MFCCs), which describe the short-term spectral envelope on the mel scale, a frequency scale modeled on human pitch perception, along with the Zero-Crossing Rate (ZCR) and Root Mean Square Energy (RMS) as complementary descriptors of signal noisiness and loudness. While this work focuses on data preparation and feature extraction, future efforts will involve building and training a CNN model to predict emotions from these features. The trained model's performance will be evaluated using metrics such as accuracy, paving the way for deployment in real-world applications where understanding emotions in speech is valuable.
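The label-conversion and feature-extraction steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it parses only RAVDESS file names (whose third hyphen-separated field is the emotion code), computes frame-wise ZCR and RMS in pure Python with an assumed frame length of 2048 samples and hop of 512 samples, and leaves MFCC extraction to an audio library such as librosa (`librosa.feature.mfcc`). All helper names (`ravdess_label`, `frames`, `zero_crossing_rate`, `rms_energy`, `frame_features`) are hypothetical.

```python
import math

# Emotion codes used in RAVDESS file names (third hyphen-separated field),
# e.g. "03-01-05-01-02-01-12.wav" has emotion code "05" -> "angry".
RAVDESS_EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}


def ravdess_label(filename: str) -> str:
    """Hypothetical helper: map a RAVDESS file name to a readable emotion label."""
    stem = filename.rsplit(".", 1)[0]  # drop the ".wav" extension
    code = stem.split("-")[2]          # third field is the emotion code
    return RAVDESS_EMOTIONS[code]


def frames(signal, frame_length, hop_length):
    """Split a mono signal into overlapping frames.

    No padding is applied, so trailing samples that do not fill a whole
    frame are dropped (an assumption; audio libraries often pad instead).
    """
    return [signal[i:i + frame_length]
            for i in range(0, len(signal) - frame_length + 1, hop_length)]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose sign differs.

    The frame needs at least 2 samples; zero is treated as non-negative.
    """
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def rms_energy(frame):
    """Root mean square amplitude of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def frame_features(signal, frame_length=2048, hop_length=512):
    """Per-frame (ZCR, RMS) pairs; the frame and hop sizes are assumed defaults."""
    return [(zero_crossing_rate(f), rms_energy(f))
            for f in frames(signal, frame_length, hop_length)]
```

For instance, `ravdess_label("03-01-05-01-02-01-12.wav")` returns `"angry"`, and a signal alternating between +1 and -1 yields a ZCR of 1.0 and an RMS of 1.0 in every frame. The other three corpora use different naming schemes (CREMA-D, for example, embeds a three-letter code such as `ANG`), so each would need its own parser before the labels can be merged.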
Copyright: © 2024 The Author(s)
Open Access: Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

Volume Title: Proceedings of the International Conference on Computational Innovations and Emerging Trends (ICCIET-2024)
Series: Advances in Computer Science Research
Publication Date: 30 July 2024
ISBN: 978-94-6463-471-6
ISSN: 2352-538X

Cite this article

TY  - CONF
AU  - M. Asha Priyadarshini
AU  - B. Lakshmi Satwika Bai
AU  - N. V. Nagendra Reddy
AU  - K. Nagendra Babu
AU  - K. Pratap
PY  - 2024
DA  - 2024/07/30
TI  - A Multi-Feature Approach with Data Augmentation for Speech Emotion Recognition using Deep Learning
BT  - Proceedings of the International Conference on Computational Innovations and Emerging Trends (ICCIET-2024)
PB  - Atlantis Press
SP  - 835
EP  - 856
SN  - 2352-538X
UR  - https://doi.org/10.2991/978-94-6463-471-6_80
DO  - 10.2991/978-94-6463-471-6_80
ID  - Priyadarshini2024
ER  -