Proceedings of the International Conference on Computational Innovations and Emerging Trends (ICCIET- 2024)

A Multi-Feature Approach with Data Augmentation for Speech Emotion Recognition using Deep Learning

Authors
M. Asha Priyadarshini¹, B. Lakshmi Satwika Bai², N. V. Nagendra Reddy²,*, K. Nagendra Babu², K. Pratap²
¹Associate Professor, Department of CSE, Vignan’s Lara Institute of Technology & Science, Vadlamudi, Guntur, Andhra Pradesh, India
²UG Final Year, Vignan’s Lara Institute of Technology & Science, Vadlamudi, Guntur, Andhra Pradesh, India
*Corresponding author. Email: vsnreddy65@gmail.com
Corresponding Author
N. V. Nagendra Reddy
Available Online 30 July 2024.
DOI
10.2991/978-94-6463-471-6_80
Keywords
Convolutional Neural Networks; CNN; Deep Learning; MFCC; Mel-Frequency Cepstral Coefficients; Zero-Crossing Rate; ZCR; Root Mean Square Energy; RMS; RAVDESS; CREMA-D; TESS; SAVEE
Abstract

This research project explores building a speech emotion recognition system using Convolutional Neural Networks (CNNs). We draw on multiple datasets, including RAVDESS, CREMA-D, TESS, and SAVEE, which contain audio recordings labeled with emotions (happy, sad, angry, etc.). After converting these labels to human-readable descriptions, we examine the distribution of emotions in the data. To prepare the data for the CNN, we extract Mel-Frequency Cepstral Coefficients (MFCCs), which represent the speech spectrum on a frequency scale modeled on human hearing, along with Zero-Crossing Rate (ZCR) and Root Mean Square Energy (RMS) as additional features. While this work focuses on data preparation and feature extraction, future efforts will involve building and training a CNN model to predict emotions from these features. The trained model's performance will be evaluated using metrics such as accuracy, with a view to deployment in real-world applications where understanding emotions in speech is valuable.
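
As a rough illustration of two of the features named in the abstract, the sketch below computes frame-wise Zero-Crossing Rate and Root Mean Square Energy in plain Python. It is not the authors' implementation: the function names, the frame length of 2048 samples, the hop of 512 samples, and the synthetic 440 Hz test tone are all assumptions made for this example. MFCC extraction is left out because it also needs a mel filterbank and a discrete cosine transform, which in practice usually come from an audio library.

```python
import math


def frames(signal, frame_length=2048, hop_length=512):
    """Yield overlapping frames; trailing samples that do not fill a frame are dropped.

    frame_length and hop_length are assumed values, not taken from the paper.
    """
    for start in range(0, len(signal) - frame_length + 1, hop_length):
        yield signal[start:start + frame_length]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (zero counts as non-negative)."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def rms_energy(frame):
    """Square root of the mean squared amplitude of the frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def extract_features(signal, frame_length=2048, hop_length=512):
    """Return one (zcr, rms) pair per frame."""
    return [
        (zero_crossing_rate(f), rms_energy(f))
        for f in frames(signal, frame_length, hop_length)
    ]


# Hypothetical input: one second of a 440 Hz sine at 22050 Hz, amplitude 0.5.
sample_rate = 22050
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate)
        for n in range(sample_rate)]
features = extract_features(tone)
```

For a pure tone, both values stay nearly constant from frame to frame: the RMS is close to the amplitude divided by the square root of two, and the ZCR is close to twice the tone frequency divided by the sample rate. Recorded speech would vary much more across frames, which is what makes these per-frame values usable as model inputs alongside MFCCs.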

Copyright
© 2024 The Author(s)
Open Access
This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.


Volume Title
Proceedings of the International Conference on Computational Innovations and Emerging Trends (ICCIET- 2024)
Series
Advances in Computer Science Research
Publication Date
30 July 2024
ISBN
978-94-6463-471-6
ISSN
2352-538X

Cite this article

TY  - CONF
AU  - M. Asha Priyadarshini
AU  - B. Lakshmi Satwika Bai
AU  - N. V. Nagendra Reddy
AU  - K. Nagendra Babu
AU  - K. Pratap
PY  - 2024
DA  - 2024/07/30
TI  - A Multi-Feature Approach with Data Augmentation for Speech Emotion Recognition using Deep Learning
BT  - Proceedings of the International Conference on Computational Innovations and Emerging Trends (ICCIET- 2024)
PB  - Atlantis Press
SP  - 835
EP  - 856
SN  - 2352-538X
UR  - https://doi.org/10.2991/978-94-6463-471-6_80
DO  - 10.2991/978-94-6463-471-6_80
ID  - Priyadarshini2024
ER  -