A Multi-Feature Approach with Data Augmentation for Speech Emotion Recognition using Deep Learning
- DOI
- 10.2991/978-94-6463-471-6_80
- Keywords
- Convolutional Neural Networks; CNN; Deep Learning; MFCC; Mel-Frequency Cepstral Coefficients; Zero-Crossing Rate; ZCR; Root Mean Square Energy; RMS; RAVDESS; Crema-D; TESS; SAVEE
- Abstract
This research project builds a speech emotion recognition system based on Convolutional Neural Networks (CNNs). We draw on multiple datasets, including RAVDESS, Crema-D, TESS, and SAVEE, each of which contains audio recordings labeled with emotions (happy, sad, angry, etc.). After converting each dataset's encoded labels into human-readable emotion names, we examine the distribution of emotions across the data. To prepare the data for the CNN, we extract Mel-Frequency Cepstral Coefficients (MFCCs), which describe the short-term spectral envelope on the perceptually motivated mel scale and so approximate how humans perceive speech, along with the Zero-Crossing Rate (ZCR) and Root Mean Square Energy (RMS) as additional time-domain features. While this work focuses on data preparation and feature extraction, future work will involve building and training a CNN model to predict emotions from these features. The trained model's performance will be evaluated with metrics such as accuracy, paving the way for deployment in real-world applications where understanding the emotion in speech is valuable.
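The label conversion and the two time-domain features named above can be sketched in a few lines. This is a minimal, hedged illustration in standard-library Python, not the authors' code: the abstract does not state their tooling, so the frame length (2048 samples), hop length (512 samples), the helper names (`ravdess_emotion`, `frames`, `zero_crossing_rate`, `rms_energy`, `frame_features`), and the ZCR normalization (sign changes divided by the number of adjacent sample pairs) are all illustrative assumptions. The emotion codes follow the published RAVDESS filename convention, in which the third hyphen-separated field encodes the emotion. MFCC extraction is omitted because it needs an FFT and a mel filterbank; in practice it is usually delegated to a library routine such as `librosa.feature.mfcc`.

```python
import math

# RAVDESS filename convention: the third hyphen-separated field is the emotion,
# e.g. "03-01-06-01-02-01-12.wav" -> "06" -> "fearful".
RAVDESS_EMOTIONS = {
    "01": "neutral",
    "02": "calm",
    "03": "happy",
    "04": "sad",
    "05": "angry",
    "06": "fearful",
    "07": "disgust",
    "08": "surprised",
}


def ravdess_emotion(filename):
    """Map a RAVDESS file path to a human-readable emotion label (hypothetical helper)."""
    stem = filename.rsplit("/", 1)[-1].split(".")[0]
    return RAVDESS_EMOTIONS[stem.split("-")[2]]


def frames(signal, frame_length=2048, hop_length=512):
    """Split a mono signal into overlapping frames; a trailing partial frame is dropped.

    frame_length and hop_length are assumed values, not taken from the paper.
    """
    if len(signal) < frame_length:
        return [signal]
    return [
        signal[start:start + frame_length]
        for start in range(0, len(signal) - frame_length + 1, hop_length)
    ]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose sign differs (zero counts as positive)."""
    if len(frame) < 2:
        return 0.0
    crossings = sum((a >= 0) != (b >= 0) for a, b in zip(frame, frame[1:]))
    return crossings / (len(frame) - 1)


def rms_energy(frame):
    """Root mean square amplitude of one frame."""
    if len(frame) == 0:
        return 0.0
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def frame_features(signal, frame_length=2048, hop_length=512):
    """Per-frame (ZCR, RMS) pairs for a mono signal."""
    return [
        (zero_crossing_rate(f), rms_energy(f))
        for f in frames(signal, frame_length, hop_length)
    ]
```

For a one-second 440 Hz sine sampled at 16 kHz, every frame's ZCR comes out close to 2 × 440 / 16000 ≈ 0.055 (two sign changes per period) and its RMS close to 1/√2 ≈ 0.707, which makes a quick sanity check before applying the functions to real recordings. How the per-frame values are pooled into a fixed-size CNN input (for example, averaging over frames or keeping the full frame sequence) is a design choice the abstract leaves open.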
- Copyright
- © 2024 The Author(s)
- Open Access
- This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
Cite this article
TY - CONF
AU - M. Asha Priyadarshini
AU - B. Lakshmi Satwika Bai
AU - N. V. Nagendra Reddy
AU - K. Nagendra Babu
AU - K. Pratap
PY - 2024
DA - 2024/07/30
TI - A Multi-Feature Approach with Data Augmentation for Speech Emotion Recognition using Deep Learning
BT - Proceedings of the International Conference on Computational Innovations and Emerging Trends (ICCIET-2024)
PB - Atlantis Press
SP - 835
EP - 856
SN - 2352-538X
UR - https://doi.org/10.2991/978-94-6463-471-6_80
DO - 10.2991/978-94-6463-471-6_80
ID - Priyadarshini2024
ER -