Proceedings of the International Conference on Applications of Machine Intelligence and Data Analytics (ICAMIDA 2022)

Speakers Identification Using Diarization Techniques

Authors
Vinod K. Pande¹,*, Vijay K. Kale¹
¹ Dr. G. Y. Pathrikar College of Computer Science and Information Technology, MGM University, Aurangabad, Maharashtra, India
*Corresponding author. Email: vinodkpande2014@gmail.com
Available Online 1 May 2023.
DOI
10.2991/978-94-6463-136-4_80
Keywords
Speaker Diarization; End-to-End Neural Diarization (EEND); Mel Frequency Cepstrum Coefficients (MFCC); Generative Adversarial Networks (GANs); Hidden Markov Model (HMM)
Abstract

This research work analyses methodologies for speaker voice identification and voice separation and presents an overview of the findings. Several speech recognition methods, such as Mel Frequency Cepstrum Coefficients (MFCC), Vector Quantization (VQ), Hidden Markov Models (HMM), Long Short-Term Memory (LSTM), End-to-End Neural Diarization (EEND), Generative Adversarial Networks (GANs), Convolutional Neural Networks, and audio embeddings, can be used for adaptive processing to identify multiple speakers in audio data. Additionally, we address the uses of speaker diarization, the potential for future development, and the databases used to evaluate diarization systems.

The speaker diarization method consists of seven steps: input, front-end processing, speech activity detection, segmentation, speaker embedding, clustering post-processing, and output.
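The seven steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the frame format (an energy value paired with a single voice feature), the function names, the thresholds, and the greedy clustering rule are all assumptions chosen to keep the example small. A real system would use learned speaker embeddings and a stronger clustering method.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Hypothetical toy input: one (energy, voice_feature) pair per audio frame.
Frame = Tuple[float, float]


@dataclass
class Segment:
    start: int          # first frame index (inclusive)
    end: int            # last frame index (exclusive)
    speaker: int = -1   # assigned during clustering


def front_end(frames: Sequence[Frame]) -> List[Frame]:
    """Front-end processing: peak-normalise the energy channel."""
    peak = max((abs(e) for e, _ in frames), default=0.0) or 1.0
    return [(e / peak, f) for e, f in frames]


def speech_activity(frames: Sequence[Frame], threshold: float) -> List[bool]:
    """Speech activity detection: a frame is speech if its energy is high."""
    return [e >= threshold for e, _ in frames]


def segment(active: Sequence[bool]) -> List[Segment]:
    """Segmentation: turn each contiguous run of speech frames into a segment."""
    segments, start = [], None
    for i, is_speech in enumerate(active):
        if is_speech and start is None:
            start = i
        elif not is_speech and start is not None:
            segments.append(Segment(start, i))
            start = None
    if start is not None:
        segments.append(Segment(start, len(active)))
    return segments


def embed(frames: Sequence[Frame], seg: Segment) -> float:
    """Speaker embedding: here simply the mean voice feature of the segment."""
    values = [f for _, f in frames[seg.start:seg.end]]
    return sum(values) / len(values)


def cluster(embeddings: Sequence[float], max_dist: float) -> List[int]:
    """Clustering: greedily join the nearest centroid within max_dist."""
    centroids: List[float] = []
    counts: List[int] = []
    labels: List[int] = []
    for e in embeddings:
        near = [k for k, c in enumerate(centroids) if abs(e - c) <= max_dist]
        if not near:
            centroids.append(e)
            counts.append(1)
            labels.append(len(centroids) - 1)
            continue
        best = min(near, key=lambda k: abs(e - centroids[k]))
        counts[best] += 1
        centroids[best] += (e - centroids[best]) / counts[best]  # running mean
        labels.append(best)
    return labels


def post_process(segments: Sequence[Segment], max_gap: int) -> List[Segment]:
    """Post-processing: merge same-speaker segments split by a short pause."""
    merged: List[Segment] = []
    for seg in segments:
        last = merged[-1] if merged else None
        if last and last.speaker == seg.speaker and seg.start - last.end <= max_gap:
            last.end = seg.end
        else:
            merged.append(Segment(seg.start, seg.end, seg.speaker))
    return merged


def diarize(frames: Sequence[Frame], energy_threshold: float = 0.5,
            max_dist: float = 20.0, max_gap: int = 1) -> List[Tuple[int, int, int]]:
    """Run all steps and output (start_frame, end_frame, speaker) triples."""
    frames = front_end(frames)
    segments = segment(speech_activity(frames, energy_threshold))
    embeddings = [embed(frames, s) for s in segments]
    for seg, label in zip(segments, cluster(embeddings, max_dist)):
        seg.speaker = label
    return [(s.start, s.end, s.speaker) for s in post_process(segments, max_gap)]


# Toy recording: speaker A, a pause, speaker B, a short pause, speaker A again.
toy = ([(1.0, 100.0)] * 3 + [(0.0, 0.0)] * 2 + [(1.0, 200.0)] * 3
       + [(0.0, 0.0)] + [(1.0, 102.0)] * 2)
print(diarize(toy))  # [(0, 3, 0), (5, 8, 1), (9, 11, 0)]
```

The output lists who talks when: each triple gives a start frame, an end frame, and a speaker label, and the first and third segments share label 0 because their embeddings are close.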

Speaker identification, a kind of speech recognition, recognizes the speakers in an audio conversation. Speaker diarization is the procedure of identifying who talks when in a multi-speaker audio recording. Such recordings include conferences, broadcast news, and any other public gathering with many speakers.

Copyright
© 2023 The Author(s)
Open Access
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

Volume Title
Proceedings of the International Conference on Applications of Machine Intelligence and Data Analytics (ICAMIDA 2022)
Series
Advances in Computer Science Research
Publication Date
1 May 2023
ISBN
978-94-6463-136-4
ISSN
2352-538X

Cite this article

TY  - CONF
AU  - Vinod K. Pande
AU  - Vijay K. Kale
PY  - 2023
DA  - 2023/05/01
TI  - Speakers Identification Using Diarization Techniques
BT  - Proceedings of the International Conference on Applications of Machine Intelligence and Data Analytics (ICAMIDA 2022)
PB  - Atlantis Press
SP  - 905
EP  - 915
SN  - 2352-538X
UR  - https://doi.org/10.2991/978-94-6463-136-4_80
DO  - 10.2991/978-94-6463-136-4_80
ID  - Pande2023
ER  -