A review of speaker identification methods with emphasis on new approaches

Ali Azam; Masoumeh Shafieian

doi:10.63053/ijset.63

Authors

Ali Azam, Master's student, Islamic Republic of Iran Broadcasting University, Tehran, Iran
Masoumeh Shafieian, Member of the Faculty of Broadcasting University of the Islamic Republic of Iran

Keywords:

Speaker Identification, Deep Learning, Speech Processing

Abstract

Speaker identification is an important practical problem in speech processing, with a significant role in security, voice authentication, and intelligent systems. This article reviews recent speaker identification methods and analyzes recent developments in the field. The main focus is on modern deep learning techniques, including convolutional neural networks (CNNs) and hybrid models, which can extract and analyze more complex features of the speech signal. These methods are widely applicable to text-independent speaker identification and to noisy or otherwise non-ideal conditions. Finally, the remaining challenges, existing limitations, and future research directions for developing more accurate and stable systems are reviewed. This study can help guide researchers and developers working in this field.
