Speaker identification features extraction methods: A systematic review

Sreenivas Sremath Tirumala; Seyed Reza Shahamiri; Abhimanyu Singh Garhwal; Ruili Wang

doi:10.1016/j.eswa.2017.08.015

Speaker identification features extraction methods: A systematic review

Sreenivas Sremath Tirumala, Seyed Reza Shahamiri, Abhimanyu Singh Garhwal, Ruili Wang

Research output: Journal Publication › Review article › peer-review

154 Citations (Scopus)

Abstract

Speaker Identification (SI) is the process of identifying the speaker from a given utterance by comparing the voice biometrics of the utterance with those utterance models stored beforehand. SI technologies are taken a new direction due to the advances in artificial intelligence and have been used widely in various domains. Feature extraction is one of the most important aspects of SI, which significantly influences the SI process and performance. This systematic review is conducted to identify, compare, and analyze various feature extraction approaches, methods, and algorithms of SI to provide a reference on feature extraction approaches for SI applications and future studies. The review was conducted according to Kitchenham systematic review methodology and guidelines, and provides an in-depth analysis on proposals and implementations of SI feature extraction methods discussed in the literature between year 2011 and 2106. Three research questions were determined and an initial set of 535 publications were identified to answer the questions. After applying exclusion criteria 160 related publications were shortlisted and reviewed in this paper; these papers were considered to answer the research questions. Results indicate that pure Mel-Frequency Cepstral Coefficients (MFCCs) based feature extraction approaches have been used more than any other approach. Furthermore, other MFCC variations, such as MFCC fusion and cleansing approaches, are proven to be very popular as well. This study identified that the current SI research trend is to develop a robust universal SI framework to address the important problems of SI such as adaptability, complexity, multi-lingual recognition, and noise robustness. The results presented in this research are based on past publications, citations, and number of implementations with citations being most relevant. This paper also presents the general process of SI.

Original language	English
Pages (from-to)	250-271
Number of pages	22
Journal	Expert Systems with Applications
Volume	90
DOIs	https://doi.org/10.1016/j.eswa.2017.08.015
Publication status	Published - 30 Dec 2017
Externally published	Yes

Keywords

Feature extraction
Kitchenham systematic review
MFCC
Speaker identification
Speaker recognition

ASJC Scopus subject areas

General Engineering
Computer Science Applications
Artificial Intelligence

Access to Document

10.1016/j.eswa.2017.08.015

Cite this

@article{bbbddd0cadd6402f874187daea523251,

title = "Speaker identification features extraction methods: A systematic review",

abstract = "Speaker Identification (SI) is the process of identifying the speaker from a given utterance by comparing the voice biometrics of the utterance with those utterance models stored beforehand. SI technologies are taken a new direction due to the advances in artificial intelligence and have been used widely in various domains. Feature extraction is one of the most important aspects of SI, which significantly influences the SI process and performance. This systematic review is conducted to identify, compare, and analyze various feature extraction approaches, methods, and algorithms of SI to provide a reference on feature extraction approaches for SI applications and future studies. The review was conducted according to Kitchenham systematic review methodology and guidelines, and provides an in-depth analysis on proposals and implementations of SI feature extraction methods discussed in the literature between year 2011 and 2106. Three research questions were determined and an initial set of 535 publications were identified to answer the questions. After applying exclusion criteria 160 related publications were shortlisted and reviewed in this paper; these papers were considered to answer the research questions. Results indicate that pure Mel-Frequency Cepstral Coefficients (MFCCs) based feature extraction approaches have been used more than any other approach. Furthermore, other MFCC variations, such as MFCC fusion and cleansing approaches, are proven to be very popular as well. This study identified that the current SI research trend is to develop a robust universal SI framework to address the important problems of SI such as adaptability, complexity, multi-lingual recognition, and noise robustness. The results presented in this research are based on past publications, citations, and number of implementations with citations being most relevant. This paper also presents the general process of SI.",

keywords = "Feature extraction, Kitchenham systematic review, MFCC, Speaker identification, Speaker recognition",

author = "Tirumala, {Sreenivas Sremath} and Shahamiri, {Seyed Reza} and Garhwal, {Abhimanyu Singh} and Ruili Wang",

note = "Publisher Copyright: {\textcopyright} 2017 Elsevier Ltd",

year = "2017",

month = dec,

day = "30",

doi = "10.1016/j.eswa.2017.08.015",

language = "English",

volume = "90",

pages = "250--271",

journal = "Expert Systems with Applications",

issn = "0957-4174",

publisher = "Elsevier Ltd.",

}

TY - JOUR

T1 - Speaker identification features extraction methods

T2 - A systematic review

AU - Tirumala, Sreenivas Sremath

AU - Shahamiri, Seyed Reza

AU - Garhwal, Abhimanyu Singh

AU - Wang, Ruili

PY - 2017/12/30

Y1 - 2017/12/30

N2 - Speaker Identification (SI) is the process of identifying the speaker from a given utterance by comparing the voice biometrics of the utterance with those utterance models stored beforehand. SI technologies are taken a new direction due to the advances in artificial intelligence and have been used widely in various domains. Feature extraction is one of the most important aspects of SI, which significantly influences the SI process and performance. This systematic review is conducted to identify, compare, and analyze various feature extraction approaches, methods, and algorithms of SI to provide a reference on feature extraction approaches for SI applications and future studies. The review was conducted according to Kitchenham systematic review methodology and guidelines, and provides an in-depth analysis on proposals and implementations of SI feature extraction methods discussed in the literature between year 2011 and 2106. Three research questions were determined and an initial set of 535 publications were identified to answer the questions. After applying exclusion criteria 160 related publications were shortlisted and reviewed in this paper; these papers were considered to answer the research questions. Results indicate that pure Mel-Frequency Cepstral Coefficients (MFCCs) based feature extraction approaches have been used more than any other approach. Furthermore, other MFCC variations, such as MFCC fusion and cleansing approaches, are proven to be very popular as well. This study identified that the current SI research trend is to develop a robust universal SI framework to address the important problems of SI such as adaptability, complexity, multi-lingual recognition, and noise robustness. The results presented in this research are based on past publications, citations, and number of implementations with citations being most relevant. This paper also presents the general process of SI.

AB - Speaker Identification (SI) is the process of identifying the speaker from a given utterance by comparing the voice biometrics of the utterance with those utterance models stored beforehand. SI technologies are taken a new direction due to the advances in artificial intelligence and have been used widely in various domains. Feature extraction is one of the most important aspects of SI, which significantly influences the SI process and performance. This systematic review is conducted to identify, compare, and analyze various feature extraction approaches, methods, and algorithms of SI to provide a reference on feature extraction approaches for SI applications and future studies. The review was conducted according to Kitchenham systematic review methodology and guidelines, and provides an in-depth analysis on proposals and implementations of SI feature extraction methods discussed in the literature between year 2011 and 2106. Three research questions were determined and an initial set of 535 publications were identified to answer the questions. After applying exclusion criteria 160 related publications were shortlisted and reviewed in this paper; these papers were considered to answer the research questions. Results indicate that pure Mel-Frequency Cepstral Coefficients (MFCCs) based feature extraction approaches have been used more than any other approach. Furthermore, other MFCC variations, such as MFCC fusion and cleansing approaches, are proven to be very popular as well. This study identified that the current SI research trend is to develop a robust universal SI framework to address the important problems of SI such as adaptability, complexity, multi-lingual recognition, and noise robustness. The results presented in this research are based on past publications, citations, and number of implementations with citations being most relevant. This paper also presents the general process of SI.

KW - Feature extraction

KW - Kitchenham systematic review

KW - MFCC

KW - Speaker identification

KW - Speaker recognition

UR - http://www.scopus.com/inward/record.url?scp=85027973346&partnerID=8YFLogxK

U2 - 10.1016/j.eswa.2017.08.015

DO - 10.1016/j.eswa.2017.08.015

M3 - Review article

AN - SCOPUS:85027973346

SN - 0957-4174

VL - 90

SP - 250

EP - 271

JO - Expert Systems with Applications

JF - Expert Systems with Applications

ER -

Speaker identification features extraction methods: A systematic review

Abstract

Keywords

ASJC Scopus subject areas

Access to Document

Other files and links

Fingerprint

Cite this