In digital signal processing, speaker recognition is essentially a pattern recognition problem. The method relies on speaker-specific information carried in the speech waveform and uses that information to recognize the speaker automatically. The uttered signal must be processed carefully for the speaker recognition system to be fast and accurate. The task involves authenticating a speaker from a large ensemble of possible speakers. In this paper we implement feature extraction from the speech signal using Mel Frequency Cepstral Coefficient (MFCC) analysis; the result of the MFCC analysis is a series of feature vectors, which are used to build a Vector Quantization (VQ) codebook.
In this paper, an automatic speaker recognition system is implemented by combining feature extraction and feature matching techniques. Feature extraction is performed with Mel Frequency Cepstral Coefficients (MFCC). Feature matching uses Vector Quantization (VQ) with the k-means algorithm, chosen for its high accuracy and simplicity. A speaker recognition experiment was performed with 5 talkers (3 male and 2 female) uttering Bangla sentences under different conditions. The proposed system achieved 96% efficiency for text-dependent and text-independent speaker recognition with test voice samples of 60-90 seconds.
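The codebook training step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the MFCC feature vectors are already available as plain lists of floats, and the function names `sq_dist` and `kmeans_codebook`, the iteration count, and the seed are hypothetical choices.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_codebook(vectors, k, iters=20, seed=0):
    """Cluster feature vectors into k code vectors (a VQ codebook).

    Assumption: `vectors` holds precomputed MFCC-like feature vectors.
    """
    rng = random.Random(seed)
    # Initialise the codebook with k distinct training vectors.
    codebook = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        # Assignment step: attach each vector to its nearest code vector.
        clusters = [[] for _ in range(k)]
        for v in vectors:
            idx = min(range(k), key=lambda i: sq_dist(v, codebook[i]))
            clusters[idx].append(v)
        # Update step: move each code vector to the mean of its cluster.
        for i, members in enumerate(clusters):
            if members:  # keep the old code vector if its cluster is empty
                dim = len(members[0])
                codebook[i] = [
                    sum(m[d] for m in members) / len(members) for d in range(dim)
                ]
    return codebook
```

One codebook would typically be trained per enrolled speaker; matching then compares a test utterance against each stored codebook.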
2007
Abstract: Automatic speaker recognition is the field of study concerned with identifying a person from a spoken phrase. The technique makes it possible to use the speaker’s voice to verify their identity and control access to services such as biometric security systems, voice dialing, telephone banking, telephone shopping, database access services, information services, voice mail, security control for confidential information areas, and remote access to computers. This thesis presents the development of a MATLAB-based text-dependent speaker recognition system. The Mel Frequency Cepstrum Coefficient (MFCC) method is used to extract a speaker’s discriminative features from the mathematical representation of the speech signal. Vector Quantization with the LBG algorithm (VQ-LBG) is then used to match the features.
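As background for the MFCC step, the mel scale itself can be illustrated with a short sketch. The 2595 and 700 constants below are one widely used convention for the Hz-to-mel mapping, not something stated in the abstract, and the function names are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """Map a frequency in Hz to mels (one common convention)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_filters, f_min_hz, f_max_hz):
    """Centre frequencies (Hz) of filters spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(f_min_hz), hz_to_mel(f_max_hz)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]
```

Because the spacing is uniform in mels, the centre frequencies are packed densely at low frequencies and spread out at high frequencies, which is the perceptual motivation behind MFCC features.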
2019
The paper proposes a speaker recognition system that validates a user’s claimed identity using characteristics extracted from their voice. Speaker recognition is one of the most useful and popular biometric recognition techniques, especially in areas where security is a major concern. Direct analysis and synthesis of the complex voice signal is difficult because of the large amount of information contained in the signal. Therefore, the digital signal processing steps of feature extraction and feature matching were introduced to represent the voice signal. Mel Frequency Cepstral Coefficients (MFCC) were extracted from the speech signal to represent each speaker, and recognition was carried out using weighted Euclidean distance. The MATLAB R2017b platform was used to implement the feature extraction process. Index Terms: Feature matching, Feature Extraction, MFCC, Euclidean distance.
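A weighted Euclidean distance of the kind mentioned above might look like the following sketch. The abstract does not say how the weights are chosen, so the `weights` argument is left to the caller (inverse per-coefficient variance is one common choice); the names `weighted_euclidean` and `closest_speaker` are hypothetical.

```python
import math


def weighted_euclidean(x, y, weights):
    """sqrt(sum_i w_i * (x_i - y_i)^2) for equal-length vectors."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


def closest_speaker(test_vec, templates, weights):
    """Return the speaker whose template vector is nearest to test_vec.

    Assumption: `templates` maps a speaker name to one feature vector.
    """
    return min(
        templates,
        key=lambda name: weighted_euclidean(test_vec, templates[name], weights),
    )
```

With unit weights this reduces to the ordinary Euclidean distance; unequal weights let some cepstral coefficients count more than others in the decision.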
IJSRD, 2013
This paper presents an approach to speaker recognition that uses mel-frequency spectral information to improve speech feature representation in a Vector Quantization (VQ) codebook based recognition approach. The mel-frequency approach extracts features of the speech signal to obtain the training and testing vectors. The VQ codebook approach uses the training vectors to form clusters with the help of the LBG algorithm, which are then used for recognition.
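The LBG (Linde, Buzo, Gray) procedure is usually described as a splitting algorithm: start from the global centroid, split every code vector into two perturbed copies, then refine with nearest-neighbour reassignment. The sketch below follows that textbook outline under stated assumptions: the target size is a power of two, the split factor `eps` and iteration count are illustrative, and multiplicative splitting assumes code vectors are not all zeros.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]


def lbg_codebook(vectors, size, eps=0.01, iters=20):
    """Grow a codebook by repeated splitting (size assumed a power of two)."""
    codebook = [centroid(vectors)]
    while len(codebook) < size:
        # Split each code vector into two slightly perturbed copies.
        codebook = [
            [c * (1 + sign * eps) for c in code]
            for code in codebook
            for sign in (1, -1)
        ]
        # Refine the doubled codebook with nearest-neighbour updates.
        for _ in range(iters):
            clusters = [[] for _ in codebook]
            for v in vectors:
                nearest = min(
                    range(len(codebook)), key=lambda i: sq_dist(v, codebook[i])
                )
                clusters[nearest].append(v)
            codebook = [
                centroid(members) if members else code
                for members, code in zip(clusters, codebook)
            ]
    return codebook
```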
2012
Voice recognition is broadly divided into two classes, voice recognition and voice identification; it is the method of automatically identifying who is speaking on the basis of individual information contained in speech waves. Voice recognition is widely used to verify a speaker's identity and control access to services such as telephone banking, database access services, voice dialing, telephone shopping, information services, voice mail, and security control for confidential information areas. Another important application of voice recognition technology is forensics. In the study, the effectiveness of combinations of cepstral features, channel compensation techniques, and different local distances in the Dynamic Time Warping (DTW) algorithm is experimentally evaluated in the text-dependent speaker identification task. The training and the testing were done with noisy telephone speech (short phrases in Bulgarian with length of about ...
The present study was conducted to evaluate the factors affecting the accuracy of a speaker recognition system based on Mel-Frequency Cepstral Coefficients (MFCC) and Vector Quantization (VQ). The investigation analyses the factors that affect recognition accuracy using speech signals from day-to-day life in surrounding environments. It examined the effects of mismatch in text dependency, voice sample length, speaking language, speaking style, mimicry, microphone quality, utterance sample quality, and surrounding noise. A corpus of 10 people with 20 utterance subjects was collected, and the results indicate that any mismatch degrades recognition accuracy. The most dominant factors degrading the accuracy of speaker recognition systems were surrounding noise, the quality of the microphone with which the voice samples were collected, disguise, and degradation of the sample rate and quality. Speech-related factors and sample length were less critical.
In this paper, an improved strategy for an automated text-dependent speaker recognition system in noisy environments is proposed. Preprocessing of the speaker signal starts with eliminating the background noise. The next step is signal filtering and feature extraction using the cepstrum coefficients method. The extracted features are used by the enhanced LBG vector quantization algorithm for speaker recognition, so that the speaker can be determined by matching the test speaker against the codebooks stored in the database. Finally, the speaker with the smallest Euclidean distance is selected. The speech feature extraction was based on a dataset of 175 different samples collected from 25 different speakers. The proposed system achieved a good speaker identification ratio, with a maximum accuracy of about 96.2% for a database with a closed set of selected words containing the most used phonemes. The experimental results also show that recognition accuracy increased with frame overlapping. [Hussein Lafta Attiya, Ali Yakoob Yousif. Mel frequency Cepstrum Coefficients and Enhanced LBG algorithm for Speaker Recognition. Researcher 2015;7(1):19-25]. (ISSN: 1553-9865). http://www.sciencepub.net/researcher. 4
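Frame overlapping, mentioned in the last result above, can be illustrated with a small framing helper. This is a generic sketch, not the paper's code: `frame_len` and `hop` are in samples, the names are hypothetical, and a trailing partial frame is simply dropped.

```python
def frame_signal(samples, frame_len, hop):
    """Split a sample sequence into frames of frame_len, advancing by hop.

    Frames overlap by (frame_len - hop) samples when hop < frame_len.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop
    return frames
```

For the same recording, a smaller hop yields more frames and therefore more feature vectors per utterance, which is one plausible reason overlap can help a codebook-based recognizer.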
Engineering and Technology Journal
Speaker recognition is defined as the process of recognizing a person by his/her voice through specific features extracted from the voice signal. Automatic Speaker Recognition (ASP) is a biometric authentication system. In the last decade, many advances have been made in the speaker recognition field, along with many techniques in the feature extraction and modeling phases. In this paper, we present an overview of the most recent works in ASP technology. The study discusses several ASP modeling techniques, such as the Gaussian Mixture Model (GMM), Vector Quantization (VQ), and clustering algorithms. Several feature extraction techniques, such as Linear Predictive Coding (LPC) and Mel frequency cepstral coefficients (MFCC), are also examined. As a result of this study, we found that the MFCC and GMM methods can be considered the most successful techniques in the field of speaker recognition so far.
Journal of Measurements, Electronics, Communications, and Systems, 2020
Voice is one of the parameters in the process of identifying a person. The voice conveys information such as gender, age, and even the identity of the speaker. Speaker recognition is a method for reducing crimes and frauds committed by voice, so that it minimizes the faking of a person's identity. The Mel Frequency Cepstrum Coefficient (MFCC) method can be used in the speech recognition system. Feature extraction with MFCC produces acoustic features of the speech signal. For classification, Hidden Markov Models (HMM) are used to match an unidentified speaker’s voice with the voices in the database. In this research, the system is used to verify the speaker, namely 15 text-dependent items in Indonesian. When testing with speakers who are also in the database, the highest accuracy is 99.16%.
Advances in Intelligent Systems and Computing, 2012
The proposed work describes an Automatic Speaker Recognition (ASR) system. It documents all the stages involved in the proposed ASR system, from the preprocessing stage to the decision-making stage. The main aim of this work is to achieve a highly robust and user-friendly system. Voice samples from three different users are used as acoustic material. Feature extraction is done by computing Mel Frequency Cepstral Coefficients (MFCC), which are used to create a reference template. For feature matching, the Dynamic Time Warping (DTW) algorithm is used: the DTW distance is computed between the test signal and the reference signal. The decision is made by comparing the distance with a predefined threshold value.
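The DTW matching and threshold decision described above can be sketched with the classic dynamic-programming recurrence. This is an illustrative version only: it assumes each signal is already a sequence of feature vectors, uses plain Euclidean distance as the local cost, and the names `dtw_distance` and `accept` and the threshold value are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b):
    """Accumulated cost of the best warping path between two sequences."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = euclidean(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # repeat a frame of seq_b
                cost[i][j - 1],      # repeat a frame of seq_a
                cost[i - 1][j - 1],  # advance both sequences
            )
    return cost[n][m]


def accept(test_seq, reference_seq, threshold):
    """Accept the claimed identity if the DTW distance is within threshold."""
    return dtw_distance(test_seq, reference_seq) <= threshold
```

Because DTW allows frames to be repeated, the same phrase spoken at a different speed can still align with zero or low cost, which is why it suits text-dependent template matching.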
Information Technology And Control, 2020
An extension of the feature vector for automatic speaker recognition is considered in this paper. The starting feature vector consisted of 18 mel-frequency cepstral coefficients (MFCCs). It was extended with two additional features derived from the spectrum of the speech signal. The main idea behind this research is that the efficiency of automatic speaker recognition can be increased by constructing a feature vector that tracks the real perceived spectrum in the observed speech. The additional features are based on the energy maximums in the appropriate frequency ranges of observed speech frames. In experiments, accuracy and equal error rate (EER) are compared for the case when feature vectors contain only 18 MFCCs and for cases when the additional features are used. Recognition accuracy increased by around 3%. Values of EER show smaller differentiation, but the results show that adding the proposed additional features produced a lower decision threshold. These results indicate t...
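For readers unfamiliar with the equal error rate, the sketch below estimates it from two lists of match scores. It assumes higher scores mean a better match, sweeps only the observed score values as candidate thresholds, and reports the point where false acceptance and false rejection rates are closest; the function name and the sample scores in the usage note are hypothetical.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Return (eer, threshold) where FAR and FRR are closest.

    FAR: fraction of impostor scores accepted (score >= threshold).
    FRR: fraction of genuine scores rejected (score < threshold).
    """
    best = None
    for t in sorted(set(genuine_scores) | set(impostor_scores)):
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]
```

For example, with genuine scores `[0.9, 0.8, 0.7, 0.4]` and impostor scores `[0.1, 0.2, 0.3, 0.6]`, one score from each group falls on the wrong side of the best threshold, so both error rates are 25%.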
International Journal of Electrical and Computer Engineering (IJECE), 2022
In this research, we present an automatic speaker recognition system based on adaptive orthogonal transformations. To obtain the informative features with a minimum dimension from the input signals, we created an adaptive operator, which helped to identify the speaker's voice in a fast and efficient manner. We test the efficiency and the performance of our method by comparing it with another approach, mel-frequency cepstral coefficients (MFCCs), which is widely used by researchers as their feature extraction method. The experimental results show the importance of creating the adaptive operator, which gives added value to the proposed approach. The performance of the system achieved 96.8% accuracy using Fourier transform as a compression method and 98.1% using Correlation as a compression method.
Automatic speaker identification technology has recently been implemented successfully in several commercial areas. Speaker identification falls under speaker recognition and is gaining significance for voice-based biometrics. It is used in appliances that understand voice commands, in securing confidential information, etc. In this paper we built a reference model for each speaker using acoustic features. Testing was done by comparing the features of the test sample with the reference model. We used the MFCC technique for feature extraction. Vector Quantization (VQ) with the LBG (Linde, Buzo, and Gray) algorithm was used to create the reference model, and VQ distortion was used for testing. The system achieved 85% accuracy.
Speaker recognition (SR) is a dynamic biometric task. SR is a multidisciplinary problem that encompasses many aspects of human speech, including speech recognition, language recognition, and speech accents. This technique makes it possible to use the speaker’s voice to verify his/her identity and provide controlled access to services. The mel-frequency extraction method is a leading approach for speech feature extraction. In this thesis, a new algorithm is proposed that incorporates FVQ and a DCT-based MFCC feature extraction method. The proposed system is expected to improve SR performance through the MFCC and FVQ methods. The FVQ performance results will be compared with k-means quantization in terms of EER.
The performance of any speaker recognition system depends on the duration of the speech samples. The higher the number of feature vectors, the better the efficiency. A major contribution of this paper is in enhancing the identification accuracy of the speaker recognition system through minimization of the objective function and associated distortions. With nonlinear mapping, the sectional set fuzzy vector quantization with novel norm is utilized here as usual to form the speaker's model in the high-dimensional feature space. However, during feature extraction, the traditional triangular shaped bins have been replaced by a Gaussian shaped filter (GF) and a Tukey filter (TF) for calculating the mel frequency cepstral coefficients (MFCCs). The paper presents an experimental evaluation of three modeling techniques, viz. fuzzy c-means, fuzzy vector quantization 2 (FVQ2), and novel fuzzy vector quantization (NFVQ). In simulation, the NFVQ shows significant improvement in performance over fuzzy c-means and FVQ2. The experimental evidence demonstrates that for two seconds of training and one second of testing data, the efficiency of the NFVQ, with a minimum objective function value of 0.073 and distortion D = 4.334, for a set of 100 speakers chosen from the Texas Instruments and Massachusetts Institute of Technology (TIMIT) database and a self-collected database is 98.8% and 98.1%, respectively. Index Terms: Gaussian filter (GF); mel frequency cepstral coefficients (MFCCs); novel fuzzy vector quantization (NFVQ); triangular filter (TF); Tukey filter.
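Testing by VQ distortion, as described in the speaker identification abstract above, is commonly done by averaging each test vector's distance to its nearest code vector and choosing the speaker with the lowest average. The sketch below assumes per-speaker codebooks already exist and uses squared Euclidean distance; `avg_distortion` and `identify` are hypothetical names, and the speaker labels in the usage note are made up.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def avg_distortion(features, codebook):
    """Mean distance from each feature vector to its nearest code vector."""
    total = sum(min(sq_dist(f, c) for c in codebook) for f in features)
    return total / len(features)


def identify(features, codebooks):
    """Return the speaker whose codebook gives the lowest average distortion.

    Assumption: `codebooks` maps a speaker name to a list of code vectors.
    """
    return min(codebooks, key=lambda name: avg_distortion(features, codebooks[name]))
```

A call such as `identify(test_features, {"alice": cb_a, "bob": cb_b})` returns the best-matching label; a closed-set identifier like this always returns some enrolled speaker, so rejecting unknown voices would need an additional threshold.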
2012 International Conference on Communication, Information & Computing Technology (ICCICT), 2012
A speaker identification system is one application of biometrics using the voice signal. In this paper we have implemented a speaker recognition system using a combination of Mel Frequency Cepstral Coefficients (MFCC) and Kekre's Median Codebook Generation Algorithm (KMCG). The MFCC algorithm is used for feature extraction, while the KMCG algorithm plays an important role in codebook generation and feature matching. For implementation simplicity the system is built as a text-dependent system, i.e. a common text is used by all users. The KMCG algorithm provides implementation simplicity along with a high level of accuracy.
Lecture Notes in Computer Science, 2008
Abstract. Speaker recognition is the process of identifying a speaker by analyzing the spectral shape of the voice signal. This is done by extracting and matching the features of the voice signal. Mel-frequency Cepstrum Coefficient (MFCC) is the feature extraction technique in which we will get ...
Speech processing has emerged as one of the important application areas of digital signal processing. Research fields in speech processing include speech recognition, speaker recognition, speech synthesis, speech coding, etc. The objective of automatic speaker recognition is to extract, characterize, and recognize information about speaker identity. Feature extraction is the first step in speaker recognition, and researchers have suggested and developed many algorithms for it. In this work, the Mel Frequency Cepstrum Coefficient (MFCC) feature has been used to design a text-dependent speaker identification system. A back-propagation neural network (BPNN) is used to identify the speaker after training on the MFCC feature set. Some modifications to the existing MFCC feature extraction technique are also suggested to improve speaker recognition efficiency. Information from speech recognition can be used in various ways in state-of-the-art speaker recognition systems. This includes the obvious use of recognized words to enable text-dependent speaker modeling techniques when the words spoken are not given. Furthermore, it has been shown that the choice of words and phones itself can be a useful indicator of speaker identity. Recognizer output also enables higher-level features, in particular those related to prosodic properties of speech.
International Journal of Speech Technology, 2012
In this paper, a novel Automatic Speaker Recognition (ASR) system is presented. The new ASR system includes novel feature extraction and vector classification steps utilizing distributed Discrete Cosine Transform (DCT-II) based Mel Frequency Cepstral Coefficients (MFCC) and Fuzzy Vector Quantization (FVQ). The ASR algorithm utilizes an approach based on MFCC to identify dynamic features that are used for Speaker Recognition (SR). A series of experiments were performed utilizing three different feature extraction methods: (1) conventional MFCC; (2) Delta-Delta MFCC (DDMFCC); and (3) DCT-II based DDMFCC. The experiments were then expanded to include four classifiers: (1) FVQ; (2) K-means Vector Quantization (VQ); (3) Linde, Buzo and Gray VQ; and (4) Gaussian Mixture Model (GMM). The combination of DCT-II based MFCC, DMFCC and DDMFCC with FVQ was found to have the lowest Equal Error Rate for the VQ based classifiers. The results were an improvement over previously reported non-GMM methods and approached the results achieved for the computationally expensive GMM based method. Speaker verification tests highlighted the overall performance improvement for the new ASR system. The National Institute of Standards and Technology Speaker Recognition Evaluation corpora were used to provide speaker source data for the experiments.
Speech feature extraction is the most significant step in any automatic speaker recognition system. In the last 60 years a lot of research has gone into parametric representation of these speech features. Several techniques are currently used for automatic speaker recognition. Yet automatic speaker recognition remains a challenge, mainly due to variations in the speaker's vocal tract with time and health, varying environmental conditions, disparities in the behavior and quality of speech recorders, etc. MFCC is an extensively used technique in automatic speaker recognition. In this paper the performance of the MFCC technique was evaluated in a quiet environment. A speaker database containing 30 male and 30 female speakers was created. Two separate experiments were conducted to evaluate the performance of the MFCC technique when applied to K-means clustering. In the first case the speech features were directly matched. In the second case a VQ codebook was created by clustering the training features of these 60 speakers. A distortion measure based on the minimum Euclidean distance was used for speaker recognition. The failure rate of speaker recognition was found to be 10% in the first case and 14% in the second case. Matlab-7.10.0 was used for this study.
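Delta and delta-delta coefficients, the dynamic features referred to above, are often computed with a regression over neighbouring frames. The sketch below uses that conventional formula with edge frames repeated at the boundaries; it does not reproduce the paper's DCT-II based variant, and the `width` default and function name are illustrative assumptions.

```python
def delta(frames, width=2):
    """Regression-based delta features for a list of per-frame vectors.

    delta_t = sum_{n=1..width} n * (c[t+n] - c[t-n]) / (2 * sum_{n} n^2),
    with indices clamped so the first and last frames are repeated.
    """
    denom = 2 * sum(n * n for n in range(1, width + 1))
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        row = []
        for d in range(len(frames[0])):
            num = sum(
                n * (frames[min(t + n, last)][d] - frames[max(t - n, 0)][d])
                for n in range(1, width + 1)
            )
            row.append(num / denom)
        out.append(row)
    return out
```

Applying the function twice, `delta(delta(frames))`, gives delta-delta features; static, delta, and delta-delta vectors are typically concatenated per frame before classification.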