Proceedings of “Verificatori Biometrici” Workshop, organized by Technical University of Cluj-Napoca, Universitas Napocensis Babes-Bolyai, Universitas Medicinae et Farmaciae Napocensis and CNCSIS, Cluj-Napoca, Romania, May
In this paper the GMM speaker model was analyzed from the viewpoint of its phonetic content. The distribution of phonemes among the clusters represented by the Gaussians was studied. Special speaker models were also built from only part of the training data in order to identify which parts of speech are most valuable for speaker identification. Keywords: Speaker Identification, Gaussian Mixture Models, Phonetic Analysis
This paper presents the performance of a text-independent speaker verification system for Brazilian Portuguese based on a Gaussian Mixture Model (GMM). The Gaussian components of the GMM statistically represent the spectral characteristics of the speaker, leading to an effective speaker recognition system. The main goal here is a detailed evaluation of the parameters used by the GMM, such as the number of Gaussian mixtures and the amount of training and test speech. By seeking the best set of features for a reasonable response, this work aids understanding of the model and gives insights for further investigation. We used 36 speakers in the experiments, all modeled with 15 mel-cepstral coefficients. With 32 Gaussians, 60 seconds of training, and 30 seconds of testing, the system makes no errors on reasonably clean speech. The results have shown that the more training and test time is used, the better the results for a give...
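The scoring step behind GMM-based identification can be sketched in a few lines of Python. This is a minimal illustration, not the authors' system: it assumes diagonal covariances and already-trained parameters (for example 32 components over 15 mel-cepstral coefficients, as in the experiments above), and the function names are hypothetical. A test utterance is assigned to the speaker whose model gives the highest average per-frame log-likelihood.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def gmm_avg_loglik(frames, weights, means, variances):
    """Average per-frame log-likelihood of feature frames under one GMM."""
    total = 0.0
    for x in frames:
        per_component = [
            math.log(w) + log_gauss_diag(x, mu, var)
            for w, mu, var in zip(weights, means, variances)
        ]
        total += logsumexp(per_component)
    return total / len(frames)

def identify(frames, models):
    """Closed-set identification: models maps speaker -> (weights, means, variances)."""
    return max(models, key=lambda spk: gmm_avg_loglik(frames, *models[spk]))
```

Working in the log domain with `logsumexp` avoids numerical underflow when many frames and components are scored.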
This paper provides an overview of the Gaussian Mixture Model (GMM) and its application to the speech signal. Earlier work has shown that the GMM is well suited to voice modeling in speaker recognition systems, where it serves as an essential tool for statistical clustering. Tasks that humans perform effortlessly, such as recognizing a voice or a face, are not effortless for machines; speaker recognition technology offers a solution for this task, and with it computers can outperform humans.
Speaker identification is an important step in speaker diarization. For speaker identification, each speaker is modeled by a Gaussian mixture model (GMM): a large GMM, called a Universal Background Model (UBM), is adapted into each speaker model. This paper focuses on speech clustering for speaker diarization, which consists of speech segmentation followed by speech clustering. In speech segmentation, Mel-Frequency Cepstral Coefficient (MFCC) features are extracted from each speech segment, and each segment is then modeled by UBM adaptation. The relevant speech segments are grouped into speech clusters. This paper describes the speech segmentation, UBM adaptation, and speech clustering techniques.
International Conference on Electronics Representation and Algorithm (ICERA), 2021
One of the most common methods for identifying speakers is the Gaussian Mixture Model (GMM). The quality of a GMM depends on the method used to train the Gaussians, and the method the researchers chose is k-Means. In this study, the k-Means GMM was evaluated with three centroid initialization methods: randomization, seeding, and density analysis. Seeding is implemented with the k-Means method, while density analysis uses a histogram. Two evaluation criteria were applied: the complexity of the training process and the accuracy of speaker identification. Experiments were conducted with three test durations (2, 4 and 6 seconds) and nine numbers of Gaussian components, from 4 to 20 in steps of 2. The proposed density-analysis method reduces clustering time by 33.7% while also achieving the highest accuracy, 95.5%.
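The difference between random and seeded centroid initialization can be illustrated with a small k-means routine. This is a sketch under assumptions: "seeding" is shown here as k-means++ seeding (our choice; the abstract only says seeding uses the k-Means method), the histogram-based density analysis is not reproduced, and all names are hypothetical. The resulting centroids could then initialize the GMM means before EM training.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeanspp_seed(points, k, rng):
    """k-means++ seeding: each new centroid is drawn with probability
    proportional to its squared distance from the nearest chosen centroid."""
    centroids = [list(rng.choice(points))]
    while len(centroids) < k:
        d2 = [min(sq_dist(p, c) for c in centroids) for p in points]
        total = sum(d2)
        if total == 0:
            # Every point coincides with a centroid; fall back to a random pick.
            centroids.append(list(rng.choice(points)))
            continue
        threshold = rng.uniform(0, total)
        running = 0.0
        for p, d in zip(points, d2):
            running += d
            if running > threshold:
                centroids.append(list(p))
                break
        else:
            centroids.append(list(points[-1]))  # guard against float rounding
    return centroids

def kmeans(points, k, init="kmeans++", iters=20, seed=0):
    """Lloyd's k-means with either random or k-means++ initialization."""
    rng = random.Random(seed)
    if init == "random":
        centroids = [list(p) for p in rng.sample(points, k)]
    else:
        centroids = kmeanspp_seed(points, k, rng)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids
```

k-means++ spreads the initial centroids apart, which usually reduces the number of Lloyd iterations needed; the density-analysis method in the abstract pursues the same goal with a histogram.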
2002
Analysis and modeling of speaker variability is important for understanding inter-speaker variation in depth and for improving current speech and speaker recognition systems. In this paper we introduce an adapted Gaussian mixture model (GMM) based speaker representation for this task. Two powerful multivariate statistical analysis methods, principal component analysis (PCA) and independent component analysis (ICA), are used to extract the dominant sources of speaker variability. In addition, analysis of variance (ANOVA) is used to evaluate how strongly a factor dominates a given principal or independent component. Finally, the generalization ability of the method is investigated experimentally.
and speaker verification applications. This paper presents a study of how the model parameters affect a state-of-the-art, adapted-GMM-based, text-independent speaker verification system. The system uses a likelihood ratio test for verification, adapted GMMs for the likelihood functions, a universal background model (UBM) for the alternative speaker representation, and a form of Bayesian adaptation to derive speaker models from the UBM. Fast scoring and score normalization, which are very important for handling real-world data, were used. System performance was evaluated using detection error trade-off (DET) curves and the decision cost function (DCF). The effects of model order and of training and test speech length were studied experimentally.
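The verification decision and the DCF evaluation can be sketched as follows. This is a minimal sketch, not the system studied in the paper: `speaker_loglik` and `ubm_loglik` are hypothetical callables returning per-frame log-likelihoods, and the cost parameters C_miss = 10, C_fa = 1 and P_target = 0.01 are assumed values, matching those used in several NIST evaluations. Sweeping the threshold over all scores and plotting the miss rate against the false-alarm rate on normal-deviate axes would give the DET curve.

```python
def llr_score(frames, speaker_loglik, ubm_loglik):
    """Average per-frame log-likelihood ratio between the claimed speaker and the UBM."""
    return sum(speaker_loglik(x) - ubm_loglik(x) for x in frames) / len(frames)

def accept(score, threshold):
    """Likelihood ratio test: accept the identity claim when the LLR reaches the threshold."""
    return score >= threshold

def detection_cost(target_scores, nontarget_scores, threshold,
                   c_miss=10.0, c_fa=1.0, p_target=0.01):
    """Cost at one threshold: weighted miss and false-alarm rates."""
    p_miss = sum(not accept(s, threshold) for s in target_scores) / len(target_scores)
    p_fa = sum(accept(s, threshold) for s in nontarget_scores) / len(nontarget_scores)
    return c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)

def min_detection_cost(target_scores, nontarget_scores, **costs):
    """Lowest cost over thresholds placed at every observed score (plus reject-all)."""
    candidates = sorted(set(target_scores) | set(nontarget_scores))
    return min(detection_cost(target_scores, nontarget_scores, t, **costs)
               for t in candidates + [float("inf")])
```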
Bio-Inspired Applications of Connectionism
Speaker recognition is one of the most popular biometric recognition techniques: it identifies or verifies a speaker from his or her speech signal. It is mainly useful in applications where security is paramount. Speech is generally recorded through an air microphone, and the speech collected from various speakers is used as input to the speaker recognition system. Because such recordings are prone to environmental background noise, performance is improved by adding a second speech signal collected through a throat microphone alongside the standard air-microphone signal. The resulting signal is very similar to normal speech and is not affected by environmental background noise. This paper focuses on extracting Mel Frequency Cepstral Coefficient (MFCC) features from the air and throat speech signals to build Gaussian Mixture Model (GMM) based, closed-set, text-independent speaker recognition systems, and on reporting the identification results.
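The MFCC front end can be sketched for a single frame in plain Python. This is an illustrative, textbook-style pipeline rather than the authors' implementation: the mel formula 2595 * log10(1 + f/700), 26 triangular filters, 13 cepstral coefficients, a pre-emphasis factor of 0.97, and the naive DFT (used only to avoid dependencies) are all assumptions. The same function could be applied separately to air-microphone and throat-microphone frames.

```python
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame, n_fft):
    """Power spectrum of a zero-padded frame, bins 0..n_fft/2 (naive O(N^2) DFT)."""
    x = list(frame[:n_fft]) + [0.0] * max(0, n_fft - len(frame))
    spectrum = []
    for k in range(n_fft // 2 + 1):
        re = sum(x[n] * math.cos(2 * math.pi * k * n / n_fft) for n in range(n_fft))
        im = sum(x[n] * math.sin(2 * math.pi * k * n / n_fft) for n in range(n_fft))
        spectrum.append((re * re + im * im) / n_fft)
    return spectrum

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters equally spaced on the mel scale from 0 Hz to Nyquist."""
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2)
    mel_points = [low + i * (high - low) / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate)) for m in mel_points]
    fbank = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):
            row[k] = (k - left) / (center - left)    # rising edge
        for k in range(center, right):
            row[k] = (right - k) / (right - center)  # falling edge
        fbank.append(row)
    return fbank

def mfcc_frame(frame, sample_rate, n_fft=512, n_filters=26, n_ceps=13, preemph=0.97):
    """MFCCs of one speech frame (assumed longer than one sample)."""
    # Pre-emphasis boosts high frequencies: y[n] = x[n] - preemph * x[n-1].
    x = [frame[0]] + [frame[n] - preemph * frame[n - 1] for n in range(1, len(frame))]
    # Hamming window reduces spectral leakage at the frame edges.
    size = len(x)
    x = [x[n] * (0.54 - 0.46 * math.cos(2 * math.pi * n / (size - 1))) for n in range(size)]
    spectrum = power_spectrum(x, n_fft)
    energies = [sum(wt * p for wt, p in zip(row, spectrum))
                for row in mel_filterbank(n_filters, n_fft, sample_rate)]
    log_e = [math.log(max(e, 1e-10)) for e in energies]  # floor avoids log(0)
    # DCT-II decorrelates the log filterbank energies; keep the first n_ceps.
    return [sum(log_e[j] * math.cos(math.pi * i * (j + 0.5) / n_filters)
                for j in range(n_filters))
            for i in range(n_ceps)]
```

In practice the frame would be one of many overlapping windows (for example 25 ms every 10 ms), and an FFT routine would replace the naive DFT.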
International journal of science and innovation engineering and technology, 2018
Speaker verification is the process of verifying a claimant's identity. It performs a one-to-one comparison between a newly input voice print and the stored voice print of the claimed identity. In this paper, linear predictive coding (LPC) coefficients are used for formant detection. Formants, the peak frequencies in the frequency response of the vocal tract, are detected and compared for verification. A database of twenty male and female speakers, with five samples per person, was created to analyze the results. A speaker verification system is usually employed as a "gatekeeper" that controls access to a secure system. Such systems operate with the user's knowledge and typically require the user's cooperation. The system was developed in MATLAB.
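LPC-based formant detection can be sketched as follows. The paper's system was built in MATLAB; this Python version is an illustrative assumption: it uses the autocorrelation method with the Levinson-Durbin recursion and takes formant candidates as local maxima of the LPC spectral envelope 1/|A(e^jw)| (solving for the roots of A(z) is a common alternative). The function names are hypothetical, and a real front end would window each speech frame first.

```python
import cmath
import math

def autocorr(x, p):
    """Autocorrelation r[0..p] of a (windowed) speech frame."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(p + 1)]

def levinson_durbin(r, p):
    """Predictor coefficients a[1..p] with x[n] ~ sum_k a[k] * x[n - k]."""
    a = [0.0] * (p + 1)
    err = r[0]
    for i in range(1, p + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k  # prediction error shrinks at each order
    return a[1:]

def lpc_formants(a, sample_rate, n_points=1024):
    """Frequencies (Hz) of local maxima of the LPC envelope 1/|A(e^jw)|,
    where A(z) = 1 - sum_k a[k] z^-k."""
    envelope = []
    for i in range(n_points):
        w = math.pi * i / (n_points - 1)
        A = 1 - sum(ak * cmath.exp(-1j * w * (k + 1)) for k, ak in enumerate(a))
        envelope.append(1.0 / abs(A))
    return [
        sample_rate / 2 * i / (n_points - 1)
        for i in range(1, n_points - 1)
        if envelope[i] > envelope[i - 1] and envelope[i] >= envelope[i + 1]
    ]
```

Usage: `lpc_formants(levinson_durbin(autocorr(frame, 10), 10), 8000)` returns candidate formant frequencies for one frame; verification would then compare these against the claimed speaker's stored values.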
This paper describes a speaker recognition system based on feature extraction utilizing the constrained maximum likelihood linear regression (CMLLR) speaker adaptation, while using Gaussian mixture models (GMM) to model the speaker and background models. For the input acoustic signals, the cepstral features are derived to highlight the differences between test and training utterances. The CLSU dataset is used to test the efficiency and performance of the proposed CMLLR, Support Vector Machine, and GMM methods for modeling the speaker's voice by characterizing the speaker features.
Canadian Conference on Electrical and Computer Engineering 2004 (IEEE Cat. No.04CH37513), 2004
Improving the performance of speaker identification systems remains the subject of much research. We recently proposed an approach that jointly exploits information from the vocal tract and the glottal source, synchronously taking into account the correlation between the two sources of information. The proposed theoretical model, which is based on a joint probability law, is presented in this work. Some restrictions and simplifications were made to show the significance of the approach in practice. The fundamental frequency and the MFCC coefficients (Mel Frequency Cepstrum Coefficients) were used to represent the source and vocal-tract information, respectively. Previously, the probability density of the source was assumed to follow a uniform law, and tests were carried out only on the female speakers of the telephone speech database SPIDRE, recorded from various telephone handsets. In this article, the source information is modeled with a Gaussian Mixture Model (GMM) instead of the uniform probabilistic model, and the tests are extended to all speakers of the SPIDRE database. Four systems were proposed and compared. The first is a baseline system based on MFCCs that uses no source information. The second examines only the voiced segments of the speech signal. The last two implement the proposed approach with two techniques: in one, the source information is assumed to follow a normal distribution, and in the other a log-normal distribution. With the proposed approach, performance improves by 10.5% for women, 7% for men and 8% for all speakers.
Proc. International Symp. of …
…, 2008. IST 2008. …, 2008
In this paper, we propose a hierarchical mixture clustering method (GMM-HMC) and investigate its application to reducing the complexity of a GMM-based speaker identification system. We show that GMM-HMC clusters speakers more accurately than a sorted GMM at the same acceleration rate. The system was tested on a universal background model-Gaussian mixture model (UBM-GMM) with KL divergence as the distance measure. Although the proposed system performs slightly worse than the baseline, its comparatively smaller computational load offers the potential to develop higher-performance systems.
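The KL-divergence distance has a closed form for diagonal Gaussians, and for two GMMs whose components are paired one-to-one with shared weights (as when both are adapted from the same UBM), the weighted sum of per-component KL divergences is an upper bound on the divergence between the mixtures. The sketch below shows both; it does not reproduce the proposed hierarchical clustering, and the function names are hypothetical.

```python
import math

def kl_diag_gauss(mu_p, var_p, mu_q, var_q):
    """KL(p || q) in nats for diagonal-covariance Gaussians p and q."""
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q)
    )

def symmetric_kl(mu_p, var_p, mu_q, var_q):
    """Symmetrized KL, usable as a (non-metric) distance between components."""
    return kl_diag_gauss(mu_p, var_p, mu_q, var_q) + kl_diag_gauss(mu_q, var_q, mu_p, var_p)

def kl_matched_bound(weights, gmm_p, gmm_q):
    """Upper bound on KL(GMM_p || GMM_q) when both GMMs share the same weights
    and component i of one is paired with component i of the other.
    gmm_p and gmm_q are lists of (mean, variance) pairs."""
    return sum(w * kl_diag_gauss(mp, vp, mq, vq)
               for w, (mp, vp), (mq, vq) in zip(weights, gmm_p, gmm_q))
```

The bound follows from the joint convexity of the KL divergence and is cheap to evaluate, which is why it is often used as a stand-in for the intractable exact divergence between mixtures.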
EURASIP Journal on Advances in Signal Processing
Gaussian mixture models (GMMs) have recently been employed to provide a robust technique for speaker identification. Determining the number of Gaussian components needed to represent a speaker adequately is a crucial but difficult problem. This number is in fact speaker dependent, so assuming the same number of components for all speakers is not justified. In this paper, we develop a procedure for roughly estimating the maximum model order above which estimates of the model parameters become unreliable. In addition, a theoretical goodness of fit (GOF) measure is derived and used to estimate the number of Gaussian components needed to characterize different speakers. The estimation exploits the distribution of each speaker's training data. Experimental results indicate that the proposed technique gives results comparable to other well-known model selection criteria like the minimum d...
In this paper, we study how the choice of adaptation parameters affects the performance of a post-processing Gaussian mixture model (GMM), called the GMM identifier, used in a GMM-based speaker verification system. Experimental results show the importance of choosing these parameters properly when adapting the post-processor GMM. Models were implemented, trained, and tested on a Farsi speech dataset of 90 speakers. Combinations of prior, mean, and covariance adaptation were examined, and GMM identifiers of orders 4 to 128 were evaluated.
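The adaptation studied here can be illustrated with the relevance-MAP scheme commonly used in GMM-UBM systems: sufficient statistics are accumulated from each component's posterior, and each parameter is interpolated between the new data and the prior model. The flags below mirror the combinations of prior (weight), mean and covariance adaptation examined in the paper; the single relevance factor of 16, the variance floor and the function names are assumptions for illustration, not the authors' settings.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def map_adapt(frames, weights, means, variances, relevance=16.0,
              adapt_weights=False, adapt_means=True, adapt_vars=False,
              var_floor=1e-3):
    """Relevance-MAP adaptation of a diagonal GMM (e.g. a UBM) to new data."""
    M, D, T = len(weights), len(means[0]), len(frames)
    n = [0.0] * M                          # soft counts per component
    sx = [[0.0] * D for _ in range(M)]     # first-order statistics
    sxx = [[0.0] * D for _ in range(M)]    # second-order statistics
    for x in frames:
        logp = [math.log(w) + log_gauss_diag(x, mu, var)
                for w, mu, var in zip(weights, means, variances)]
        top = max(logp)
        post = [math.exp(lp - top) for lp in logp]
        norm = sum(post)
        for i in range(M):
            g = post[i] / norm             # responsibility of component i
            n[i] += g
            for d in range(D):
                sx[i][d] += g * x[d]
                sxx[i][d] += g * x[d] * x[d]
    new_w = list(weights)
    new_mu = [list(m) for m in means]
    new_var = [list(v) for v in variances]
    for i in range(M):
        alpha = n[i] / (n[i] + relevance)  # data-dependent adaptation coefficient
        if adapt_weights:
            new_w[i] = alpha * n[i] / T + (1.0 - alpha) * weights[i]
        if n[i] == 0.0:
            continue                       # no data: keep the prior parameters
        ex = [s / n[i] for s in sx[i]]
        exx = [s / n[i] for s in sxx[i]]
        if adapt_means:
            new_mu[i] = [alpha * e + (1.0 - alpha) * m for e, m in zip(ex, means[i])]
        if adapt_vars:
            new_var[i] = [
                max(var_floor, alpha * e2 + (1.0 - alpha) * (v + m * m) - nm * nm)
                for e2, v, m, nm in zip(exx, variances[i], means[i], new_mu[i])
            ]
    if adapt_weights:
        total = sum(new_w)
        new_w = [w / total for w in new_w]  # renormalize so weights sum to one
    return new_w, new_mu, new_var
```

Components that see little adaptation data get a small alpha and stay close to the prior model, which is what makes MAP adaptation robust with short enrollment speech.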
Genetic Resources and Crop Evolution, 2005
In this paper, we seek to improve the identification performance of Gaussian Mixture Model (GMM)-based speaker identification systems when the amount of training data is limited and the number of speakers is relatively large. Performance is characterized by identification accuracy, identification time, and model complexity. A new model order selection technique based on the Goodness of Fit (GOF) statistical test is proposed to increase identification accuracy. This technique has been shown to outperform other well-known model order selection techniques, such as the Minimum Description Length (MDL) and the Akaike Information Criterion (AIC), in identification accuracy and in robustness against telephone channel degradation. In addition, identification time is reduced by adapting the Linear Discriminant Analysis (LDA) feature extraction technique to fit our basic assumption that each speaker's training data has an asymmetric multimodal distribution. This modification greatly reduces identification time with little effect on identification accuracy.
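For comparison with the proposed GOF test, the two baseline criteria can be sketched for a diagonal-covariance GMM. The formulas use the common forms AIC = -2 log L + 2k and MDL = -log L + (k/2) log N, where k counts free parameters and N is the number of training frames; conventions differ between papers, the GOF test itself is not reproduced, and the names are hypothetical.

```python
import math

def n_params_diag_gmm(m, dim):
    """Free parameters of an m-component diagonal GMM in dim dimensions:
    m*dim means, m*dim variances and m - 1 independent mixture weights."""
    return 2 * m * dim + (m - 1)

def aic(loglik, m, dim):
    return -2.0 * loglik + 2.0 * n_params_diag_gmm(m, dim)

def mdl(loglik, m, dim, n_frames):
    return -loglik + 0.5 * n_params_diag_gmm(m, dim) * math.log(n_frames)

def select_order(logliks, dim, n_frames, criterion="mdl"):
    """logliks maps model order m -> total training log-likelihood.
    Returns the order with the lowest criterion value."""
    def score(m):
        if criterion == "aic":
            return aic(logliks[m], m, dim)
        return mdl(logliks[m], m, dim, n_frames)
    return min(logliks, key=score)
```

Because the MDL penalty grows with log N, it tends to choose smaller models than AIC for the same likelihood gain.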
2009
Systems that automatically recognize a speaker are increasingly important in human-computer interaction because speech communication has always been and will continue to be the dominant mode of human social bonding and information exchange. This paper investigates the use of Gaussian mixture models (GMMs) for robust text-independent speaker identification. The experiments performed in this research examine several aspects and parameters of GMM usage: algorithmic issues, amount of training data, modeling different languages, and small and large population performance. We found that increasing the amount of training data and decreasing the number of speakers improved the accuracy of text-independent speaker identification using statistical models based on Gaussian mixture models. There also appears to be a maximum number of Gaussian mixture components needed to adequately model speakers and achieve good identification performance for different amounts of training data.
2015
Speaker recognition technology is well advanced today, and various well-known techniques are used to process voice, including Gaussian mixture models. This paper presents our work on identifying a speaker from their voice. In our experiment we first extract key features from the speech signal using the VOICEBOX [1] toolbox in MATLAB. These features are represented as a matrix of mel frequency cepstral coefficients (MFCC). Then, using the MSR Identity Toolbox, we build an identity for each person enrolled in our system with a statistical Gaussian Mixture Model-Universal Background Model (GMM-UBM) and the features extracted from the speech signals. The Universal Background Model improves the Gaussian Mixture Model's statistical computation for the decision logic in the speaker verification task. We used the TIMIT database as the corpus for our experiments. Finally, we compared recognition accuracy across several experimental scenarios.
Speaker and Language Recognition …, 2006
This article presents a new approach that combines the discriminative power of Support Vector Machines (SVM) with Gaussian Mixture Models (GMM) for Automatic Speaker Verification (ASV). In this combination, SVMs are applied in the GMM model space, where each point represents a GMM speaker model. The SVM kernel computes a similarity between GMM models and is derived from the Kullback-Leibler (KL) divergence. The results of this new approach show a clear improvement over a simple GMM system on the NIST 2005 Speaker Recognition Evaluation primary task.
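One widely used way to turn the KL divergence into an SVM kernel, shown here as an assumption rather than as this article's exact construction, applies to GMMs MAP-adapted from a common UBM so that only the means differ. Bounding the KL divergence by the weighted per-component divergences yields a linear kernel on "supervectors" of scaled means, K(a, b) = sum over components i of w_i * mu_i^a' * Sigma_i^-1 * mu_i^b, and half the squared distance between two supervectors equals that bound. The names are hypothetical; the resulting Gram matrix would be passed to an SVM trainer that accepts precomputed kernels.

```python
import math

def supervector(weights, means, variances):
    """Concatenate sqrt(w_i) * mu_i / sigma_i over all components (diagonal covariances)."""
    sv = []
    for w, mu, var in zip(weights, means, variances):
        sv.extend(math.sqrt(w) * m / math.sqrt(v) for m, v in zip(mu, var))
    return sv

def kl_kernel(means_a, means_b, weights, variances):
    """Linear kernel between two mean-adapted GMMs sharing UBM weights and variances."""
    a = supervector(weights, means_a, variances)
    b = supervector(weights, means_b, variances)
    return sum(x * y for x, y in zip(a, b))

def kl_bound(means_a, means_b, weights, variances):
    """Weighted per-component KL divergence; equals half the squared supervector distance."""
    a = supervector(weights, means_a, variances)
    b = supervector(weights, means_b, variances)
    return 0.5 * sum((x - y) ** 2 for x, y in zip(a, b))
```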
Digital Signal Processing, 2000
In this paper we describe the major elements of MIT Lincoln Laboratory's Gaussian mixture model (GMM)-based speaker verification system used successfully in several NIST Speaker Recognition Evaluations (SREs). The system is built around the likelihood ratio test for verification, using simple but effective GMMs for likelihood functions, a universal background model (UBM) for alternative speaker representation, and a form of Bayesian adaptation to derive speaker models from the UBM. The development and use of a handset detector and score normalization to greatly improve verification performance is also described and discussed. Finally, representative performance benchmarks and system behavior experiments on NIST SRE corpora are presented.
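Score normalization of this kind can be sketched with zero normalization (Z-norm): each speaker model's raw score is standardized with the mean and standard deviation of that model's scores on impostor speech. Keeping separate impostor statistics per detected handset type gives a handset-dependent variant (H-norm). This is an illustrative sketch; the handset labels and function names are hypothetical, and the impostor scores would come from held-out development data.

```python
import statistics

def znorm_params(impostor_scores):
    """Mean and standard deviation of one model's scores on impostor utterances."""
    return statistics.mean(impostor_scores), statistics.stdev(impostor_scores)

def znorm(score, params):
    """Standardize a raw verification score with the model's impostor statistics."""
    mean, std = params
    return (score - mean) / std

def hnorm(score, handset, params_by_handset):
    """Handset-dependent normalization: use statistics for the detected handset type."""
    return znorm(score, params_by_handset[handset])
```

Normalizing per model puts all speakers' scores on a common scale, so a single decision threshold works across models.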
IEEE Transactions on Cybernetics, 2013
This paper presents three novel methods for speaker identification, two of which use both the continuous density hidden Markov model (HMM) and the generalized fuzzy model (GFM), which combines the advantages of the Mamdani and Takagi-Sugeno models. In the first method, the HMM is used to extract a shape-based batch feature vector that is fitted with the GFM to identify the speaker. The second method uses the Gaussian mixture model (GMM) and the GFM to identify speakers. Finally, the third method is inspired by the way humans draw on mutual acquaintances when identifying a speaker. To assess the validity of the proposed models [HMM-GFM, GMM-GFM, and HMM-GFM (fusion)] in a real-life scenario, they are tested on the VoxForge speech corpus and on a subset of the 2003 National Institute of Standards and Technology evaluation data set. These models are also evaluated on the corrupted VoxForge speech corpus by mixing wit...