2006, Speaker and Language Recognition …
This article presents a new approach that uses the discriminative power of Support Vector Machines (SVMs) in combination with Gaussian Mixture Models (GMMs) for Automatic Speaker Verification (ASV). In this combination, SVMs are applied in the GMM model space, where each point represents a GMM speaker model. The SVM kernel computes a similarity between GMM models and is derived from the Kullback-Leibler (KL) divergence. On the NIST 2005 Speaker Recognition Evaluation primary task, this approach shows a clear improvement over a simple GMM system.
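The abstract says only that the kernel is derived from the KL divergence. One common construction, assumed here and not necessarily the one used in this paper, applies when every speaker GMM is MAP-adapted from one UBM and keeps its weights and diagonal covariances. The KL divergence between two such models is then bounded by half the weighted, variance-normalized squared distance between their corresponding means, and that distance is induced by a linear "supervector" kernel. The sketch below uses plain lists, hypothetical function names, and toy values, and checks numerically that the bound equals half the squared distance in the kernel's feature space.

```python
from typing import List

Means = List[List[float]]  # one mean (or variance) vector per mixture component


def kl_upper_bound(means_a: Means, means_b: Means,
                   weights: List[float], variances: Means) -> float:
    """Upper bound on KL(a || b) for two GMMs that share weights and diagonal
    covariances (e.g. both MAP-adapted from one UBM) and differ only in means."""
    total = 0.0
    for w, ma, mb, var in zip(weights, means_a, means_b, variances):
        total += w * sum((x - y) ** 2 / v for x, y, v in zip(ma, mb, var))
    return 0.5 * total


def supervector_kernel(means_a: Means, means_b: Means,
                       weights: List[float], variances: Means) -> float:
    """Inner product induced by the bound: sum_i w_i * mu_a_i^T Sigma_i^-1 mu_b_i."""
    return sum(
        w * sum(x * y / v for x, y, v in zip(ma, mb, var))
        for w, ma, mb, var in zip(weights, means_a, means_b, variances)
    )


# Hypothetical 2-component, 2-dimensional models sharing UBM weights/variances.
weights = [0.4, 0.6]
variances = [[1.0, 2.0], [0.5, 1.0]]
spk_a = [[0.0, 1.0], [1.0, -1.0]]
spk_b = [[0.5, 1.0], [1.0, 0.0]]


def k(a: Means, b: Means) -> float:
    return supervector_kernel(a, b, weights, variances)


d = kl_upper_bound(spk_a, spk_b, weights, variances)
# The bound is half the squared distance between the models in kernel space.
print(d, 0.5 * (k(spk_a, spk_a) - 2 * k(spk_a, spk_b) + k(spk_b, spk_b)))
```

The two printed values agree up to floating-point rounding; an SVM trained with `supervector_kernel` therefore separates speaker models by exactly this KL-based distance.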
Gaussian mixture models (GMMs) have proven extremely successful for text-independent speaker verification. The standard training method for GMMs is MAP adaptation of the means of the mixture components, based on speech from the target speaker. In this work we examine two models, GMM-UBM and GMM-SVM, and their application to speaker verification. Feature vectors consisting of Mel Frequency Cepstral Coefficients (MFCCs) extracted from the speech signal are used to train the GMM, and the mean vectors obtained from the GMM-UBM are used to train the SVM. Cepstral mean subtraction (CMS) is applied to the MFCCs to center the features around their mean. Both the GMM-UBM and the GMM-SVM systems use a 2048-mixture UBM. The verification phase was tested on the Aurora database at different Signal-to-Noise Ratios (SNRs) and under three noise conditions. The experimental results showed that GMM-SVM outperforms GMM-UBM in speaker verification espe...
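Cepstral mean subtraction removes the per-coefficient average of the MFCCs over an utterance, which cancels stationary channel effects. A minimal sketch, assuming the MFCC frames are already computed and stored as a list of equal-length lists; the function name and the toy values are hypothetical.

```python
from typing import List


def cepstral_mean_subtraction(frames: List[List[float]]) -> List[List[float]]:
    """Subtract the per-coefficient mean (over all frames of one utterance)
    from every MFCC frame."""
    if not frames:
        return []
    n = len(frames)
    dim = len(frames[0])
    means = [sum(frame[d] for frame in frames) / n for d in range(dim)]
    return [[frame[d] - means[d] for d in range(dim)] for frame in frames]


# Hypothetical 3-frame, 2-coefficient MFCC sequence with a constant offset.
mfcc = [[1.0, 10.0], [2.0, 11.0], [3.0, 12.0]]
print(cepstral_mean_subtraction(mfcc))  # [[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]]
```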
… Conference on Speech …, 2001
Current best-performing speaker recognition algorithms are based on Gaussian Mixture Models (GMMs). Their results are not satisfactory under all experimental conditions, especially under mismatched (train/test) conditions. The Support Vector Machine is a new and very promising technique in statistical learning theory. Recently, this technique produced very interesting results in image processing [2], [3], [4] and for the fusion of experts in biometric authentication. In this paper we address the issue of using the Support Vector Learning technique in combination with the currently well-performing GMM models in order to improve speaker verification results.
Digital Signal Processing, 2000
In this paper we describe the major elements of MIT Lincoln Laboratory's Gaussian mixture model (GMM)-based speaker verification system used successfully in several NIST Speaker Recognition Evaluations (SREs). The system is built around the likelihood ratio test for verification, using simple but effective GMMs for likelihood functions, a universal background model (UBM) for alternative speaker representation, and a form of Bayesian adaptation to derive speaker models from the UBM. The development and use of a handset detector and score normalization to greatly improve verification performance is also described and discussed. Finally, representative performance benchmarks and system behavior experiments on NIST SRE corpora are presented.
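The likelihood ratio test, the UBM, and Bayesian adaptation described above fit together as follows: speaker means are MAP-adapted from the UBM, and a test utterance is scored by the average log-likelihood ratio between the adapted model and the UBM. The sketch below assumes diagonal covariances, adapts only the means with the update new_mean_i = alpha_i * E_i[x] + (1 - alpha_i) * ubm_mean_i where alpha_i = n_i / (n_i + r), and uses a relevance factor r = 16; that value, the 1-D toy data, and all function names are illustrative assumptions rather than details taken from the abstract.

```python
import math
from typing import List

Vec = List[float]


def log_gauss_diag(x: Vec, mean: Vec, var: Vec) -> float:
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def component_logs(x: Vec, weights: Vec, means: List[Vec], variances: List[Vec]) -> Vec:
    """log(w_i) + log N(x; mu_i, Sigma_i) for every mixture component i."""
    return [math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]


def log_sum_exp(values: Vec) -> float:
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def avg_log_likelihood(frames: List[Vec], weights, means, variances) -> float:
    """Average per-frame log-likelihood of the frames under a GMM."""
    return sum(log_sum_exp(component_logs(x, weights, means, variances))
               for x in frames) / len(frames)


def map_adapt_means(frames: List[Vec], weights, means, variances,
                    relevance: float = 16.0) -> List[Vec]:
    """Mean-only MAP adaptation of the UBM means toward the enrollment frames."""
    dim = len(means[0])
    counts = [0.0] * len(weights)          # n_i = sum_t P(i | x_t)
    sums = [[0.0] * dim for _ in weights]  # sum_t P(i | x_t) * x_t
    for x in frames:
        logs = component_logs(x, weights, means, variances)
        total = log_sum_exp(logs)
        for i, lp in enumerate(logs):
            post = math.exp(lp - total)
            counts[i] += post
            for d in range(dim):
                sums[i][d] += post * x[d]
    adapted = []
    for n_i, s_i, mu_i in zip(counts, sums, means):
        alpha = n_i / (n_i + relevance)
        # E_i[x]; fall back to the UBM mean if the component received no mass
        e_i = [s / n_i for s in s_i] if n_i > 0 else list(mu_i)
        adapted.append([alpha * e + (1 - alpha) * m for e, m in zip(e_i, mu_i)])
    return adapted


# Hypothetical 2-component, 1-D UBM and enrollment data for one speaker.
ubm_w = [0.5, 0.5]
ubm_mu = [[-2.0], [2.0]]
ubm_var = [[1.0], [1.0]]
enroll = [[2.5], [3.0], [2.8], [3.2]] * 10
spk_mu = map_adapt_means(enroll, ubm_w, ubm_mu, ubm_var)

# Verification score: average log-likelihood ratio, speaker model vs. UBM.
trial = [[2.9], [3.1]]
llr = (avg_log_likelihood(trial, ubm_w, spk_mu, ubm_var)
       - avg_log_likelihood(trial, ubm_w, ubm_mu, ubm_var))
print(spk_mu, llr)
```

The component that sees the enrollment data moves toward it, while the other stays at its UBM value; a positive `llr` would then be compared against a decision threshold.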
2001: A Speaker …, 2001
In this article we address the issue of using the Support Vector Learning technique in combination with the currently well-performing Gaussian Mixture Models (GMMs) for speaker verification experiments. Support Vector Machines (SVMs) are a new and very promising technique in statistical learning theory. Recently this technique produced very interesting results in image processing [1], [2], [3] and for the fusion of experts in biometric authentication.
2003
Generative Gaussian Mixture Models (GMMs) are known to be the dominant approach for modeling speech sequences in text-independent speaker verification because of their scalability, good performance, and ability to handle variable-length sequences. On the other hand, because of their discriminative properties, models such as Support Vector Machines (SVMs) usually yield better performance on static classification problems and can construct flexible decision boundaries. In this paper, we combine these two complementary models by using SVMs to post-process the scores obtained by the GMMs. A cross-validation method is also used in the baseline system to increase the number of client scores available in the training phase, which improves the results of the SVM models. Experiments carried out on the XM2VTS and PolyVar databases confirm the value of this hybrid approach.
2003
Support vector machines with Fisher and score-space kernels are used for text-independent speaker verification to provide direct discrimination between complete utterances. This differs from approaches such as discriminatively trained Gaussian mixture models or other discriminative classifiers, which discriminate only at the frame level. Using this sequence-level discrimination approach, we achieve error rates significantly better than the current state of the art on the PolyVar database.
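A Fisher (score-space) mapping turns a whole utterance into one fixed-length vector, the gradient of its log-likelihood with respect to the model parameters, which is what makes sequence-level discrimination possible. The minimal sketch below is restricted to the derivatives with respect to the means of a 1-D GMM and divides by the number of frames; both are simplifying assumptions, since the papers' score spaces may include other parameters and a likelihood ratio. Names and values are hypothetical, and the analytic gradient is checked against a finite difference.

```python
import math
from typing import List


def component_densities(x: float, weights, means, variances) -> List[float]:
    """w_i * N(x; mu_i, var_i) for each component of a 1-D GMM."""
    return [w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2 * math.pi * v)
            for w, m, v in zip(weights, means, variances)]


def log_likelihood(frames: List[float], weights, means, variances) -> float:
    """Total log-likelihood of a sequence of scalar frames."""
    return sum(math.log(sum(component_densities(x, weights, means, variances)))
               for x in frames)


def fisher_score_means(frames: List[float], weights, means, variances) -> List[float]:
    """d log p(X) / d mu_i = sum_t gamma_i(t) (x_t - mu_i) / var_i,
    divided by the number of frames so utterances of any length are comparable."""
    grad = [0.0] * len(means)
    for x in frames:
        dens = component_densities(x, weights, means, variances)
        total = sum(dens)
        for i, (m, v) in enumerate(zip(means, variances)):
            grad[i] += (dens[i] / total) * (x - m) / v  # gamma_i(t) = dens_i / total
    return [g / len(frames) for g in grad]


# Hypothetical 2-component GMM and a 5-frame utterance.
weights, means, variances = [0.3, 0.7], [-1.0, 1.5], [0.8, 1.2]
utterance = [-0.5, 0.2, 1.0, 2.3, -1.7]
score = fisher_score_means(utterance, weights, means, variances)

# Compare with a central finite difference of the log-likelihood.
eps = 1e-5
for i in range(len(means)):
    up = means[:i] + [means[i] + eps] + means[i + 1:]
    down = means[:i] + [means[i] - eps] + means[i + 1:]
    numeric = (log_likelihood(utterance, weights, up, variances)
               - log_likelihood(utterance, weights, down, variances)) / (2 * eps)
    print(i, score[i], numeric / len(utterance))
```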
2007
This paper proposes a novel approach that combines statistical models and support vector machines. A hybrid scheme that appropriately incorporates the advantages of both the generative and the discriminative model paradigms is described and evaluated. Support vector machines (SVMs) are trained to divide the whole speaker space into small subsets of speakers within a hierarchical tree structure. During testing, a speech token is assigned to its corresponding group, and evaluation with Gaussian mixture models (GMMs) is then performed. Experimental results show that the proposed method can significantly improve performance on the text-independent speaker identification task, with up to a 50% reduction in identification error rate compared to the baseline statistical model.
2002
This paper presents a performance evaluation of two classification systems for text-independent speaker verification: the Gaussian Mixture Model (GMM) and the AR-Vector Model. For the GMM, […], […], and […] Gaussians are evaluated. For the AR-Vector Model, an order-[…] model with the Itakura symmetric distance was used. Both classification systems made no errors when the training and testing times were at least […] s and […] s, respectively. Using […] s as the test time, the errors of the most accurate classification systems were between […] and […]%. With
International Journal of Science and Innovation Engineering and Technology, 2018
Speaker verification is the process of verifying a claimant's identity. It performs a one-to-one comparison between a newly input voice print and the stored voice print of the claimed identity. In this paper, linear predictive coding (LPC) coefficients are used for formant detection. Formants are the peak frequencies in the frequency response of the vocal tract; they are detected and compared for verification. A database of twenty male and female speakers, with five samples per speaker, was created to analyze the results. A speaker verification system is usually employed as a "gatekeeper" that controls access to a secure system. Such systems operate with the user's knowledge and typically require the user's cooperation. The system was developed in MATLAB.
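LPC-based formant detection can be sketched in two steps: estimate the predictor polynomial A(z) from the signal's autocorrelation with the Levinson-Durbin recursion, then pick the local maxima of the LPC envelope 1/|A(e^jw)|^2. The Python below rests on our own assumptions (one long frame, no windowing or pre-emphasis, peak-picking on a 512-point frequency grid) and is tested on a synthetic resonance rather than real speech; the paper's actual settings are not given, and the names are hypothetical.

```python
import cmath
import math
import random
from typing import List


def autocorrelation(x: List[float], max_lag: int) -> List[float]:
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n)) for k in range(max_lag + 1)]


def levinson_durbin(r: List[float], order: int) -> List[float]:
    """Coefficients [1, a1, ..., ap] of A(z) = 1 + a1 z^-1 + ... + ap z^-p."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k  # prediction error power
    return a


def lpc_formants(x: List[float], fs: float, order: int, n_points: int = 512) -> List[float]:
    """Frequencies (Hz) of local peaks of the LPC envelope 1/|A(e^jw)|^2."""
    a = levinson_durbin(autocorrelation(x, order), order)
    freqs = [fs / 2 * i / (n_points - 1) for i in range(n_points)]
    env = []
    for f in freqs:
        w = 2 * math.pi * f / fs
        resp = sum(coef * cmath.exp(-1j * w * k) for k, coef in enumerate(a))
        env.append(1.0 / abs(resp) ** 2)
    return [freqs[i] for i in range(1, n_points - 1)
            if env[i] > env[i - 1] and env[i] >= env[i + 1]]


# Synthetic signal: white noise through a 2-pole resonator at 1000 Hz.
fs, f_res, radius = 8000.0, 1000.0, 0.95
a1, a2 = -2 * radius * math.cos(2 * math.pi * f_res / fs), radius ** 2
rng = random.Random(0)
x, y1, y2 = [], 0.0, 0.0
for _ in range(8000):
    y = rng.gauss(0.0, 1.0) - a1 * y1 - a2 * y2
    x.append(y)
    y1, y2 = y, y1
print(lpc_formants(x, fs, order=2))  # expect a single peak near 1000 Hz
```

For real speech one would normally frame, window, and pre-emphasize the signal and use a higher order (a common rule of thumb is about 2 + fs/1000), then compare the detected formants with those stored for the claimed identity.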
This paper describes the derivation of a sequence kernel that transforms speech utterances into probabilistic vectors for classification in an expanded feature space. The sequence kernel is built upon a set of Gaussian basis functions, half of which contain speaker-specific information while the other half capture the common characteristics of the competing background speakers. The idea is similar to that of the Gaussian mixture model-universal background model (GMM-UBM) system, except that the Gaussian densities are treated individually in the proposed sequence kernel, rather than being grouped into two mixtures of Gaussian densities as in the GMM-UBM system. The motivation is to exploit the individual Gaussian components for better speaker discrimination. Experiments on the NIST 2001 SRE corpus show convincing results for the probabilistic sequence kernel approach.
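As we read the abstract, each frame is expanded into the posterior probabilities of the individual Gaussian basis functions, and these posteriors are averaged over the utterance to give one fixed-length vector. The sketch assumes a single pool of basis functions (the speaker-specific half first, the background half second) with posteriors normalized over the whole pool and a plain linear kernel between the averaged vectors; the published kernel may use a different normalization or weighting, and all names and values are hypothetical.

```python
import math
from typing import List

Vec = List[float]


def log_gauss_diag(x: Vec, mean: Vec, var: Vec) -> float:
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def probabilistic_expansion(frames: List[Vec], weights: Vec,
                            means: List[Vec], variances: List[Vec]) -> Vec:
    """Average over frames of P(j | x_t) for every basis function j."""
    out = [0.0] * len(weights)
    for x in frames:
        logs = [math.log(w) + log_gauss_diag(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logs)  # subtract the max for numerical stability
        probs = [math.exp(lp - top) for lp in logs]
        total = sum(probs)
        for j, p in enumerate(probs):
            out[j] += p / total
    return [o / len(frames) for o in out]


def linear_sequence_kernel(frames_a, frames_b, weights, means, variances) -> float:
    """Linear kernel between the expanded vectors of two utterances."""
    pa = probabilistic_expansion(frames_a, weights, means, variances)
    pb = probabilistic_expansion(frames_b, weights, means, variances)
    return sum(x * y for x, y in zip(pa, pb))


# Hypothetical 1-D basis: two speaker-specific Gaussians, then two background ones.
weights = [0.25, 0.25, 0.25, 0.25]
means = [[1.0], [2.0], [-1.0], [0.0]]
variances = [[0.5], [0.5], [1.0], [1.0]]
utt_a = [[1.1], [1.9], [1.4]]
utt_b = [[-0.8], [0.2], [1.2]]

vec_a = probabilistic_expansion(utt_a, weights, means, variances)
print(vec_a, linear_sequence_kernel(utt_a, utt_b, weights, means, variances))
```

Whatever the utterance length, the expanded vector has one entry per basis function, so utterances of different durations can be compared directly by an SVM.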
This paper provides an overview of the Gaussian Mixture Model (GMM) and its components as applied to the speech signal. Earlier work has shown that the GMM is well suited to voice modeling in speaker recognition systems, where it serves as an essential tool for statistical clustering. Tasks that humans perform effortlessly, such as voice or face recognition, are not effortless for machines or computers; speaker recognition technology provides a solution, and with it computers can outperform humans.
This paper presents the performance of a text-independent speaker verification system for Brazilian Portuguese using a Gaussian Mixture Model (GMM). The Gaussian components of the GMM statistically represent the spectral characteristics of the speaker, leading to an effective speaker recognition system. The main goal here is a detailed evaluation of the parameters used by the GMM, such as the number of Gaussian mixtures and the amount of training and testing time. By seeking the best set of features for a reasonable response, this work helps in understanding the model and gives insights for further investigation. We used 36 speakers in the experiments, all modeled with 15 mel-cepstral coefficients. With 32 Gaussians, 60 seconds of training, and 30 seconds of testing, the system made no errors on reasonably clean speech. The results show that the longer the training and testing times, the better the results for a give...
Training high-order Gaussian mixture models (GMMs) on a large dataset in a single stage requires considerable processing power and storage, which may be neither feasible nor available in many cases. Training such GMMs in several stages reduces the computational and memory costs, but normally yields a suboptimal GMM compared with one trained entirely in a single stage. In this paper, a new method for optimizing multi-stage-trained GMMs is proposed in the context of speaker verification. Experimental results show that GMMs optimized with the proposed algorithm improve the performance of the GMM-based speaker verification system.
2007
In this paper we investigate three approaches to calibrating and fusing output scores for speaker verification. Today's speaker recognition systems often consist of several subsystems that use different generative and discriminative classifiers. When subsystems such as Gaussian Mixture Models (GMMs) and Support Vector Machines (SVMs) are combined to obtain a final decision score, probabilistic calibration of the individual classifier scores is important. Experiments on the NIST 2006 evaluation dataset show a performance improvement over the individual subsystems and over standard uncalibrated fusion methods.
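The abstract does not say which three calibration approaches were compared. One common choice, shown here as an assumption, is linear logistic-regression fusion: each trial's subsystem scores are combined as w . s + b, with w and b fitted so that the sigmoid of the result estimates the probability of a target trial, which makes the fused score a calibrated log-odds. The training loop, learning rate, trial scores, and labels are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_linear_fusion(scores: List[Sequence[float]], labels: List[int],
                        lr: float = 0.1, epochs: int = 2000) -> Tuple[List[float], float]:
    """Logistic-regression fusion: fit w, b so that sigmoid(w . s + b)
    approximates P(target | subsystem scores s), by batch gradient descent
    on the cross-entropy."""
    dim, n = len(scores[0]), len(scores)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(sum(wi * si for wi, si in zip(w, s)) + b) - y
            for d in range(dim):
                grad_w[d] += err * s[d]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def fuse(s: Sequence[float], w: List[float], b: float) -> float:
    """Fused, calibrated score: log-odds of target vs. non-target."""
    return sum(wi * si for wi, si in zip(w, s)) + b


# Hypothetical (GMM score, SVM score) pairs; label 1 = target trial.
trials = [([1.2, 0.8], 1), ([0.9, 1.1], 1), ([1.5, 0.4], 1), ([0.3, 0.9], 1),
          ([-0.2, 0.6], 1), ([-0.5, -0.3], 1),
          ([-1.0, -0.7], 0), ([-0.4, -1.2], 0), ([0.2, -0.3], 0),
          ([-1.3, 0.1], 0), ([0.5, -0.5], 0), ([0.6, 0.4], 0)]
w, b = train_linear_fusion([s for s, _ in trials], [y for _, y in trials])
print(w, b, fuse([1.0, 1.0], w, b), fuse([-1.0, -1.0], w, b))
```

Simply summing the raw scores would be the uncalibrated baseline; the learned weights rescale each subsystem so the fused output can be thresholded at a value chosen from the application's costs and priors.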
IEEE Transactions on Speech and Audio Processing, 2005
This paper presents a text-independent speaker verification system using support vector machines (SVMs) with score-space kernels. Score-space kernels generalize Fisher kernels and are based on underlying generative models such as Gaussian mixture models (GMMs). This approach provides direct discrimination between whole sequences, in contrast with the frame-level approaches at the heart of most current systems. The resulting SVMs have very high dimensionality, since the dimension is determined by the number of parameters in the underlying generative model. To address problems that arise in the resulting optimization, we introduce a technique called spherical normalization that preconditions the Hessian matrix. We performed speaker verification experiments using the PolyVar database. The SVM system presented here reduces error rates by 34% relative to a GMM likelihood ratio system. Index Terms: Fisher kernel, score-space kernel, speaker verification, support vector machine. Vincent Wan received a BA in
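The abstract names spherical normalization without defining it. As we understand the technique, which should be treated as an assumption, each vector x is extended with a constant d and rescaled to unit length, x -> (x, d) / sqrt(x . x + d^2), so that every mapped vector lies on a unit hypersphere with one extra dimension; the same operation can be written directly in terms of an existing kernel K. The value d = 1 and all names below are hypothetical.

```python
import math
from typing import List


def spherical_normalize(x: List[float], d: float = 1.0) -> List[float]:
    """Append the constant d and rescale so the result has unit length:
    x -> (x, d) / sqrt(x . x + d^2), a point on the unit sphere in n + 1 dims."""
    norm = math.sqrt(sum(v * v for v in x) + d * d)
    return [v / norm for v in x] + [d / norm]


def spherical_kernel(k_xy: float, k_xx: float, k_yy: float, d: float = 1.0) -> float:
    """The same normalization written in terms of an existing kernel K."""
    return (k_xy + d * d) / math.sqrt((k_xx + d * d) * (k_yy + d * d))


def dot(a: List[float], b: List[float]) -> float:
    return sum(p * q for p, q in zip(a, b))


# Hypothetical score-space vectors of different magnitudes.
x, y = [3.0, 4.0], [1.0, -2.0]
sx, sy = spherical_normalize(x), spherical_normalize(y)
print(dot(sx, sx), dot(sx, sy), spherical_kernel(dot(x, y), dot(x, x), dot(y, y)))
```

Because all mapped vectors have unit norm, their scale no longer varies wildly across dimensions and utterances, which is the kind of conditioning problem the abstract describes in very high-dimensional score spaces.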
and speaker verification applications. This paper presents a study of the effects of model parameters in a state-of-the-art adapted-GMM-based text-independent speaker verification system. The system uses a likelihood ratio test for verification, adapted GMMs for the likelihood functions, a universal background model (UBM) for the alternative speaker representation, and a form of Bayesian adaptation to derive speaker models from the UBM. Fast scoring and score normalization, which are very important for dealing with real-world data, were used. System performance was evaluated using detection error trade-off (DET) curves and the decision cost function (DCF). The effects of model order and of the training and test speech lengths were studied experimentally.
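A DET curve plots miss rate against false-alarm rate as the decision threshold moves, and the DCF weights those two errors by their costs and the target prior. The sketch below sweeps every observed score as a threshold and reports an approximate equal error rate (EER) and the minimum DCF. The cost parameters C_miss = 10, C_fa = 1, P_target = 0.01 are an assumption, matching the NIST SRE settings of that period; the abstract does not state which values were used. Function names and scores are hypothetical.

```python
from typing import List, Tuple


def error_rates(targets: List[float], nontargets: List[float],
                threshold: float) -> Tuple[float, float]:
    """Miss rate (targets below threshold) and false-alarm rate
    (non-targets at or above threshold) when accepting scores >= threshold."""
    p_miss = sum(s < threshold for s in targets) / len(targets)
    p_fa = sum(s >= threshold for s in nontargets) / len(nontargets)
    return p_miss, p_fa


def eer_and_min_dcf(targets: List[float], nontargets: List[float],
                    c_miss: float = 10.0, c_fa: float = 1.0,
                    p_target: float = 0.01) -> Tuple[float, float]:
    """Sweep every observed score as a threshold (plus 'reject all') and return
    an approximate equal error rate and the minimum detection cost."""
    thresholds = sorted(set(targets) | set(nontargets)) + [float("inf")]
    best_gap, eer, min_dcf = float("inf"), 1.0, float("inf")
    for t in thresholds:
        p_miss, p_fa = error_rates(targets, nontargets, t)
        if abs(p_miss - p_fa) < best_gap:  # operating point closest to the EER
            best_gap, eer = abs(p_miss - p_fa), (p_miss + p_fa) / 2
        dcf = c_miss * p_miss * p_target + c_fa * p_fa * (1 - p_target)
        min_dcf = min(min_dcf, dcf)
    return eer, min_dcf


# Hypothetical, perfectly separated verification scores.
print(eer_and_min_dcf([2.0, 3.0, 4.0], [-1.0, 0.0, 1.0]))  # (0.0, 0.0)
```

The list of (p_miss, p_fa) pairs produced during the sweep is exactly what a DET plot draws, usually on normal-deviate axes.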
Bio-Inspired Applications of Connectionism
Speaker recognition is a popular biometric technique that identifies and verifies a speaker from his or her speech data, using the speaker's speech signal as input. It is mainly useful in applications where security is the primary concern. Speech is generally recorded through an air microphone, and the speech collected from various speakers serves as input to the speaker recognition system. Because such recordings are prone to environmental background noise, performance is enhanced by integrating an additional speech signal collected through a throat microphone with the signal from the standard air microphone. The resulting signal is very similar to normal speech and is not affected by environmental background noise. This paper focuses on extracting Mel Frequency Cepstral Coefficient (MFCC) features from the air and throat speech signals to build Gaussian Mixture Model (GMM)-based closed-set text-independent speaker recognition systems, and presents the identification results.
2011 International Conference on Multimedia Computing and Systems, 2011
Gaussian mixture models (GMMs) have been widely and successfully used in speaker recognition over the last decades. They are generally trained with the generative criterion of maximum likelihood estimation. In earlier work, we proposed an algorithm for discriminative training of GMMs with diagonal covariances under a large-margin criterion. In this paper, we present a new version of this algorithm whose major advantage is high computational efficiency, which makes it well suited to large-scale databases. To show the effectiveness of the new algorithm, we carry out a full NIST speaker verification task using NIST SRE 2006 data. The results show that our system outperforms the baseline GMM while remaining computationally efficient.
2000
In this paper the performance of the support vector machine (SVM) on a speaker verification task is assessed. Since speaker verification requires binary decisions, support vector machines seem to be a promising candidate to perform the task. A new technique for normalising the polynomial kernel is developed and used to achieve performance comparable to other classifiers on the YOHO database. We also present results on a speaker identification task.
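The abstract does not describe the new normalization itself. For reference, the sketch below shows the inhomogeneous polynomial kernel (x . y + c)^p together with the standard unit-diagonal normalization K(x, y) / sqrt(K(x, x) K(y, y)); this is a generic technique, not necessarily the one developed in the paper. For this kernel the normalized value equals ((x . y + c) / sqrt((x . x + c)(y . y + c)))^p, which the test checks. Degree 3, c = 1, and the names are hypothetical.

```python
import math
from typing import List


def poly_kernel(x: List[float], y: List[float], degree: int = 3, c: float = 1.0) -> float:
    """Inhomogeneous polynomial kernel (x . y + c) ** degree."""
    return (sum(a * b for a, b in zip(x, y)) + c) ** degree


def normalized_poly_kernel(x: List[float], y: List[float],
                           degree: int = 3, c: float = 1.0) -> float:
    """Rescale so that every vector has unit self-similarity:
    K(x, y) / sqrt(K(x, x) * K(y, y)); c > 0 keeps the diagonal positive."""
    return poly_kernel(x, y, degree, c) / math.sqrt(
        poly_kernel(x, x, degree, c) * poly_kernel(y, y, degree, c))


x, y = [1.0, 2.0], [0.5, -1.0]
print(normalized_poly_kernel(x, x), normalized_poly_kernel(x, y))
```

Without normalization, vectors with large norms dominate the polynomial kernel; after it, every vector's similarity with itself is 1.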
Speech Communication, 1995
Gaussian Mixture Models (GMMs) have been successfully applied to the tasks of speaker ID and verification when a large amount of enrolment data is available to characterize client speakers ([1], [10], …). However, there are many applications where it is unreasonable to expect clients to spend this much time training the system. Thus, we have been exploring the performance of various methods when only a sparse amount of enrolment data is available. Under such conditions, the performance of GMMs deteriorates drastically. A possible solution is the "eigenvoice" approach, in which client and test speaker models are confined to a low-dimensional linear subspace obtained previously from a different set of training data. One advantage of the approach is that it does away with the need for impostor models for speaker verification.
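Confining a speaker model to a low-dimensional linear subspace means writing it (for instance, its stacked GMM mean supervector) as a mean voice m plus a weighted sum of eigenvoices e_k, so that only the few weights y_k have to be estimated from the sparse enrolment data. The sketch below simplifies this to a least-squares projection of an already-available supervector onto orthonormal eigenvoices, where each weight is e_k . (s - m). Eigenvoice systems typically estimate the weights from the enrolment frames themselves (for example by maximum likelihood), and all names and values here are hypothetical.

```python
from typing import List


def project_onto_eigenvoices(supervector: List[float], mean_voice: List[float],
                             eigenvoices: List[List[float]]) -> List[float]:
    """Coordinates y minimizing ||s - (m + sum_k y_k e_k)||, assuming the
    eigenvoices e_k are orthonormal (so y_k = e_k . (s - m))."""
    centered = [s - m for s, m in zip(supervector, mean_voice)]
    return [sum(e * c for e, c in zip(ev, centered)) for ev in eigenvoices]


def reconstruct(mean_voice: List[float], eigenvoices: List[List[float]],
                coords: List[float]) -> List[float]:
    """Speaker supervector confined to the subspace: m + sum_k y_k e_k."""
    out = list(mean_voice)
    for y, ev in zip(coords, eigenvoices):
        out = [o + y * e for o, e in zip(out, ev)]
    return out


mean_voice = [1.0, 0.0, 2.0, -1.0]
eigenvoices = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.6, 0.8, 0.0]]  # orthonormal basis
speaker = [3.0, 1.2, 3.6, -1.0]  # hypothetical speaker lying in the subspace
coords = project_onto_eigenvoices(speaker, mean_voice, eigenvoices)
print(coords, reconstruct(mean_voice, eigenvoices, coords))
```

Only two numbers describe this speaker instead of four; in a real system the supervector has thousands of entries while the subspace may have only a few dozen dimensions.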