2005
In this paper we use kernel-based Fisher Discriminants (KFD) for classification by integrating the method into an HMM-based speech recognition system. We translate the outputs of the KFD classifier into conditional probabilities and use them as production probabilities of an HMM-based decoder for speech recognition. To achieve good performance in terms of computational complexity as well, the Recursive Least Squares (RLS) algorithm is employed. We train and test the described hybrid structure on the Resource Management Corpus (RM1).
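As a rough illustration of the recursive update the abstract refers to, here is a minimal sketch of the standard RLS recursion in weight space (the paper applies it in a kernel setting; the function name and hyperparameters below are illustrative, not taken from the paper):

```python
import numpy as np

def rls_fit(X, y, lam=0.99, delta=1e2):
    """Standard recursive least squares: updates the weights w one sample
    at a time instead of re-solving the full least-squares system.
    lam is the forgetting factor, delta scales the initial inverse covariance."""
    n_features = X.shape[1]
    w = np.zeros(n_features)
    P = delta * np.eye(n_features)          # running inverse-covariance estimate
    for x, target in zip(X, y):
        Px = P @ x
        k = Px / (lam + x @ Px)             # gain vector
        e = target - w @ x                  # a-priori prediction error
        w = w + k * e
        P = (P - np.outer(k, Px)) / lam
    return w
```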
2005
While the temporal dynamics of speech can be handled very efficiently by Hidden Markov Models (HMMs), the classification of the single speech units (phonemes) is usually done with Gaussian probability density functions, which are not discriminative. In this paper we use the Kernel Fisher Discriminant (KFD) for classification by integrating the method into an HMM-based speech recognition system. In this structure we translate the outputs of the KFD into class-conditional probabilities and use them as production probabilities in an HMM-based speech decoder. The KFD has already shown good classification results in other fields (e.g. pattern recognition). To achieve good performance in terms of computational complexity as well, the KFD is implemented iteratively with a sparse greedy approach. We train and test the described hybrid structure on the Resource Management (RM1) task.
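The abstract leaves the posterior-to-likelihood conversion unspecified; in hybrid systems this is commonly done by dividing classifier posteriors by the state priors, which gives emission scores proportional to p(x | state). A minimal sketch, assuming that common recipe:

```python
import numpy as np

def scaled_likelihoods(posteriors, priors, eps=1e-10):
    """Hybrid-HMM recipe: dividing classifier posteriors p(state | x) by the
    state priors p(state) yields p(x | state) up to the state-independent
    factor p(x), which cancels in decoding; usable as HMM emission scores."""
    return posteriors / np.maximum(priors, eps)

# Illustrative usage: per-frame posteriors from any discriminative classifier.
posteriors = np.array([0.7, 0.2, 0.1])   # p(state | x) for 3 states
priors     = np.array([0.5, 0.3, 0.2])   # relative state frequencies
print(scaled_likelihoods(posteriors, priors))
```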
IEEE Signal Processing Letters, 2014
In this letter, we propose a new acoustic modelling approach for automatic speech recognition based on probabilistic linear discriminant analysis (PLDA), which is used to model the state density function for standard hidden Markov models (HMMs). Unlike conventional Gaussian mixture models (GMMs), where the correlations are only weakly modelled through diagonal covariance matrices, PLDA captures the correlations of feature vectors in subspaces without vastly expanding the model. It also allows the use of high-dimensional feature input, and is therefore more flexible in making use of different types of acoustic features. We performed preliminary experiments on the Switchboard corpus and demonstrated the feasibility of this acoustic model.
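A minimal sketch of the state likelihood implied by the abstract, in the simplest single-Gaussian PLDA form x = mu + U h + eps: marginalizing the latent h yields a full covariance U U^T + diag(lambda), captured through a low-rank subspace plus a diagonal floor rather than a vastly larger model. Names and dimensions below are illustrative:

```python
import numpy as np
from scipy.stats import multivariate_normal

def plda_state_loglik(x, mu, U, lambda_diag):
    """Log-likelihood of one frame under a PLDA-style state model
    x = mu + U h + eps, with h ~ N(0, I) and eps ~ N(0, diag(lambda_diag)).
    Marginalizing h gives N(x; mu, U U^T + diag(lambda_diag))."""
    cov = U @ U.T + np.diag(lambda_diag)
    return multivariate_normal.logpdf(x, mean=mu, cov=cov)

# Illustrative numbers: 10-dim features, rank-2 state subspace.
rng = np.random.default_rng(0)
x, mu = rng.normal(size=10), np.zeros(10)
U = rng.normal(size=(10, 2))
print(plda_state_loglik(x, mu, U, np.ones(10)))
```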
Linear Discriminant Analysis (LDA) techniques have been used in pattern recognition to map feature vectors so as to achieve optimal classification. Kernel Discriminant Analysis (KDA) introduces non-linearity into this approach by mapping the features to a non-linear space before applying LDA. The formulation is expressed as the resolution of an eigenvalue problem. By using different kernels, one can cover a wide class of nonlinearities. In this paper, we describe this technique and present an application to a speech recognition problem. We give classification results for a connected digit recognition task and analyze some existing problems.
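A sketch of the kernelized discriminant eigenvalue formulation the abstract describes, posed in the span of the training points and shown here for two classes only (the paper's setting is multi-class; this binary version just illustrates the generalized eigenproblem):

```python
import numpy as np
from scipy.linalg import eigh

def kfd_two_class(K, labels, reg=1e-3):
    """Two-class kernel discriminant direction. K is the full n x n kernel
    matrix, labels a {0,1} array. Solves M a = lambda N a, with between- and
    within-class scatter expressed through the kernel."""
    n = K.shape[0]
    idx0, idx1 = np.where(labels == 0)[0], np.where(labels == 1)[0]
    m0 = K[:, idx0].mean(axis=1)            # class means in feature space,
    m1 = K[:, idx1].mean(axis=1)            # expressed via the kernel
    M = np.outer(m1 - m0, m1 - m0)          # between-class scatter
    N = np.zeros((n, n))
    for idx in (idx0, idx1):                # within-class scatter
        Kc = K[:, idx]
        nc = len(idx)
        N += Kc @ (np.eye(nc) - np.full((nc, nc), 1.0 / nc)) @ Kc.T
    N += reg * np.eye(n)                    # regularize for numerical stability
    w, V = eigh(M, N)                       # generalized eigenvalue problem
    return V[:, -1]                         # leading direction (coefficients alpha)
```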
2000
We describe a new method for phoneme sequence recognition given a speech utterance. In contrast to HMM-based approaches, our method uses a kernel-based discriminative training procedure in which the learning process is tailored to the goal of minimizing the Levenshtein distance between the predicted phoneme sequence and the correct sequence. The phoneme sequence predictor is devised by mapping the speech utterance along with a proposed phoneme sequence to a vector-space endowed with an inner-product that is realized by a Mercer kernel. Building on large margin techniques for predicting whole sequences, we are able to devise a learning algorithm which distills to separating the correct phoneme sequence from all other sequences. We describe an iterative algorithm for learning the phoneme sequence recognizer and further describe an efficient implementation of it. We present initial encouraging experimental results on the TIMIT corpus and compare the proposed method to an HMM-based approach.
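For reference, the Levenshtein distance the training procedure is tailored to minimize is the standard edit-distance dynamic program over phoneme strings:

```python
def levenshtein(pred, ref):
    """Edit distance between predicted and reference phoneme sequences:
    the minimum number of insertions, deletions, and substitutions."""
    m, n = len(pred), len(ref)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if pred[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution (or match)
    return d[m][n]

print(levenshtein("k ae t".split(), "k aa t".split()))  # -> 1
```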
Computer Speech & Language, 2010
Discriminative classifiers are a popular approach to solving classification problems. However, one of the problems with these approaches, in particular kernel-based classifiers such as Support Vector Machines (SVMs), is that they are hard to adapt to mismatches between the training and test data. This paper describes a scheme for overcoming this problem for speech recognition in noise by adapting the kernel rather than the SVM decision boundary. Generative kernels, defined using generative models, are one type of kernel that allows SVMs to handle sequence data. By compensating the parameters of the generative models for each noise condition, noise-specific generative kernels can be obtained. These can be used to train a noise-independent SVM on a range of noise conditions, which can then be used with a test-set noise kernel for classification. The noise-specific kernels used in this paper are based on Vector Taylor Series (VTS) model-based compensation. VTS allows all the model parameters to be compensated and the background noise to be estimated in a maximum-likelihood fashion. A brief discussion of VTS, and of the optimisation of the mismatch function representing the impact of noise on the clean speech, is also included. Experiments using these VTS-based test-set noise kernels were run on the AURORA 2 continuous digit task. The proposed SVM rescoring scheme yields large gains in performance over the VTS-compensated models.
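A sketch of first-order VTS compensation of the clean static means, written in the log-mel domain for brevity (the paper works with cepstral parameters, which wrap a DCT/inverse-DCT around the same mismatch nonlinearity; variable names are illustrative):

```python
import numpy as np

def vts_mean_compensation(mu_x, mu_n, mu_h):
    """VTS-style compensation of clean static means in the log-mel domain:
    the mismatch function y = x + h + log(1 + exp(n - x - h)) evaluated at
    the clean-speech mean mu_x, noise mean mu_n, and channel mean mu_h."""
    return mu_x + mu_h + np.log1p(np.exp(mu_n - mu_x - mu_h))

# Illustrative: per-bin means; noise dominates the last bin, shifting it most.
mu_x = np.array([2.0, 1.0, 0.5])
mu_n = np.array([1.0, 1.5, 1.5])
print(vts_mean_compensation(mu_x, mu_n, np.zeros(3)))
```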
2007 Information Theory and Applications Workshop, 2007
The vast majority of automatic speech recognition systems use Hidden Markov Models (HMMs) as the underlying acoustic model. Initially these models were trained based on the maximum likelihood criterion. Significant performance gains have been obtained by using discriminative training criteria, such as maximum mutual information and minimum phone error. However, the underlying acoustic model is still generative, with the associated constraints on the state and transition probability distributions, and classification is based on Bayes' decision rule. Recently, there has been interest in examining discriminative, or direct, models for speech recognition. This paper briefly reviews the forms of discriminative models that have been investigated. These include maximum entropy Markov models, hidden conditional random fields and conditional augmented models. The relationships between the various models and issues with applying them to large vocabulary continuous speech recognition will be discussed.
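Schematically, the direct models reviewed here score an (observation, label-sequence) pair with a single linear function of joint features, rather than modelling p(O | w) generatively and applying Bayes' rule. A minimal illustration (the feature vector and weights below are hypothetical, not from any of the reviewed papers):

```python
import numpy as np

def direct_model_score(alpha, phi):
    """Unnormalized score of a candidate word sequence w under a direct model
    (e.g. a conditional augmented model or hidden CRF): alpha . phi(O, w),
    where phi collects arbitrary features of the whole utterance and sequence.
    Decoding picks argmax_w of this score, with no generative constraints
    on state or transition distributions."""
    return float(alpha @ phi)

# Illustrative: 4 sequence-level features with learned weights alpha.
alpha = np.array([0.5, -1.2, 0.3, 2.0])
phi_w = np.array([1.0, 0.0, 3.0, 1.0])   # phi(O, w) for one candidate w
print(direct_model_score(alpha, phi_w))
```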
IEEE Journal of Selected Topics in Signal Processing, 2000
This paper describes a novel classifier for sequential data based on nonlinear classification derived from kernel methods. In the proposed method, kernel methods are used to enhance the emission probability density functions (pdfs) of hidden Markov models (HMMs). Because the emission pdfs enhanced by kernel methods have sufficient nonlinear classification performance, mixture models such as Gaussian mixture models (GMMs), which can suffer from overfitting and local optima, are not necessary in the proposed method. Unlike the methods used in earlier studies on sequential pattern classification with kernel methods, our method can be regarded as an extension of conventional HMMs, and it can therefore completely model the transitions of hidden states along with the observed vectors. As a result, our method can be applied to many applications developed with conventional HMMs, especially speech recognition. In this paper, we carried out isolated phoneme classification as a preliminary experiment to evaluate the efficiency of the proposed sequential pattern classifier. We confirmed that the proposed method achieved steady improvements over conventional HMMs with Gaussian-mixture emission pdfs trained by the maximum likelihood and maximum mutual information procedures.
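Since the proposed change is confined to the emission scores, a sketch of the forward algorithm with a pluggable emission term shows where kernel-based scores would replace the usual GMM log-densities (a generic textbook forward pass, not the paper's implementation):

```python
import numpy as np
from scipy.special import logsumexp

def forward_loglik(log_A, log_pi, log_emission):
    """Forward algorithm for an HMM with S states over T frames.
    log_A: (S, S) log transition matrix; log_pi: (S,) log initial probs;
    log_emission: (T, S) per-frame log-emission scores. The abstract's
    method swaps kernel-based scores into log_emission, leaving the rest
    of the HMM machinery unchanged."""
    T, S = log_emission.shape
    log_a = log_pi + log_emission[0]
    for t in range(1, T):
        # log_a_new[j] = logsum_i(log_a[i] + log_A[i, j]) + log_emission[t, j]
        log_a = logsumexp(log_a[:, None] + log_A, axis=0) + log_emission[t]
    return logsumexp(log_a)
```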
2009
In this paper we propose a back-off discriminative acoustic model for Automatic Speech Recognition (ASR). We use a set of broad phonetic classes to divide the classification problem arising from context-dependent modeling into a set of sub-problems. By appropriately combining the scores from classifiers designed for the sub-problems, we can guarantee that the back-off acoustic scores for different context-dependent units will differ. The back-off model can be combined with discriminative training algorithms to further improve performance. Experimental results on a large-vocabulary lecture transcription task show that the proposed back-off discriminative acoustic model yields more than a 2.0% absolute word error rate reduction compared to a clustering-based acoustic model. Index Terms: context-dependent acoustic modeling, back-off acoustic models, discriminative training
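A hypothetical illustration of the back-off combination idea: assembling a context-dependent unit's score from broad-class sub-problem classifiers means two triphones differing in any position receive different back-off scores, even if never observed jointly in training. The decomposition and key names below are invented for illustration; the paper's actual scheme may differ:

```python
def backoff_score(scores, center, left_class, right_class):
    """Back-off score for a triphone-like unit, combined additively from
    sub-problem classifier log-scores keyed by broad phonetic context.
    `scores` maps sub-problem keys to log-scores (hypothetical layout)."""
    return (scores[("center", center)]
            + scores[("left", left_class, center)]
            + scores[("right", center, right_class)])

# Illustrative usage with made-up broad classes and scores.
scores = {
    ("center", "ae"): 1.2,
    ("left", "STOP", "ae"): 0.4,
    ("right", "ae", "NASAL"): -0.1,
}
print(backoff_score(scores, "ae", "STOP", "NASAL"))  # -> 1.5
```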
2006
We describe a new method for phoneme sequence recognition given a speech utterance, which is not based on HMMs. In contrast to HMM-based approaches, our method uses a discriminative kernel-based training procedure in which the learning process is tailored to the goal of minimizing the Levenshtein distance between the predicted phoneme sequence and the correct sequence. The phoneme sequence predictor is devised by mapping the speech utterance along with a proposed phoneme sequence to a vector-space endowed with an inner-product that is realized by a Mercer kernel. Building on large margin techniques for predicting whole sequences, we are able to devise a learning algorithm which distills to separating the correct phoneme sequence from all other sequences. We describe an iterative algorithm for learning the phoneme sequence recognizer and further describe an efficient implementation of it. We present initial encouraging experimental results on the TIMIT corpus and compare the proposed method to an HMM-based approach.
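A simplified sketch of the kind of large-margin update that separates the correct phoneme sequence from a competing one, with the required margin scaled by their Levenshtein distance (a structured-perceptron/passive-aggressive-style step, not the paper's exact algorithm):

```python
import numpy as np

def margin_update(w, phi_correct, phi_pred, loss, lr=1.0):
    """One margin-based structured update. phi_correct and phi_pred are the
    joint feature vectors of (utterance, correct sequence) and (utterance,
    predicted sequence); `loss` is their Levenshtein distance. If the correct
    sequence does not outscore the prediction by at least `loss`, move the
    weights toward the correct features and away from the predicted ones."""
    margin = w @ phi_correct - w @ phi_pred
    if margin < loss:
        w = w + lr * (phi_correct - phi_pred)
    return w
```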
In this paper, we study the discriminant-function-based minimum recognition error rate pattern recognition approach. This approach departs from the conventional paradigm, which links a classification/recognition task to the problem of distribution estimation. Instead, it takes a discriminant-function-based statistical pattern recognition approach, and the suitability of this approach for classification error rate minimization is established through a special loss function. It is meaningful even when the model-correctness assumption is known to be invalid. The use of discriminant functions has a significant impact on classifier design, since in many realistic applications, such as speech recognition, the true distribution form of the source is rarely known precisely, and without the model-correctness assumption, the classical optimality theory of the distribution estimation approach cannot be applied directly. We discuss issues in this new classifier design paradigm and present various extensions...
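A minimal sketch of the special loss function used in standard minimum classification error (MCE) training: a smoothed misclassification measure passed through a sigmoid, giving a differentiable proxy for the 0/1 error (the smoothing parameters eta and gamma below are illustrative values):

```python
import numpy as np

def mce_loss(g, correct, eta=2.0, gamma=1.0):
    """Smoothed MCE loss for one sample. g[k] is the discriminant score of
    class k. The misclassification measure d compares the correct class score
    with a soft-max over the competitors; the sigmoid maps d to (0, 1) so the
    empirical error count becomes differentiable in the classifier parameters."""
    others = np.delete(g, correct)
    d = -g[correct] + (1.0 / eta) * np.log(np.mean(np.exp(eta * others)))
    return 1.0 / (1.0 + np.exp(-gamma * d))

# Illustrative: class 0 correct and well separated -> loss near 0.
print(mce_loss(np.array([3.0, 0.5, -1.0]), correct=0))
```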