2007
Kernel logistic regression (KLR) is a popular non-linear classification technique. Unlike the empirical risk minimization approach employed by Support Vector Machines (SVMs), KLR yields probabilistic outcomes based on a maximum likelihood argument, which are particularly important in speech recognition. Different from other KLR implementations, we use a Nyström approximation to solve large-scale problems with estimation in the primal space, as is done in fixed-size Least Squares Support Vector Machines (LS-SVMs). The speech experiments investigate how a natural KLR extension to multi-class classification compares to binary KLR models coupled via a one-versus-one coding scheme. Moreover, a comparison to SVMs is made.
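The general recipe the abstract describes, approximating the kernel feature map with a Nyström subsample and then fitting logistic regression in that primal space, can be sketched roughly as follows. This is a minimal illustration using scikit-learn on synthetic data; the subsample size, kernel width, and toy features are placeholder choices, not those of the paper.

```python
import numpy as np
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.datasets import make_classification

# Toy data standing in for frame-level speech features.
X, y = make_classification(n_samples=2000, n_features=39, n_classes=3,
                           n_informative=10, random_state=0)

# Nystroem maps inputs into an m-dimensional approximation of the RBF kernel
# feature space; logistic regression is then estimated in that primal space.
model = make_pipeline(
    Nystroem(kernel="rbf", gamma=0.05, n_components=200, random_state=0),
    LogisticRegression(max_iter=1000),  # multinomial, probabilistic outputs
)
model.fit(X, y)
print(model.predict_proba(X[:3]))  # class posterior estimates
```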
2005
Logistic regression is a well-known classification method in the field of statistical learning. Recently, a kernelised version of logistic regression has become very popular, which allows non-linear probabilistic classification and shows promising results on several benchmarks. In this paper we show that kernel logistic regression (KLOGREG) and especially its sparse extensions are useful alternatives to standard Gaussian mixture models or Support Vector Machines (SVMs) in the classification of speech samples. Results on three small speech recognition datasets are given and compared to SVMs and Gaussian mixture models. While the classification results of KLOGREG are similar to the results of SVMs, we show that the sparse KLOGREG versions produce highly sparse models. Unlike SVMs, KLOGREG can naturally be generalised to multi-class problems and provides an estimate of the conditional probability of class membership. So it is possible to quantify confidence levels for class assignments and to use KLOGREG in continuous speech recognition.
2009
The robustness of phoneme classification to white Gaussian noise and pink noise in the acoustic waveform domain is investigated using support vector machines. We focus on the problem of designing kernels which are tuned to the physical properties of speech. For comparison, results are reported for the PLP representation of speech using standard kernels. We show that major improvements can be achieved by incorporating the properties of speech into kernels. Furthermore, the high-dimensional acoustic waveforms exhibit more robust behavior to additive noise. Finally, we investigate a combination of the PLP and acoustic waveform representations which attains better classification than either of the individual representations over a range of noise levels.
2007 International Joint Conference on Neural Networks, 2007
This research studies a practical iterative algorithm for multi-class kernel logistic regression (KLR). Starting from the negative penalized log-likelihood criterion, we show that the optimization problem in each iteration can be solved by a weighted version of Least Squares Support Vector Machines (LS-SVMs). In this derivation it turns out that the global regularization term is reflected as a usual regularization in each separate step. In the LS-SVM framework, fixed-size LS-SVM is known to perform well on large data sets. We therefore implement this model to solve large-scale multi-class KLR problems with estimation in the primal space. To reduce the size of the Hessian, an alternating descent version of Newton's method is used, which has the extra advantage that it can easily be used in a distributed computing environment. It is investigated how a multi-class kernel logistic regression model compares to a one-versus-all coding scheme.
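The reduction the abstract alludes to, each Newton step on the penalized negative log-likelihood becoming a weighted, regularized least-squares solve, can be illustrated for the binary case as below. This is a simplified dual-space sketch (iteratively re-weighted least squares); the paper itself treats the multi-class case with fixed-size LS-SVMs, which this toy version does not reproduce.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=0.5):
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def klr_irls(X, y, lam=1e-2, gamma=0.5, n_iter=20):
    """Binary kernel logistic regression: every Newton step is a
    weighted, ridge-regularized least-squares problem (IRLS)."""
    K = rbf_kernel(X, X, gamma)
    n = len(y)
    alpha = np.zeros(n)
    for _ in range(n_iter):
        f = K @ alpha                               # current scores
        p = 1.0 / (1.0 + np.exp(-f))                # predicted probabilities
        W = p * (1.0 - p)                           # IRLS weights
        z = f + (y - p) / np.maximum(W, 1e-12)      # working response
        # Newton step (K W K + lam K) alpha = K W z reduces, after
        # cancelling one factor of K, to (W K + lam I) alpha = W z.
        alpha = np.linalg.solve(W[:, None] * K + lam * np.eye(n), W * z)
    return alpha

# Toy example: two Gaussian blobs, labels in {0, 1}.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(+1, 1, (50, 2))])
y = np.concatenate([np.zeros(50), np.ones(50)])
alpha = klr_irls(X, y)
p = 1.0 / (1.0 + np.exp(-(rbf_kernel(X, X) @ alpha)))
print("training accuracy:", ((p > 0.5) == y).mean())
```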
2006 IEEE Odyssey - The Speaker and Language Recognition Workshop, 2006
Logistic regression is a well-known classification method in the field of statistical learning. Recently, a kernelized version of logistic regression has become very popular, because it allows non-linear probabilistic classification and shows promising results on several benchmark problems. In this paper we show that kernel logistic regression (KLR) and especially its sparse extensions (SKLR) are useful alternatives to standard Gaussian mixture models (GMMs) and Support Vector Machines (SVMs) in speaker recognition. While the classification results of KLR and SKLR are similar to the results of SVMs, we show that SKLR produces highly sparse models. Unlike SVMs, kernel logistic regression also provides an estimate of the conditional probability of class membership. In speaker identification experiments the SKLR methods outperform the SVM and GMM baseline systems on the POLYCOST database.
Proceedings of the 9th International Conference on Neural Information Processing, 2002. ICONIP '02., 2002
2006
For classical statistical classification algorithms, the probability distribution models are assumed to be known. However, in many real-life applications, such as speech recognition, there is not enough information about the probability distribution function. This is a very common scenario and poses a serious restriction on classification. Support Vector Machines (SVMs) can help in such situations because they are distribution-free algorithms that originated from statistical learning theory and Structural Risk Minimization (SRM). In the most basic approach, SVMs use linearly separating hyperplanes to build classifiers with maximal margins. In applications, however, the classification problem requires a constrained non-linear approach during the learning stage, and a quadratic programming problem has to be solved. For the case where the classes are not linearly separable due to overlap, the SVM algorithm will transform the original input space into a higher-dimensional feature space, wh...
IEEE Transactions on Audio, Speech, and Language Processing, 2000
This work proposes methods for combining cepstral and acoustic waveform representations for a front-end of support vector machine (SVM) based speech recognition systems that are robust to additive noise. The key issue of kernel design and noise adaptation for the acoustic waveform representation is addressed first. Cepstral and acoustic waveform representations are then compared on a phoneme classification task. Experiments show that the cepstral features achieve very good performance in low-noise conditions, but suffer severe performance degradation even at moderate noise levels. Classification in the acoustic waveform domain, on the other hand, is less accurate in low noise but exhibits more robust behavior in high-noise conditions. A combination of the cepstral and acoustic waveform representations achieves better classification performance than either of the individual representations over the entire range of noise levels tested, down to −18 dB SNR.
2005
In this paper we use kernel-based Fisher Discriminants (KFD) for classification by integrating this method into an HMM-based speech recognition system. We translate the outputs of the KFD classifier into conditional probabilities and use them as production probabilities of an HMM-based decoder for speech recognition. To obtain good performance also in terms of computational complexity, the Recursive Least Squares (RLS) algorithm is employed. We train and test the described hybrid structure on the Resource Management Corpus (RM1).
Interspeech 2009
This work focuses on the robustness of phoneme classification to additive noise in the acoustic waveform domain using support vector machines (SVMs). We address the issue of designing kernels for acoustic waveforms which imitate state-of-the-art representations such as PLP and MFCC and are tuned to the physical properties of speech. For comparison, classification results in the PLP representation domain with cepstral mean-and-variance normalization (CMVN) using standard kernels are also reported. It is shown that our custom-designed kernels achieve better classification performance at high noise levels. Finally, we combine the PLP and acoustic waveform representations to attain better classification than either of the individual representations over the entire range of noise levels tested, from quiet conditions down to −18 dB SNR.
2000
We describe a new method for phoneme sequence recognition given a speech utterance. In contrast to HMM-based approaches, our method uses a kernel-based discriminative training procedure in which the learning process is tailored to the goal of minimizing the Levenshtein distance between the predicted phoneme sequence and the correct sequence. The phoneme sequence predictor is devised by mapping the speech utterance along with a proposed phoneme sequence to a vector space endowed with an inner product that is realized by a Mercer kernel. Building on large-margin techniques for predicting whole sequences, we devise a learning algorithm which distills to separating the correct phoneme sequence from all other sequences. We describe an iterative algorithm for learning the phoneme sequence recognizer and an efficient implementation of it. We present initial encouraging experimental results on the TIMIT corpus and compare the proposed method to an HMM-based approach.
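The training criterion mentioned above is the Levenshtein (edit) distance between the predicted and reference phoneme sequences. For reference, it is computed with the standard dynamic program below; this sketch covers only the evaluation metric, not the large-margin sequence learning algorithm of the paper.

```python
def levenshtein(pred, ref):
    """Minimum number of insertions, deletions and substitutions
    needed to turn the predicted sequence into the reference."""
    m, n = len(pred), len(ref)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if pred[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

# One substitution (ae -> aa) and one insertion (trailing sil) give distance 2.
print(levenshtein(["sil", "k", "ae", "t"], ["sil", "k", "aa", "t", "sil"]))
```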
2006 IEEE International Conference on Acoustics Speed and Signal Processing Proceedings, 2006
In this paper, we examine the problem of kernel selection for one-versus-all (OVA) classification of multiclass data with support vector machines (SVMs). We focus specifically on the problem of training what we refer to as generalized linear kernels, that is, kernels of the form k(x1, x2) = x1^T R x2, where R is a positive semidefinite matrix. Our approach for training k(x1, x2) involves first constructing a set of upper bounds on the rates of false positives and false negatives at a given score threshold. Under various conditions, minimizing these bounds leads to the closed-form solution R = W^{-1}, where W is the expected within-class covariance matrix of the data. We tested various parameterizations of R, including a diagonal parameterization that simply performs per-feature variance normalization, on the 1-conversation training condition of the SRE-2003 and SRE-2004 speaker recognition tasks. In experiments on a state-of-the-art MLLR-SVM speaker recognition system [1], the parameterization R = W_s^{-1}, where W_s is a smoothed estimate of W, achieves relative reductions in the minimum decision cost function (DCF) [2] of up to 22% below the results obtained when R performs per-feature variance normalization.
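A hedged sketch of the generalized linear kernel with R set to a smoothed inverse within-class covariance follows. The data, smoothing constant, and helper names are hypothetical toy choices; the paper's MLLR-SVM system and parameter estimation are not reproduced here.

```python
import numpy as np

def within_class_covariance(X, y):
    """Expected within-class covariance matrix W of the training data."""
    W = np.zeros((X.shape[1], X.shape[1]))
    for c in np.unique(y):
        Xc = X[y == c] - X[y == c].mean(axis=0)
        W += Xc.T @ Xc / len(y)   # class contributions weighted by class frequency
    return W

def generalized_linear_kernel(x1, x2, R):
    """k(x1, x2) = x1^T R x2 with R positive semidefinite."""
    return x1 @ R @ x2

# Toy data standing in for per-utterance feature vectors.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = rng.integers(0, 4, size=200)

W = within_class_covariance(X, y)
R = np.linalg.inv(W + 1e-3 * np.eye(W.shape[0]))  # smoothed estimate, R = W_s^{-1}
print(generalized_linear_kernel(X[0], X[1], R))
```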
We consider pattern classification using a weighted sum of normalized kernel functions. Such schemes can be viewed as estimates of class a posteriori probabilities. We apply this regression method successfully to two real life pattern recognition problems.
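One simple instance of such a scheme is sketched below, assuming Gaussian kernels centred on the training points and uniform weights; the paper's actual weights are learned by regression, so this is only an illustration of how a normalized kernel sum acts as a posterior estimate.

```python
import numpy as np

def kernel_posterior(x, X_train, y_train, gamma=0.5):
    """Estimate class posteriors as a normalized weighted sum of kernels
    centred on the training points (uniform weights for illustration)."""
    k = np.exp(-gamma * ((X_train - x) ** 2).sum(axis=1))    # kernel values
    classes = np.unique(y_train)
    scores = np.array([k[y_train == c].sum() for c in classes])
    return classes, scores / scores.sum()                    # normalize -> posteriors

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1, 1, (30, 2)), rng.normal(+1, 1, (30, 2))])
y = np.array([0] * 30 + [1] * 30)
print(kernel_posterior(np.array([0.8, 0.9]), X, y))
```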
Intelligenza Artificiale, 2017
Expressive but complex kernel functions, such as sequence or tree kernels, are usually underemployed in NLP tasks because of their significant computational cost in both the learning and classification stages. Recently, the Nyström methodology for data embedding has been proposed as a viable solution to scalability problems. It improves the scalability of learning processes acting over highly structured data by mapping data into low-dimensional, compact linear representations of kernel spaces. In this paper, a stratification of the model corresponding to the embedding space is proposed as a further, highly flexible optimization. Nyström embedding spaces of increasing sizes are combined in an efficient ensemble strategy: upper layers, providing higher-dimensional representations, are invoked on input instances only when the adoption of smaller (i.e., less expressive) embeddings yields uncertain outcomes. Experimental results using different models of such uncertainty show that state-of-the-art accuracy on three semantic inference tasks can be obtained even when one order of magnitude fewer kernel computations are carried out.
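The layered fallback strategy described above, escalating to a larger embedding only when a smaller one is uncertain, can be sketched roughly as follows. The embedding sizes, the margin-based uncertainty test, the threshold, and the linear classifiers are placeholder choices and not the paper's configuration.

```python
import numpy as np
from sklearn.kernel_approximation import Nystroem
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=3000, n_features=50, random_state=0)
X_tr, y_tr, X_te, y_te = X[:2000], y[:2000], X[2000:], y[2000:]

# One linear classifier per Nystroem embedding size, from small to large.
layers = []
for m in (50, 200, 800):
    clf = make_pipeline(Nystroem(n_components=m, random_state=0), LinearSVC())
    clf.fit(X_tr, y_tr)
    layers.append(clf)

def predict_stratified(x, threshold=0.5):
    """Use the cheapest embedding whose decision margin is confident enough;
    escalate to larger embeddings only for uncertain inputs."""
    for clf in layers[:-1]:
        score = clf.decision_function(x.reshape(1, -1))[0]
        if abs(score) >= threshold:          # confident -> stop here
            return int(score > 0)
    return int(layers[-1].decision_function(x.reshape(1, -1))[0] > 0)

preds = np.array([predict_stratified(x) for x in X_te])
print("accuracy:", (preds == y_te).mean())
```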
The least-squares probabilistic classifier (LSPC) is a computationally efficient alternative to kernel logistic regression (KLR). A key idea for the speedup is that, unlike KLR, which uses maximum likelihood estimation for a log-linear model, LSPC uses least-squares estimation for a linear model. This allows us to obtain a global solution analytically in a classwise manner. In exchange for the speedup, however, this linear least-squares formulation does not necessarily produce a non-negative estimate. Nevertheless, consistency of LSPC is guaranteed in the large-sample limit, and rounding a negative estimate up to zero in finite-sample cases was demonstrated not to degrade the classification performance in experiments. Thus, LSPC is a practically useful probabilistic classifier. In this paper, we give an overview of LSPC and its extensions to covariate shift, multi-task, and multi-label scenarios. A MATLAB implementation of LSPC is available from '
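A minimal sketch of the classwise analytic least-squares solution described above, with Gaussian basis functions centred on the training points. The kernel width and regularization constant are placeholders; consult the paper for the exact formulation and basis-function selection.

```python
import numpy as np

def lspc_fit(X, y, gamma=0.5, lam=0.1):
    """LSPC: solve one regularized least-squares problem per class analytically."""
    K = np.exp(-gamma * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    A = K.T @ K + lam * np.eye(len(X))
    alphas = {c: np.linalg.solve(A, K.T @ (y == c).astype(float))
              for c in np.unique(y)}
    return X, alphas, gamma

def lspc_predict_proba(model, Xq):
    X, alphas, gamma = model
    Kq = np.exp(-gamma * ((Xq[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    scores = np.column_stack([Kq @ a for a in alphas.values()])
    scores = np.maximum(scores, 0.0)   # round negative estimates up to zero
    return scores / np.maximum(scores.sum(axis=1, keepdims=True), 1e-12)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (40, 2)),
               rng.normal(0, 1, (40, 2)),
               rng.normal(2, 1, (40, 2))])
y = np.repeat([0, 1, 2], 40)
model = lspc_fit(X, y)
print(lspc_predict_proba(model, X[:2]))   # classwise posterior estimates
```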
Logistic regression is a linear binary classification algorithm frequently used for classification problems. In this paper we present its kernel version, which is used for classification of non-linearly separable problems. We briefly introduce the concept of multiple kernel learning and apply it to kernel logistic regression. We elaborate on the performance differences between classical logistic regression, kernel logistic regression, and their stochastic variants.
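The combination underlying multiple kernel learning, a weighted sum of base kernels plugged into kernel logistic regression, can be sketched roughly as below. The kernel weights are fixed by hand here rather than learned, and treating the kernel matrix columns as features of an L2-regularized logistic regression is one common way to approximate KLR; neither choice is claimed to be the paper's exact method.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import rbf_kernel, polynomial_kernel, linear_kernel
from sklearn.datasets import make_moons

X, y = make_moons(n_samples=400, noise=0.2, random_state=0)
X_tr, y_tr, X_te, y_te = X[:300], y[:300], X[300:], y[300:]

def combined_kernel(A, B, weights=(0.6, 0.3, 0.1)):
    """Weighted sum of base kernels (weights fixed here; MKL would learn them)."""
    return (weights[0] * rbf_kernel(A, B, gamma=1.0)
            + weights[1] * polynomial_kernel(A, B, degree=2)
            + weights[2] * linear_kernel(A, B))

K_tr = combined_kernel(X_tr, X_tr)
K_te = combined_kernel(X_te, X_tr)

# Kernel logistic regression approximated by regularized logistic regression
# on the columns of the combined kernel matrix.
clf = LogisticRegression(C=1.0, max_iter=1000).fit(K_tr, y_tr)
print("test accuracy:", clf.score(K_te, y_te))
```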
Lecture Notes in Computer Science, 2001
Logistic regression is presumably the most popular representative of probabilistic discriminative classifiers. In this paper, a kernel variant of logistic regression is introduced as an iteratively re-weighted least-squares algorithm in kernel-induced feature spaces. This formulation allows us to apply highly efficient approximation methods that are capable of dealing with large-scale problems. For multi-class problems, a pairwise coupling procedure is proposed. Pairwise coupling for "kernelized" logistic regression effectively overcomes conceptual and numerical problems of standard multi-class kernel classifiers.
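The pairwise coupling step can be illustrated with one standard scheme, the iterative algorithm of Hastie and Tibshirani, which turns pairwise probabilities from binary "kernelized" logistic regression models into class posteriors; the paper's exact coupling procedure may differ, and the pairwise values below are hypothetical.

```python
import numpy as np

def pairwise_coupling(R, n_iter=100):
    """Combine pairwise probabilities R[i, j] = P(class i | class i or j, x)
    into a posterior p over all classes (Hastie-Tibshirani iteration)."""
    K = R.shape[0]
    p = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        for i in range(K):
            mu = p[i] / (p[i] + np.delete(p, i))   # pairwise probs implied by p
            r = np.delete(R[i], i)                 # observed pairwise probs
            p[i] *= r.sum() / max(mu.sum(), 1e-12)
        p /= p.sum()
    return p

# Pairwise probabilities from three hypothetical binary classifiers
# for a single test point (R[i, j] + R[j, i] = 1).
R = np.array([[0.0, 0.7, 0.8],
              [0.3, 0.0, 0.6],
              [0.2, 0.4, 0.0]])
print(pairwise_coupling(R))   # coupled posterior over the three classes
```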
1999
Support Vector Machines (SVMs) represent a new approach to pattern classification which has recently attracted a great deal of interest in the machine learning community. Their appeal lies in their strong connection to the underlying statistical learning theory, in particular the theory of Structural Risk Minimization. SVMs have been shown to be particularly successful in fields such as image identification and face recognition; in many problems SVM classifiers have been shown to perform much better than other non-linear classifiers such as artificial neural networks and k-nearest neighbors.
2005
While the temporal dynamics of speech can be handled very efficiently by Hidden Markov Models (HMMs), the classification of the single speech units (phonemes) is usually done with Gaussian probability density functions, which are not discriminative. In this paper we use the Kernel Fisher Discriminant (KFD) for classification by integrating this method into an HMM-based speech recognition system. In this structure we translate the outputs of the KFD into class-conditional probabilities and use them as production probabilities in an HMM-based speech decoder. The KFD has already shown good classification results in other fields (e.g., pattern recognition). To obtain good performance also in terms of computational complexity, the KFD is implemented iteratively with a sparse greedy approach. We train and test the described hybrid structure on the Resource Management (RM1) task.
2006 14th European Signal Processing Conference, 2006
The robustness of phoneme recognition using support vector machines to additive noise is investigated for three kinds of speech representation. The representations considered are PLP, PLP with RASTA processing, and a high-dimensional principal component approximation of acoustic waveforms. While the classification in the PLP and PLP/RASTA domains attains superb accuracy on clean data, the classification in the high-dimensional space proves to be much more robust to additive noise.
Proceedings of the 8th …, 2009
In this paper, we propose the use of support vector machines to classify speech signals. Very good generalization performance is achieved by implementing support vector machines with various kernel functions. The use of a one-vs-one classifier with a voting algorithm improves the efficiency of the speech signal classification system.
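A minimal sketch of one-vs-one classification with majority voting follows, on a stand-in dataset rather than speech features; scikit-learn's SVC applies this decomposition internally, so the explicit loop is purely illustrative.

```python
import numpy as np
from itertools import combinations
from sklearn.svm import SVC
from sklearn.datasets import load_digits

# Toy stand-in for speech-signal feature vectors.
X, y = load_digits(return_X_y=True)
X_tr, y_tr, X_te, y_te = X[:1200], y[:1200], X[1200:], y[1200:]

# Train one binary SVM per pair of classes.
classes = np.unique(y_tr)
pair_models = {}
for a, b in combinations(classes, 2):
    mask = np.isin(y_tr, [a, b])
    pair_models[(a, b)] = SVC(kernel="rbf", gamma="scale").fit(X_tr[mask], y_tr[mask])

def predict_ovo(x):
    """Each pairwise SVM casts a vote; the class with the most votes wins."""
    votes = np.zeros(len(classes))
    for (a, b), clf in pair_models.items():
        winner = clf.predict(x.reshape(1, -1))[0]
        votes[np.where(classes == winner)[0][0]] += 1
    return classes[np.argmax(votes)]

preds = np.array([predict_ovo(x) for x in X_te])
print("test accuracy:", (preds == y_te).mean())
```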