2007, International Journal of the Book
This paper studies the application of automatic phoneme classification to the computer-aided training of the speech and hearing handicapped. In particular, we focus on how efficiently discriminant analysis can reduce the number of features and increase classification performance. A nonlinear counterpart of Linear Discriminant Analysis, a general-purpose class-specific feature extractor, is presented, where the nonlinearization is carried out by employing the so-called 'kernel idea'. We then examine how this nonlinear extraction technique affects the efficiency of learning algorithms such as Artificial Neural Networks and Support Vector Machines.
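As a hedged illustration of the 'kernel idea' applied to discriminant analysis (not the paper's exact formulation), the sketch below approximates the kernel feature map with a Nyström expansion and then applies ordinary LDA on top, producing a nonlinear class-specific feature extractor whose output feeds a neural network; the data arrays are synthetic placeholders.

import numpy as np
from sklearn.kernel_approximation import Nystroem
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

# Placeholder phoneme data: substitute real acoustic feature vectors and labels.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 39))          # (n_frames, n_features)
y_train = rng.integers(0, 10, size=500)       # 10 phoneme classes

# Approximate kernel map followed by LDA: a nonlinear discriminant feature extractor.
kernel_lda = make_pipeline(
    Nystroem(kernel="rbf", gamma=0.05, n_components=200, random_state=0),
    LinearDiscriminantAnalysis(n_components=9),    # at most n_classes - 1 components
)
Z_train = kernel_lda.fit_transform(X_train, y_train)

# The reduced nonlinear features then feed a downstream learner (here an ANN).
ann = MLPClassifier(hidden_layer_sizes=(50,), max_iter=300).fit(Z_train, y_train)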
2008
Speaker independent feature extraction is a critical problem in speech recognition. Oriented principal component analysis (OPCA) is a potential solution that can find a subspace robust against noise of the data set. The objective of this paper is to find a speaker-independent subspace by generalizing OPCA in two steps: First, we find a nonlinear subspace with the help of a kernel trick, which we refer to as kernel OPCA. Second, we generalize OPCA to problems with more than two phonemes, which leads to oriented discriminant analysis (ODA). In addition, we equip ODA with the kernel trick again, which we refer to as kernel ODA. The models are tested on the CMU ARCTIC speech database. Our results indicate that our proposed kernel methods can outperform linear OPCA and linear ODA at finding a speaker-independent phoneme space.
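For concreteness, a minimal sketch of the linear OPCA step as a generalized eigenvalue problem is given below (the paper's kernel variants replace these covariances with Gram-matrix counterparts); the 'signal' and 'noise' inputs are synthetic placeholders, with the noise set standing in for within-phoneme, across-speaker variation.

import numpy as np
from scipy.linalg import eigh

def opca(X_signal, X_noise, n_components, reg=1e-6):
    S = np.cov(X_signal, rowvar=False)               # covariance to preserve
    N = np.cov(X_noise, rowvar=False)                # covariance to suppress
    N += reg * np.eye(N.shape[0])                    # regularize for numerical stability
    # Solve S w = lambda N w; eigh returns eigenvalues in ascending order.
    eigvals, eigvecs = eigh(S, N)
    order = np.argsort(eigvals)[::-1][:n_components] # directions with largest signal/noise ratio
    return eigvecs[:, order]

rng = np.random.default_rng(0)
W = opca(rng.normal(size=(400, 13)), rng.normal(size=(400, 13)), n_components=5)
projected = rng.normal(size=(10, 13)) @ W            # project new feature frames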
IJCSIT, 2019
Speech recognition is one of the most important modern technologies, and many systems have been developed that differ in the features they extract and the classifiers they use. Voice recognition covers two areas, speech recognition and speaker recognition; this research is confined to speech recognition. The paper proposes improving the performance of isolated-word recognition systems with an algorithm that combines several feature-extraction techniques and a modified neural network, and it studies both recognition performance and the effect of noise on the proposed system. Four speech recognition systems were studied: the first used the MFCC algorithm for feature extraction, the second used the PLP algorithm, and the third combined the two along with the zero-crossing rate. In the fourth system the neural network used for classification was modified and the error rate was determined. The impact of noise on these systems was also examined. The results were compared in terms of recognition rate and the time needed to train the neural network for each system separately, reaching a recognition rate in quiet conditions of up to 98% with the proposed system.
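A hedged sketch of the feature-combination idea is shown below: MFCCs and the zero-crossing rate are concatenated per utterance and fed to a small neural network. librosa has no built-in PLP front end, so that component is omitted here, and wav_files and labels are hypothetical placeholders.

import numpy as np
import librosa
from sklearn.neural_network import MLPClassifier

def word_features(path, sr=16000, n_mfcc=13):
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # shape (n_mfcc, n_frames)
    zcr = librosa.feature.zero_crossing_rate(y)              # shape (1, n_frames)
    feats = np.vstack([mfcc, zcr])
    # Summarize the variable-length word with per-coefficient means and standard deviations.
    return np.concatenate([feats.mean(axis=1), feats.std(axis=1)])

# wav_files: list of isolated-word recordings; labels: their word identities (placeholders).
# X = np.stack([word_features(f) for f in wav_files])
# clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500).fit(X, labels)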
Phoneme classification is investigated for linear feature domains with the aim of improving robustness to additive noise. In linear feature domains noise adaptation is exact, potentially leading to more accurate classification than representations involving non-linear processing and dimensionality reduction. A generative framework is developed for isolated phoneme classification using linear features. Initial results are shown for representations consisting of concatenated frames from the centre of the phoneme, each containing f frames. As phonemes have variable duration, no single f is optimal for all phonemes, therefore an average is taken over models with a range of values of f. Results are further improved by including information from the entire phoneme and transitions. In the presence of additive noise, classification in this framework performs better than an analogous PLP classifier, adapted to noise using cepstral mean and variance normalisation, below 18dB SNR. Finally we propose classification using a combination of acoustic waveform and PLP log-likelihoods. The combined classifier performs uniformly better than either of the individual classifiers across all noise levels.
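The model-averaging and stream-combination steps can be sketched as follows, assuming each per-f generative model already returns a log-likelihood per phoneme class; the uniform averaging and the equal-weight combination are illustrative choices, not necessarily the paper's exact rules.

import numpy as np
from scipy.special import logsumexp

def average_over_f(loglik_per_f):
    # loglik_per_f: array (n_f, n_classes) of log p_f(x | class).
    # Uniform mixture over f: log (1/K) sum_f p_f(x | class).
    K = loglik_per_f.shape[0]
    return logsumexp(loglik_per_f, axis=0) - np.log(K)

def combine_streams(loglik_waveform, loglik_plp, alpha=0.5):
    # Convex combination of acoustic-waveform and PLP log-likelihoods.
    return alpha * loglik_waveform + (1.0 - alpha) * loglik_plp

rng = np.random.default_rng(0)                       # placeholder scores
scores = combine_streams(average_over_f(rng.normal(size=(5, 40))),
                         average_over_f(rng.normal(size=(5, 40))))
predicted_class = int(np.argmax(scores))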
Lecture Notes in Computer Science, 2001
This paper examines the applicability of some learning techniques to the classification of phonemes. The methods tested were artificial neural nets (ANN), support vector machines (SVM) and Gaussian mixture modeling. We compare these methods with a traditional hidden Markov phoneme model (HMM) working with the linear prediction-based cepstral coefficient features (LPCC). We also tried to combine the learners with feature transformation methods, like linear discriminant analysis (LDA), principal component analysis (PCA) and independent component analysis (ICA). We found that the discriminative learners can attain the efficiency of the HMM, and after LDA they can attain practically the same score on only 27 features. PCA and ICA proved ineffective, apparently because of the discrete cosine transform inherent in LPCC.
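A minimal sketch of the LDA-then-classifier pipeline described above follows, with synthetic placeholder data; keeping 27 discriminant components requires at least 28 phoneme classes, since LDA yields at most (number of classes - 1) dimensions.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 66))           # placeholder LPCC-style feature vectors
y = rng.integers(0, 28, size=2000)        # placeholder labels for 28 phoneme classes

model = make_pipeline(
    LinearDiscriminantAnalysis(n_components=27),   # reduce to 27 discriminant features
    SVC(kernel="rbf", C=10.0, gamma="scale"),      # discriminative learner on top
)
model.fit(X, y)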
Proceedings of the 9th International Conference on Neural Information Processing (ICONIP '02), 2002
2007
Kernel logistic regression (KLR) is a popular non-linear classification technique. Unlike an empirical risk minimization approach such as employed by Support Vector Machines (SVMs), KLR yields probabilistic outcomes based on a maximum likelihood argument which are particularly important in speech recognition. Different from other KLR implementations we use a Nyström approximation to solve large scale problems with estimation in the primal space such as done in fixed-size Least Squares Support Vector Machines (LS-SVMs). In the speech experiments it is investigated how a natural KLR extension to multi-class classification compares to binary KLR models coupled via a one-versus-one coding scheme. Moreover, a comparison to SVMs is made.
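The two set-ups compared in the abstract can be approximated with off-the-shelf pieces as sketched below: a Nyström kernel expansion solved in the primal by logistic regression, once as a single multi-class model and once as binary models coupled one-versus-one. This is not an LS-SVM implementation, and the data are synthetic placeholders.

import numpy as np
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X, y = rng.normal(size=(1000, 39)), rng.integers(0, 8, size=1000)

multiclass_klr = make_pipeline(
    Nystroem(kernel="rbf", gamma=0.05, n_components=300, random_state=0),
    LogisticRegression(max_iter=1000),                        # single multi-class model
)
one_vs_one_klr = make_pipeline(
    Nystroem(kernel="rbf", gamma=0.05, n_components=300, random_state=0),
    OneVsOneClassifier(LogisticRegression(max_iter=1000)),    # coupled binary models
)

multiclass_klr.fit(X, y)
one_vs_one_klr.fit(X, y)
posteriors = multiclass_klr.predict_proba(X[:5])              # probabilistic outputs, as in KLR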
1999
Support Vector Machines (SVMs) represent a new approach to pattern classification which has recently attracted a great deal of interest in the machine learning community. Their appeal lies in their strong connection to the underlying statistical learning theory, in particular the theory of Structural Risk Minimization. SVMs have been shown to be particularly successful in fields such as image identification and face recognition; in many problems SVM classifiers have been shown to perform much better than other nonlinear classifiers such as artificial neural networks and k-nearest neighbors.
2006 14th European Signal Processing Conference, 2006
The robustness of phoneme recognition using support vector machines to additive noise is investigated for three kinds of speech representation. The representations considered are PLP, PLP with RASTA processing, and a high-dimensional principal component approximation of acoustic waveforms. While the classification in the PLP and PLP/RASTA domains attains superb accuracy on clean data, the classification in the high-dimensional space proves to be much more robust to additive noise.
2006
We describe a new method for phoneme sequence recognition given a speech utterance that is not based on HMMs. In contrast to HMM-based approaches, our method uses a discriminative kernel-based training procedure in which the learning process is tailored to the goal of minimizing the Levenshtein distance between the predicted phoneme sequence and the correct sequence. The phoneme sequence predictor is devised by mapping the speech utterance, along with a proposed phoneme sequence, to a vector space endowed with an inner product that is realized by a Mercer kernel. Building on large-margin techniques for predicting whole sequences, we are able to devise a learning algorithm which distills to separating the correct phoneme sequence from all other sequences. We describe an iterative algorithm for learning the phoneme sequence recognizer and further describe an efficient implementation of it. We present initial encouraging experimental results on the TIMIT corpus and compare the proposed method to an HMM-based approach.
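The training objective above is built around the Levenshtein (edit) distance between phoneme strings; a standard dynamic-programming implementation of that distance is sketched below (the full large-margin learner is not reproduced).

def levenshtein(pred, ref):
    # Minimum number of substitutions, insertions and deletions turning the
    # predicted phoneme sequence into the reference sequence.
    m, n = len(pred), len(ref)
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i
    for j in range(n + 1):
        dist[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if pred[i - 1] == ref[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,          # deletion
                             dist[i][j - 1] + 1,          # insertion
                             dist[i - 1][j - 1] + cost)   # substitution
    return dist[m][n]

assert levenshtein(["sil", "k", "ae", "t"], ["sil", "k", "aa", "t", "sil"]) == 2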
2006
For classical statistical classification algorithms the probability distribution models are assumed to be known. However, in many real-life applications, such as speech recognition, there is not enough information about the probability distribution function. This is a very common scenario and poses a serious restriction on classification. Support Vector Machines (SVMs) can help in such situations because they are distribution-free algorithms that originated from statistical learning theory and Structural Risk Minimization (SRM). In the most basic approach, SVMs use linearly separating hyperplanes to produce classifications with maximal margins. In practice, however, the classification problem requires a constrained nonlinear approach during the learning stage, and a quadratic programming problem has to be solved. For the case where the classes cannot be linearly separated due to overlap, the SVM algorithm transforms the original input space into a higher-dimensional feature space where linear separation becomes possible.
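To make the description concrete, the textbook soft-margin SVM primal and its kernelized dual (the quadratic programming problem referred to above) can be written as

\[
\min_{w,\,b,\,\xi}\ \tfrac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_{i}
\quad\text{s.t.}\quad y_{i}\,(w^{\top}x_{i} + b) \ge 1 - \xi_{i},\qquad \xi_{i} \ge 0,
\]
\[
\max_{\alpha}\ \sum_{i=1}^{n}\alpha_{i}
 - \tfrac{1}{2}\sum_{i,j}\alpha_{i}\alpha_{j}\,y_{i}y_{j}\,K(x_{i},x_{j})
\quad\text{s.t.}\quad 0 \le \alpha_{i} \le C,\qquad \sum_{i}\alpha_{i}y_{i} = 0,
\]

where replacing the inner product by a kernel K realizes the implicit mapping to the higher-dimensional feature space.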
Linear Discriminant Analysis (LDA) techniques have been used in pattern recognition to map feature vectors so as to achieve optimal classification. Kernel Discriminant Analysis (KDA) seeks to introduce nonlinearity into this approach by mapping the features into a nonlinear space before applying LDA. The formulation is expressed as the resolution of an eigenvalue problem. Using a different kernel, one can cover a wide class of nonlinearities. In this paper, we describe this technique and present an application to a speech recognition problem. We give classification results for a connected-digit recognition task and analyze some existing problems.
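For reference, the standard two-class kernel Fisher discriminant formulation (to which the abstract's eigenvalue-problem description corresponds) maximizes

\[
J(\alpha) = \frac{\alpha^{\top} M \alpha}{\alpha^{\top} N \alpha},
\qquad
M = (M_{1}-M_{2})(M_{1}-M_{2})^{\top},
\qquad
(M_{c})_{j} = \frac{1}{n_{c}}\sum_{x \in \text{class } c} K(x_{j}, x),
\]
\[
N = \sum_{c=1,2} K_{c}\bigl(I - \tfrac{1}{n_{c}}\mathbf{1}\mathbf{1}^{\top}\bigr)K_{c}^{\top} + \mu I,
\]

where K_{c} is the kernel matrix between all training points and the points of class c and \mu I is a regularizer. The optimal expansion coefficients solve the generalized eigenvalue problem M\alpha = \lambda N\alpha, and a new point is projected as f(x) = \sum_{j}\alpha_{j}K(x_{j},x).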
2009
The robustness of phoneme classification to white Gaussian noise and pink noise in the acoustic waveform domain is investigated using support vector machines. We focus on the problem of designing kernels which are tuned to the physical properties of speech. For comparison, results are reported for the PLP representation of speech using standard kernels. We show that major improvements can be achieved by incorporating the properties of speech into kernels. Furthermore, the high-dimensional acoustic waveforms exhibit more robust behavior to additive noise. Finally, we investigate a combination of the PLP and acoustic waveform representations which attains better classification than either of the individual representations over a range of noise levels.
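The speech-tuned kernels themselves are not reproduced here; the sketch below only shows the mechanics of plugging a hand-designed kernel on fixed-length waveform segments into an SVM. The kernel used (a norm-normalized inner product, i.e. invariant to overall segment amplitude) is an illustrative, valid positive-semidefinite choice, not one of the kernels studied in the paper, and the data are synthetic.

import numpy as np
from sklearn.svm import SVC

def waveform_kernel(A, B):
    # A: (n_a, T) and B: (n_b, T) matrices of raw waveform segments.
    A = A / (np.linalg.norm(A, axis=1, keepdims=True) + 1e-12)
    B = B / (np.linalg.norm(B, axis=1, keepdims=True) + 1e-12)
    return A @ B.T                                   # Gram matrix of shape (n_a, n_b)

rng = np.random.default_rng(0)
X, y = rng.normal(size=(300, 1024)), rng.integers(0, 2, size=300)
clf = SVC(kernel=waveform_kernel, C=1.0).fit(X, y)   # SVC accepts a callable kernel
pred = clf.predict(X[:5])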
2001
The aim of discriminant feature analysis techniques in the signal processing of speech recognition systems is to find a feature vector transformation which maps a high-dimensional input vector onto a low-dimensional vector while retaining a maximum amount of information in the feature vector to discriminate between predefined classes. This paper points out the significance of the definition of the classes in the discriminant feature analysis technique. Three choices for the definition of the classes are investigated: the phonemes, the states in context-independent acoustic models and the tied states in context-dependent acoustic models. These choices for the classes were applied to (1) standard LDA (linear discriminant analysis) for reference and to (2) MIDA, an improved, mutual information based discriminant analysis technique. Evaluation of the resulting linear feature transforms on a large-vocabulary continuous speech recognition task shows, depending on the technique, ...
IEEE Transactions on Audio, Speech, and Language Processing, 2000
This work proposes methods for combining cepstral and acoustic waveform representations for a front-end of support vector machine (SVM) based speech recognition systems that are robust to additive noise. The key issue of kernel design and noise adaptation for the acoustic waveform representation is addressed first. Cepstral and acoustic waveform representations are then compared on a phoneme classification task. Experiments show that the cepstral features achieve very good performance in low noise conditions, but suffer severe performance degradation already at moderate noise levels. Classification in the acoustic waveform domain, on the other hand, is less accurate in low noise but exhibits a more robust behavior in high noise conditions. A combination of the cepstral and acoustic waveform representations achieves better classification performance than either of the individual representations over the entire range of noise levels tested, down to −18dB SNR.
2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012
Modeling the second-order statistics of articulatory trajectories is likely to improve the performance in classifying phone segments compared to using only linear combinations of MFCCs. Nevertheless, the extremely high dimensionality of the feature space spanned by a combination of monomials of degree-1 and degree-2 makes it difficult to effectively exploit the discriminative information in the full covariance matrix. This paper proposes a novel algorithm, dubbed Knowledge-based Quadratic Discriminant Analysis (KnQDA), for reducing the number of dimensions of the space spanned by degree-1 and degree-2 monomials by using phonetic knowledge for selecting the set of degree-2 monomials that are most likely to improve classification. KnQDA seeks a trade-off between overfitting and undertraining, which further improves the learnability. Binary classifications on all pairs of phones in TIMIT show the effectiveness of the proposed method, especially on those phone pairs that overlap strongly in the linear feature space.
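A hedged sketch of the dimensionality-control idea behind KnQDA follows: expand the features into degree-1 and degree-2 monomials, keep only a chosen subset of the quadratic terms, and train a linear classifier on the reduced expansion. The subset here is picked by index purely as a placeholder for the paper's phonetic-knowledge selection rules, and the data are synthetic.

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X, y = rng.normal(size=(800, 13)), rng.integers(0, 2, size=800)   # one phone pair

poly = PolynomialFeatures(degree=2, include_bias=False)
Phi = poly.fit_transform(X)                 # degree-1 monomials first, then degree-2
n_linear = X.shape[1]
keep_quadratic = np.arange(n_linear, n_linear + 30)   # placeholder subset of quadratic terms
columns = np.concatenate([np.arange(n_linear), keep_quadratic])

clf = LogisticRegression(max_iter=1000).fit(Phi[:, columns], y)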
2005
In this paper we use kernel-based Fisher Discriminants (KFD) for classification by integrating this method into an HMM-based speech recognition system. We translate the outputs of the KFD classifier into conditional probabilities and use them as production probabilities of an HMM-based decoder for speech recognition. To obtain good performance in terms of computational complexity as well, the Recursive Least Squares (RLS) algorithm is employed. We train and test the described hybrid structure on the Resource Management corpus (RM1).
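A common way to perform this kind of conversion in hybrid classifier/HMM systems (which may differ in detail from the mapping used in the paper) is to turn the classifier's class posteriors into scaled likelihoods by dividing by the class priors,

\[
p(x_{t} \mid s) \;\propto\; \frac{P(s \mid x_{t})}{P(s)},
\]

so that the rescaled outputs can stand in for the emission (production) probabilities of the HMM decoder.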
2003
This work discusses the improvements which can be expected when applying linear feature-space transformations based on Linear Discriminant Analysis (LDA) within automatic speech recognition (ASR). It is shown that different factors influence the effectiveness of LDA transformations. Most importantly, increasing the number of LDA classes by using time-aligned states of hidden Markov models instead of phonemes is necessary to obtain improvements predictably. An extension of LDA is presented which utilises the elementary Gaussian components of the mixture probability-density functions of the hidden Markov model states to define actual Gaussian LDA classes. Experimental results on the TIMIT and WSJCAM0 recognition tasks are given, where relative improvements in the error rate of 3.2% and 3.9%, respectively, were obtained.