2001
The aim of discriminant feature analysis techniques in the signal processing of speech recognition systems is to find a feature vector transformation which maps a high dimensional input vector onto a low dimensional vector while retaining a maximum amount of information in the feature vector to discriminate between predefined classes. This paper points out the significance of the definition of the classes in the discriminant feature analysis technique. Three choices for the definition of the classes are investigated: the phonemes, the states in context independent acoustic models and the tied states in context dependent acoustic models. These choices for the classes were applied to (1) standard LDA (linear discriminant analysis) for reference and to (2) MIDA, an improved, mutual information based discriminant analysis technique. Evaluation of the resulting linear feature transforms on a large vocabulary continuous speech recognition task shows, depending on the technique, ...
2003
This work discusses the improvements which can be expected when applying linear feature-space transformations based on Linear Discriminant Analysis (LDA) within automatic speech recognition (ASR). It is shown that different factors influence the effectiveness of LDA transformations. Most importantly, increasing the number of LDA classes by using time-aligned states of Hidden Markov Models instead of phonemes is necessary to obtain improvements reliably. An extension of LDA is presented which utilises the elementary Gaussian components of the mixture probability density functions of the Hidden Markov Models' states to define the actual Gaussian LDA classes. Experimental results on the TIMIT and WSJCAM0 recognition tasks are given, where relative improvements in error rate of 3.2% and 3.9%, respectively, were obtained.
Interspeech 2014, 2014
Linear discriminant analysis (LDA) is a simple and effective feature transformation technique that aims to improve discriminability by maximizing the ratio of the between-class variance to the within-class variance. However, LDA does not explicitly consider the sequential discriminative criterion of directly reducing the errors of a speech recognizer. This paper proposes a simple extension of LDA, called sequential LDA (sLDA), based on a sequential discriminative criterion computed from the Gaussian statistics obtained from sequential maximum mutual information (MMI) training. Although the objective function of the proposed LDA can be regarded as a special case of various discriminative feature transformation techniques (for example, f-MPE or the bottom layer of a neural network), the transformation matrix can be obtained as the closed-form solution to a generalized eigenvalue problem, in contrast to the gradient-descent-based optimization methods usually used in these techniques. Experiments on large vocabulary continuous speech recognition (Corpus of Spontaneous Japanese) and a noisy speech recognition task (2nd CHiME challenge) show consistent improvements over standard LDA due to the sequential discriminative training. In addition, the proposed method, despite its simple and fast computation, improved the performance in combination with discriminative feature transformation (f-bMMI), perhaps by providing a good initialization to f-bMMI.
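For reference, the standard LDA objective that the abstracts above and below build on reduces to a generalized eigenvalue problem. Below is a minimal illustrative sketch in Python (numpy/scipy); the data, labels, and dimensionalities are placeholder assumptions, not taken from any of the papers:

```python
import numpy as np
from scipy.linalg import eigh

def lda_transform(X, y, n_components):
    """Standard LDA: maximize between-class over within-class scatter.

    X: (n_samples, n_dims) feature matrix, y: integer class labels.
    Returns a (n_dims, n_components) projection matrix.
    """
    mean_total = X.mean(axis=0)
    dims = X.shape[1]
    S_w = np.zeros((dims, dims))  # within-class scatter
    S_b = np.zeros((dims, dims))  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        S_w += (Xc - mean_c).T @ (Xc - mean_c)
        diff = (mean_c - mean_total)[:, None]
        S_b += len(Xc) * (diff @ diff.T)
    # Generalized eigenvalue problem: S_b v = lambda * S_w v.
    # eigh returns eigenvalues in ascending order, so take the last columns.
    eigvals, eigvecs = eigh(S_b, S_w)
    return eigvecs[:, ::-1][:, :n_components]

# Toy usage: project 39-dim frames (40 classes) onto 12 discriminant directions.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 39))
y = rng.integers(0, 40, size=500)
W = lda_transform(X, y, n_components=12)
X_lda = X @ W
```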
IEEE Signal Processing Letters, 2014
In this letter, we propose a new acoustic modelling approach for automatic speech recognition based on probabilistic linear discriminant analysis (PLDA), which is used to model the state density function for the standard hidden Markov models (HMMs). Unlike conventional Gaussian mixture models (GMMs), where correlations are weakly modelled using diagonal covariance matrices, PLDA captures the correlations of feature vectors in subspaces without vastly expanding the model. It also allows the use of high-dimensional feature input, and is therefore more flexible in making use of different types of acoustic features. We performed preliminary experiments on the Switchboard corpus and demonstrated the feasibility of this acoustic model.
2011
Feature extraction is an important component of pattern classification and speech recognition. Extracted features should discriminate classes from each other while being robust to environmental conditions such as noise. For this purpose, several feature transformations have been proposed, which can be divided into two main categories: data-dependent transformations and classifier-dependent transformations.
In this paper, we study a discriminant-function-based minimum recognition error rate pattern recognition approach. This approach departs from the conventional paradigm, which links a classification/recognition task to the problem of distribution estimation. Instead, it takes a discriminant-function-based statistical pattern recognition approach, and the suitability of this approach for classification error rate minimization is established through a special loss function. It is meaningful even when the model correctness assumption is known to be invalid. The use of discriminant functions has a significant impact on classifier design, since in many realistic applications, such as speech recognition, the true distribution form of the source is rarely known precisely, and without the model correctness assumption, the classical optimality theory of the distribution estimation approach cannot be applied directly. We discuss issues in this new classifier design paradigm and present various extensions...
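The "special loss function" in this line of work is commonly a sigmoid of a smoothed misclassification measure. A minimal sketch under that assumption follows; the score vector and smoothing constants are illustrative, not this paper's exact definitions:

```python
import numpy as np

def mce_loss(g, correct, alpha=1.0, eta=2.0):
    """Smoothed minimum-classification-error loss for one token.

    g: discriminant-function scores g_k(x) for all classes.
    correct: index of the true class.
    d > 0 roughly corresponds to a misclassification; the sigmoid
    turns the 0/1 error count into a differentiable loss.
    """
    g = np.asarray(g, dtype=float)
    competitors = np.delete(g, correct)
    # Soft-max over competing scores (eta -> inf recovers the best rival).
    d = -g[correct] + np.log(np.mean(np.exp(eta * competitors))) / eta
    return 1.0 / (1.0 + np.exp(-alpha * d))  # sigmoid loss in (0, 1)
```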
IEEE Signal Processing Letters, 2000
A speech model inspired by the signal subspace methods was recently proposed as a speech classifier with modest results. Because it is fashioned along a "best representation" approach, the absence of valuable interclass information in the speech model impairs the ability of the classifier to distinguish between phonetically alike classes. This letter proposes an improved classifier that incorporates interclass information. Specifically, a measure of the discriminative quality of individual class elements is defined and determined for all class elements. The discrimination measures thus obtained are subsequently applied in the classification procedure. Simulation results of the proposed signal subspace classifier on an isolated-digit speech recognition problem reveal an improved performance over its predecessor.
Proceedings of the IEEE, 2000
In this paper, a discriminant-function-based minimum recognition error rate pattern-recognition approach is described and studied for various applications in speech processing. This approach departs from the conventional paradigm, which links a classification/recognition task to the problem of distribution estimation. Instead, it takes a discriminant-function-based statistical pattern recognition approach, whose suitability for classification error rate minimization is established through a special loss function. It is meaningful even when the model correctness assumption is known to be invalid. We study the theoretical basis of this approach and compare it with various criteria used in speech recognition. We distinguish classifier design by way of distribution estimation from discriminant-function methods that minimize the classification error rate directly: in many realistic applications, such as speech recognition, the true distribution form of the source is rarely known precisely, and without the model correctness assumption, the classical optimality theory of the distribution estimation approach cannot be applied directly. We discuss issues in this new classifier design paradigm and present various extensions of this approach to classifier design applications in speech processing.
2012 11th International Conference on Information Science, Signal Processing and their Applications (ISSPA), 2012
This paper presents a comparison of three techniques for dimensionality reduction in feature analysis for automatic speech recognition (ASR). All three approaches estimate a linear transformation that is applied to concatenated log spectral features and provide a mechanism for efficient modeling of spectral dynamics in ASR. The goal of the paper is to investigate the effectiveness of a discriminative approach to estimating these feature-space transformations which is based on the assumption that speech features lie on a non-linear manifold. This approach is referred to as locality preserving discriminant analysis (LPDA) and is based on the principle of preserving local within-class relationships in this non-linear space while at the same time maximizing separability between classes. The approach was compared to two well-known approaches for dimensionality reduction, linear discriminant analysis (LDA) and locality preserving linear projection (LPP), on the Aurora 2 speech-in-noise task. The LPDA approach was found to provide a significant reduction in WER with respect to the other techniques for most noise types and signal-to-noise ratios (SNRs).
Linear Discriminant Analysis techniques have been used in pattern recognition to map feature vectors so as to achieve optimal classification. Kernel Discriminant Analysis (KDA) introduces non-linearity into this approach by mapping the features to a non-linear space before applying the LDA analysis. The formulation reduces to solving an eigenvalue problem. By choosing different kernels, one can cover a wide class of nonlinearities. In this paper, we describe this technique and present an application to a speech recognition problem. We give classification results for a connected digit recognition task and analyze some existing problems.
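As an illustration of the eigenvalue formulation described above, here is a minimal two-class kernel Fisher discriminant sketch; the RBF kernel and regularization constant are assumptions, and the paper's exact formulation may differ:

```python
import numpy as np
from scipy.linalg import eigh

def rbf_kernel(A, B, gamma=0.1):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kernel_fda(X, y, gamma=0.1, reg=1e-3):
    """Two-class kernel Fisher discriminant (a minimal sketch).

    Returns the dual coefficients alpha; a new point z is projected
    via sum_i alpha_i * k(x_i, z).
    """
    n = len(X)
    K = rbf_kernel(X, X, gamma)
    m1 = K[:, y == 0].mean(axis=1)
    m2 = K[:, y == 1].mean(axis=1)
    M = np.outer(m1 - m2, m1 - m2)            # between-class (dual form)
    N = np.zeros((n, n))                       # within-class (dual form)
    for c in (0, 1):
        Kc = K[:, y == c]
        nc = Kc.shape[1]
        N += Kc @ (np.eye(nc) - np.ones((nc, nc)) / nc) @ Kc.T
    N += reg * np.eye(n)                       # regularize for stability
    # Leading generalized eigenvector of M @ alpha = lambda * N @ alpha.
    _, vecs = eigh(M, N)
    return vecs[:, -1]
```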
Elsevier, 2021
Probabilistic linear discriminant analysis (PLDA) has achieved good performance in face recognition and speaker recognition. However, the computation of PLDA using the original formulation is inefficient when there is a large amount of training data, especially when the dimensionality of the data is high. Faced with this inefficiency, we propose scalable formulations for PLDA. The computation of PLDA using the scalable formulations is more efficient than using the original formulation when dealing with large training sets. Using the scalable formulations, the PLDA model can significantly outperform other popular classifiers for speaker recognition, such as the Support Vector Machine (SVM) and the Gaussian Mixture Model (GMM). Besides directly using PLDA as a classifier, we may also use PLDA as a feature transformation technique. This PLDA-based feature transformation can reduce or expand the original feature dimensionality while keeping the transformed feature vector approximately Gaussian distributed. Our experimental results on speaker recognition and acoustic scene classification demonstrate the effectiveness of applying PLDA for feature transformation. It is therefore promising to combine PLDA with other classification models for improved performance, extending the utility of PLDA to a wider range of areas.
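For context, a common form of PLDA scoring in speaker recognition is the two-covariance model. The sketch below assumes the between-class covariance B and within-class covariance W have already been estimated (e.g., by EM) and the global mean removed; it illustrates standard PLDA verification, not this paper's scalable formulation:

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

def plda_llr(x1, x2, B, W):
    """Log-likelihood ratio that x1 and x2 share a latent class identity,
    under the two-covariance PLDA model: x = y + e, y ~ N(0, B), e ~ N(0, W).
    (Means are assumed already removed; B = between, W = within covariance.)
    """
    d = len(x1)
    T = B + W
    # Joint covariance of [x1; x2] under the same-class hypothesis:
    # the shared latent y induces cross-covariance B between the two vectors.
    same = np.block([[T, B], [B, T]])
    num = mvn.logpdf(np.concatenate([x1, x2]), np.zeros(2 * d), same)
    den = mvn.logpdf(x1, np.zeros(d), T) + mvn.logpdf(x2, np.zeros(d), T)
    return num - den
```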
2005
Feature extraction is an essential first step in speech recognition applications. In addition to static features extracted from each frame of speech data, it is beneficial to use dynamic features (called ∆ and ∆∆ coefficients) that use information from neighboring frames. Linear Discriminant Analysis (LDA) followed by a diagonalizing maximum likelihood linear transform (MLLT) applied to spliced static MFCC features yields important performance gains as compared to MFCC+∆+∆∆ features in most tasks. However, since LDA is obtained from statistical averages trained on limited data, it is reasonable to regularize the computation of the LDA transform using prior information and experience. In this paper, we regularize LDA and heteroscedastic LDA transforms using two methods: (1) using statistical priors for the transform in a MAP formulation, and (2) using structural constraints on the transform. As the prior, we use a transform that computes static+∆+∆∆ coefficients. Our structural constraint takes the form of a block-structured LDA transform where each block acts on the same cepstral parameters across frames. The second approach suggests using new coefficients for the static, first-difference and second-difference operators, as compared to the standard ones, to improve performance. We test the new algorithms on two different tasks, namely TIMIT phone recognition and AURORA2 digit sequence recognition in noise. We obtain consistent improvements in our experiments as compared to MFCC features. In addition, we obtain encouraging results in some AURORA2 tests as compared to LDA+MLLT features.
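The splicing step mentioned above, concatenating neighbouring static frames before the LDA projection, can be sketched as follows; the context width and dimensionalities are typical values, not necessarily those used in the paper:

```python
import numpy as np

def splice(frames, context=4):
    """Concatenate each static feature frame with its +/- context
    neighbours (edges padded by repetition), as is typically done
    before an LDA(+MLLT) projection.  frames: (n_frames, n_dims)."""
    padded = np.pad(frames, ((context, context), (0, 0)), mode="edge")
    return np.hstack([padded[i:i + len(frames)]
                      for i in range(2 * context + 1)])

# e.g. 13-dim static MFCCs with +/-4 frames -> 117-dim spliced vectors,
# which LDA would then project down to roughly 40 dimensions.
mfcc = np.random.randn(200, 13)
spliced = splice(mfcc)          # shape (200, 117)
```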
2008
Linear discriminant analysis (LDA) is designed to seek a linear transformation that projects a data set into a lower-dimensional feature space for maximum class geometrical separability. LDA cannot always guarantee better classification accuracy, since its formulation does not take into account the properties of the classifier, such as the automatic speech recognizer (ASR). In this paper, the relationship between the empirical classification error rates and the Mahalanobis distances of the respective class pairs of speech features is investigated, and based on this, a novel reformulation of the LDA criterion, distance-error coupled LDA (DE-LDA), is proposed. One notable characteristic of DE-LDA is that it can modulate the contribution to the between-class scatter from each class pair through the use of an empirical error function, while preserving the lightweight solvability of LDA. Experimental results show that DE-LDA yields moderate improvements over LDA on the LVCSR task.
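The core idea of DE-LDA, re-weighting each class pair's contribution to the between-class scatter by a function of its Mahalanobis distance, can be sketched as follows. The exponential weight below is a placeholder for the paper's empirically fitted error function:

```python
import numpy as np
from scipy.linalg import eigh

def de_lda(means, priors, S_w, n_components, weight=lambda d: np.exp(-d)):
    """Sketch of a distance-error-coupled between-class scatter: each
    class pair's contribution is re-weighted by a function of its
    Mahalanobis distance before solving the usual LDA eigenproblem.
    means: (n_classes, n_dims), priors: (n_classes,), S_w: within-class
    scatter (shared covariance estimate).
    """
    S_w_inv = np.linalg.inv(S_w)
    dims = means.shape[1]
    S_b = np.zeros((dims, dims))
    n_classes = len(means)
    for i in range(n_classes):
        for j in range(i + 1, n_classes):
            diff = means[i] - means[j]
            d = np.sqrt(diff @ S_w_inv @ diff)   # Mahalanobis distance
            S_b += priors[i] * priors[j] * weight(d) * np.outer(diff, diff)
    _, vecs = eigh(S_b, S_w)
    return vecs[:, ::-1][:, :n_components]
```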
In this paper, Linear Discriminant Analysis (LDA) is investigated with respect to the combination of different acoustic features for automatic speech recognition. It is shown that the combination of acoustic features using LDA does not consistently lead to improvements in word error rate. A detailed analysis of the recognition results on the Verbmobil (VM II) and on the English portion of the European Parliament Plenary Sessions (EPPS) corpus is given. This includes an independent analysis of the effect of the dimension of the input to LDA, the effect of strongly correlated input features, as well as a detailed numerical analysis of the generalized eigenvalue problem underlying LDA. Relative improvements in word error rate of up to 5% were observed for LDA-based combination of multiple acoustic features.
2008
Speaker-independent feature extraction is a critical problem in speech recognition. Oriented principal component analysis (OPCA) is a potential solution that can find a subspace robust against the noise in the data set. The objective of this paper is to find a speaker-independent subspace by generalizing OPCA in two steps: First, we find a nonlinear subspace with the help of a kernel trick, which we refer to as kernel OPCA. Second, we generalize OPCA to problems with more than two phonemes, which leads to oriented discriminant analysis (ODA). In addition, we equip ODA with the kernel trick again, which we refer to as kernel ODA. The models are tested on the CMU ARCTIC speech database. Our results indicate that our proposed kernel methods can outperform linear OPCA and linear ODA at finding a speaker-independent phoneme space.
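Linear OPCA itself reduces to a generalized eigenvalue problem that maximizes a signal-to-noise-style ratio. A minimal sketch follows; the covariance estimates are toy placeholders, not the paper's speaker-variation statistics:

```python
import numpy as np
from scipy.linalg import eigh

def opca(signal_cov, noise_cov, n_components):
    """Oriented PCA: directions w maximizing the generalized ratio
    (w' S w) / (w' N w), solved as S v = lambda * N v (a minimal sketch)."""
    _, vecs = eigh(signal_cov, noise_cov)
    return vecs[:, ::-1][:, :n_components]

# Toy usage: "signal" covariance from phoneme features, "noise"
# covariance from nuisance variation (e.g. speaker differences).
rng = np.random.default_rng(1)
signal = rng.normal(size=(1000, 20))
nuisance = 0.3 * rng.normal(size=(1000, 20))
W = opca(np.cov(signal.T), np.cov(nuisance.T), n_components=5)
```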
2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012
Modeling the second-order statistics of articulatory trajectories is likely to improve the performance in classifying phone segments compared to using only linear combinations of MFCCs. Nevertheless, the extremely high dimensionality of the feature space spanned by a combination of monomials of degree-1 and degree-2 makes it difficult to effectively exploit the discriminative information in the full covariance matrix. This paper proposes a novel algorithm, dubbed Knowledge-based Quadratic Discriminant Analysis (KnQDA), for reducing the number of dimensions of the space spanned by degree-1 and degree-2 monomials by using phonetic knowledge for selecting the set of degree-2 monomials that are most likely to improve classification. KnQDA seeks a trade-off between overfitting and undertraining, which further improves the learnability. Binary classifications on all pairs of phones in TIMIT show the effectiveness of the proposed method, especially on those phone pairs that overlap strongly in the linear feature space.
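The dimension-reduction idea, keeping only a knowledge-selected subset of degree-2 monomials rather than the full quadratic expansion, can be sketched as follows; the index pairs here are hypothetical, not the phonetically motivated set used in the paper:

```python
import numpy as np

def quadratic_expand(x, pairs):
    """Augment a degree-1 feature vector with a *selected* set of
    degree-2 monomials.  `pairs` is the knowledge-based list of index
    pairs (i, j) deemed likely to help discrimination, which keeps the
    dimensionality far below the full ~d^2/2 quadratic expansion.
    """
    quad = np.array([x[i] * x[j] for i, j in pairs])
    return np.concatenate([x, quad])

# Hypothetical example: 39-dim frame plus 3 selected products -> 42 dims.
x = np.random.randn(39)
z = quadratic_expand(x, pairs=[(0, 1), (0, 12), (12, 25)])
```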
Proceedings of the 11th …, 2009
Linear Discriminant Analysis (LDA) is a feature selection method in speech recognition. LDA finds transformations that maximize the between-class scatter and minimize the within-class scatter. This transformation can be obtained in a class-dependent or class-independent manner. In this paper, we propose a method to improve LDA and use it in place of the DCT in MFCC extraction. The transformation matrix is computed through three evolutionary methods (GA, HS, and PSO) to optimize the class-dependent LDA transformation matrix for robust MFCC extraction. For this purpose, we first use the logarithm of the clean-speech Mel filter bank energies (LMFE) of each class to define the within-class scatter for each class and the between-class scatter over all classes. Next, the class-dependent transformation matrix is used in place of the DCT in MFCC feature extraction. The experimental results show that the proposed recognition and optimization methods using class-dependent LDA achieve a significantly improved isolated-word recognition rate on the Aurora2 database.
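The substitution described above, an LDA-derived matrix taking the place of the DCT on log Mel filter-bank energies, can be sketched as follows; the random matrix stands in for the GA/HS/PSO-optimized transform:

```python
import numpy as np
from scipy.fftpack import dct

# Standard MFCC step: DCT over log Mel filter-bank energies (LMFE).
lmfe = np.random.rand(100, 26)              # toy (frames, filters)
mfcc = dct(lmfe, type=2, norm="ortho", axis=1)[:, :13]

# Proposed replacement: a learned class-dependent LDA matrix takes the
# DCT's place (here a random placeholder for the optimized transform).
A = np.random.randn(26, 13)
features = lmfe @ A                          # shape (100, 13)
```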
2014
Mapping information with pattern classifiers has become popular nowadays, even though there is no clear agreement on which classifiers should be used or how their benefits should be assessed. This paper shows, through comparative analyses, how information is mapped in a multi-class setting, which provides insight into the underlying neural representation. Speech signals transmitted from wireless devices may contain noise, which must be separated from the signal. Linear and quadratic discriminant analysis can be used to separate the noise from the speech signal. Logistic regression can also be used to recover an accurate signal at the receiver end, since it estimates class probabilities.
The performance of a speech recogniser, or of any other pattern classifier, strongly depends on the input features: to obtain good performance, the feature set needs to be both highly discriminative and compact. Linear discriminant analysis (LDA) is a common data-driven method used to find linear transformations that map large feature vectors onto smaller ones while retaining most of the discriminative power. LDA, however, oversimplifies the problem by condensing all class information into only two scatter matrices, hence losing important information on the individual class distributions. We therefore propose a new approach, based on the mutual information or minimum classification error paradigm, which takes all information on the individual class distributions into account while searching for an optimal subspace, thus avoiding the crude approximations made by LDA. Experiments show that the proposed scheme provides more discriminative feature vectors, leading to substantially better recognition results.
In pattern classification, the amount of training data is often very limited while the dimensionality is very high. Fisher linear discriminant analysis (FLDA) is a pattern classification method widely used in pattern recognition for feature extraction and linear dimensionality reduction. The FLDA method can analyze the data and learn the relationship between a set of categorical predictors and the response for pattern recognition applications, including speech recognition used as a command interface in an employee attendance system. FLDA distinguishes one pattern from another, matching a single voice input against a voice database; in our experiments, the best recognition accuracy reached 53.3%. This shows that the method is adequate for use in speech recognition.
Neural Networks for Signal Processing III - Proceedings of the 1993 IEEE-SP Workshop, 1993
Pattern recognition consists of feature extraction and classification over the extracted features. Usually, these two processes are designed separately, with the result that the recognizer is not necessarily optimal in terms of classification accuracy. To close this gap in recognizer design, we introduce in this paper a new design concept, named Discriminative Feature Extraction (DFE). DFE is based on a recent discriminative learning theory, the Minimum Classification Error formalization / Generalized Probabilistic Descent method, and provides an innovative way to design the entire recognition process. The front-end feature extractor as well as the post-end classifier is consistently optimized under a single criterion of minimizing classification errors. The concept is quite general and can be applied to a wide range of pattern recognition tasks. This paper is devoted to the application of DFE to speech recognition. Experiments on a Japanese vowel recognition task show the advantages of...
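A minimal sketch of one MCE/GPD update as used in DFE, with a generic linear transform standing in for the paper's front-end feature extractor and a distance-based prototype classifier; all specifics below are illustrative assumptions:

```python
import numpy as np

def dfe_step(W, protos, x, label, alpha=1.0, lr=0.01):
    """One Generalized Probabilistic Descent step of Discriminative
    Feature Extraction: jointly update a linear feature transform W
    (k x d) and prototype classifier means (n_classes x k) under a
    sigmoid MCE loss with distance discriminants g_k = -||W x - m_k||^2.
    """
    z = W @ x
    dists = ((protos - z) ** 2).sum(axis=1)
    rival = np.argmin(np.where(np.arange(len(protos)) == label,
                               np.inf, dists))
    d = dists[label] - dists[rival]          # > 0 means misclassified
    l = 1.0 / (1.0 + np.exp(-alpha * d))     # sigmoid MCE loss
    g = alpha * l * (1.0 - l)                # dl/dd
    # Analytic gradients of d w.r.t. W and the two active prototypes.
    W -= lr * g * 2.0 * np.outer(protos[rival] - protos[label], x)
    protos[label] -= lr * g * -2.0 * (z - protos[label])
    protos[rival] -= lr * g * 2.0 * (z - protos[rival])
    return l
```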