2010, Journal of the American Statistical Association
Classical statistical approaches for multiclass probability estimation are typically based on regression techniques such as multiple logistic regression, or on density estimation approaches such as linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA). These methods often make certain assumptions on the form of the probability functions or on the underlying distributions of the subclasses. In this article, we develop a model-free procedure to estimate multiclass probabilities based on large-margin classifiers. In particular, the new estimation scheme works by solving a series of weighted large-margin classifiers and then systematically extracting the probability information from these multiple classification rules. A main advantage of the proposed probability estimation technique is that it does not impose any strong parametric assumption on the underlying distribution and can be applied to a wide range of large-margin classification methods. A general computational algorithm is developed for class probability estimation. Furthermore, we establish asymptotic consistency of the probability estimates. Both simulated and real data examples are presented to illustrate the competitive performance of the new approach and to compare it with several existing methods.
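A minimal sketch of the weighted-classifier idea for the binary case (assuming scikit-learn's SVC and a synthetic dataset; both choices are illustrative, not the authors' implementation). The Bayes rule of the π-weighted problem predicts class 1 exactly when P(Y=1|x) > π, so sweeping π and locating where the prediction flips brackets the class probability:

```python
# Sketch (not the authors' code): estimate P(Y=1|x) by sweeping the class
# weights of a large-margin classifier and locating where its decision flips.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=5, random_state=0)
x_new = X[:1]                        # query point

grid = np.linspace(0.05, 0.95, 19)   # candidate probability levels pi
flips = []
for pi in grid:
    # Cost pi for errors on class 0 and 1-pi for errors on class 1: the
    # Bayes rule of this weighted problem predicts 1 iff P(Y=1|x) > pi.
    clf = SVC(kernel="rbf", class_weight={0: pi, 1: 1.0 - pi}).fit(X, y)
    flips.append(clf.predict(x_new)[0])

flips = np.array(flips)
# P(Y=1|x_new) lies between the largest pi still predicting 1 and the
# smallest pi predicting 0; take the midpoint as the estimate.
ones, zeros = grid[flips == 1], grid[flips == 0]
lo = ones.max() if ones.size else grid[0]
hi = zeros.min() if zeros.size else grid[-1]
print(f"estimated P(Y=1|x) ≈ {0.5 * (lo + hi):.2f}")
```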
Biometrika, 2008
Large-margin classifiers have proven effective in delivering high predictive accuracy, particularly those that focus on the decision boundary and bypass estimating the class probability given the input. As a result, these classifiers may not directly yield an estimated class probability, which is often of interest in itself. To overcome this difficulty, this article proposes a novel method for estimating the class probability through sequential classifications, using features of interval estimation for large-margin classifiers. The method uses sequential classifications to bracket the class probability and yields an estimate up to the desired level of accuracy. The method is implemented for support vector machines and ψ-learning, together with an estimated Kullback-Leibler loss for tuning. A solution path of the method is derived for support vector machines to further reduce its computational cost. Theoretical and numerical analyses indicate that the method is highly competitive against alternatives, especially when the dimension of the input greatly exceeds the sample size. Finally, an application to leukaemia data is described.
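The sequential bracketing idea can be sketched as a bisection over weighted classifiers, under the same illustrative assumptions as in the sketch above; the paper's actual procedure builds on interval estimation for support vector machines and ψ-learning with a Kullback-Leibler tuning loss:

```python
import numpy as np
from sklearn.svm import SVC

def bracket_probability(X, y, x_new, tol=0.05):
    """Bisection sketch of sequential bracketing: each weighted SVM tells us
    on which side of pi the probability P(Y=1|x_new) lies; halve the interval
    until it is shorter than tol."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        pi = 0.5 * (lo + hi)
        clf = SVC(kernel="rbf", class_weight={0: pi, 1: 1.0 - pi}).fit(X, y)
        if clf.predict(x_new.reshape(1, -1))[0] == 1:
            lo = pi          # prediction 1 means P(Y=1|x) > pi
        else:
            hi = pi
    return lo, hi            # bracketing interval of the requested width
```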
IFIP Advances in Information and Communication Technology, 2012
Venn Predictors (VPs) are machine learning algorithms that can provide well calibrated multiprobability outputs for their predictions. The only drawback of Venn Predictors is their computational inefficiency, especially in the case of large datasets. In this work, we propose an Inductive Venn Predictor (IVP) which overcomes the computational inefficiency problem of the original Venn Prediction framework. Each VP is defined by a taxonomy which separates the data into categories. We develop an IVP with a taxonomy derived from a multiclass Support Vector Machine (SVM), and we compare our method with other probabilistic methods for SVMs, namely Platt's method, SVM Binning, and SVM with Isotonic Regression. We show that these methods do not always provide well calibrated outputs, while our IVP will always guarantee this property under the i.i.d. assumption.
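A minimal sketch of an Inductive Venn Predictor whose taxonomy is the label predicted by a multiclass SVM; the calibration split, kernel, and this particular taxonomy are illustrative assumptions rather than the exact setup of the paper:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def ivp_intervals(X, y, x_new, n_classes):
    """Sketch of an Inductive Venn Predictor: the taxonomy places each example
    in the category given by the label an SVM predicts for it."""
    X_tr, X_cal, y_tr, y_cal = train_test_split(X, y, test_size=0.3,
                                                random_state=0)
    clf = SVC(kernel="rbf").fit(X_tr, y_tr)           # proper training set
    cat_cal = clf.predict(X_cal)                      # calibration categories
    cat_new = clf.predict(x_new.reshape(1, -1))[0]    # category of new point

    in_cat = y_cal[cat_cal == cat_new]
    n = len(in_cat)
    # Tentatively assign each possible label to the new example; the label
    # frequencies inside its category give the multiprobability output.
    lower = np.array([np.sum(in_cat == j) / (n + 1) for j in range(n_classes)])
    upper = np.array([(np.sum(in_cat == j) + 1) / (n + 1)
                      for j in range(n_classes)])
    return lower, upper   # calibrated under i.i.d., per Venn prediction theory
```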
2009
Kernel discriminant analysis (KDA) is an effective approach for supervised nonlinear dimensionality reduction. Probabilistic models can be used with KDA to improve its robustness. However, existing models of this kind can handle only binary-class problems, which limits their application in many real-world problems. To overcome this limitation, we propose a novel nonparametric probabilistic model based on Gaussian processes for KDA to handle multiclass problems.
Lecture Notes in Computer Science, 2001
Logistic regression is arguably the most popular representative of probabilistic discriminative classifiers. In this paper, a kernel variant of logistic regression is introduced as an iteratively re-weighted least-squares algorithm in kernel-induced feature spaces. This formulation allows us to apply highly efficient approximation methods that are capable of dealing with large-scale problems. For multi-class problems, a pairwise coupling procedure is proposed. Pairwise coupling for "kernelized" logistic regression effectively overcomes conceptual and numerical problems of standard multi-class kernel classifiers.
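A minimal sketch of the binary kernel IRLS iteration. The paper's contributions are the large-scale approximations and the pairwise coupling built on top of this; the update below is just the standard Newton step, with a precomputed kernel matrix assumed:

```python
import numpy as np

def kernel_logistic_irls(K, y, lam=1.0, n_iter=25):
    """Kernel logistic regression via iteratively re-weighted least squares.
    K is a precomputed n x n kernel matrix, y is in {0, 1}, lam is the
    ridge penalty on the function norm."""
    n = len(y)
    alpha = np.zeros(n)
    for _ in range(n_iter):
        f = K @ alpha                            # current decision values
        p = 1.0 / (1.0 + np.exp(-f))             # predicted probabilities
        W = p * (1.0 - p)                        # IRLS weights (diagonal)
        z = f + (y - p) / np.maximum(W, 1e-10)   # working response
        # Newton step: solve (W K + lam I) alpha = W z
        alpha = np.linalg.solve(W[:, None] * K + lam * np.eye(n), W * z)
    return alpha

# Usage: training-set probabilities are sigmoid(K @ alpha); for a new x,
# use the kernel vector k(x, X_train) in place of a row of K.
```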
Journal of Computational Science, 2018
The improvement of classifier performance has been the focus of attention of many researchers over the last few decades. Obtaining accurate predictions becomes more complicated as the number of classes increases. Most families of classification techniques generate models that define decision boundaries trying to separate the classes as well as possible. As an alternative, in this paper we propose to hierarchically decompose the original multiclass problem by reducing the number of classes involved in each local subproblem. This is done by deriving a similarity matrix from the misclassification errors of a first classifier trained for this purpose, and then using the similarity matrix to build a tree-like hierarchy of specialized classifiers. We then present two approaches to solve the multiclass problem: the first traverses the tree of classifiers in a top-down manner, similar to the way some hierarchical classification methods deal with hierarchical domains; the second is inspired by the way probabilistic decision trees compute class membership probabilities. To improve the efficiency of our methods, we propose a criterion to reduce the size of the hierarchy. We experimentally evaluate all of the proposals on a collection of multiclass datasets, showing that, in general, the generated classifier hierarchies outperform the original (flat) multiclass classification.
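A sketch of the decomposition step under illustrative assumptions (a random forest as the first classifier and average-linkage clustering; the paper's own choices may differ): derive class similarities from cross-validated confusion rates and cluster the classes into a tree.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict

def class_hierarchy(X, y, n_classes):
    """Build a tree over classes from the confusion of a first classifier."""
    y_pred = cross_val_predict(RandomForestClassifier(random_state=0),
                               X, y, cv=5)
    C = confusion_matrix(y, y_pred).astype(float)
    C /= C.sum(axis=1, keepdims=True)      # row-normalised confusion rates
    S = (C + C.T) / 2.0                    # symmetric class similarity
    D = 1.0 - S                            # turn similarity into a distance
    np.fill_diagonal(D, 0.0)
    iu = np.triu_indices(n_classes, k=1)   # condensed form for scipy
    Z = linkage(D[iu], method="average")
    return Z   # each merge in Z defines one specialized subproblem
```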
2019
One of the central themes in the classification task is the estimation of the class posterior probability at a new point $\bf{x}$. The vast majority of classifiers output a score for $\bf{x}$, which is monotonically related to the posterior probability via an unknown relationship. There are many attempts in the literature to estimate this latter relationship. Here, we provide a way to estimate the posterior probability without resorting to classification scores. Instead, we vary the prior probabilities of the classes in order to derive the ratio of the class-conditional densities at the point $\bf{x}$, which directly determines the class posterior probabilities. We consider here the binary classification problem.
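In equations (a sketch of the stated idea in standard Bayes notation, with $f_0, f_1$ the class-conditional densities, $\pi$ an adjustable prior for class 1, and $\pi_1$ the true prior):

```latex
% Posterior at x when class 1 is given prior pi:
P(Y=1 \mid \mathbf{x};\, \pi)
  = \frac{\pi f_1(\mathbf{x})}{\pi f_1(\mathbf{x}) + (1-\pi) f_0(\mathbf{x})}.

% The pi-prior classifier is indifferent at x exactly when the weighted
% densities balance, which reveals the density ratio:
\pi^* f_1(\mathbf{x}) = (1-\pi^*) f_0(\mathbf{x})
\quad\Longrightarrow\quad
\frac{f_1(\mathbf{x})}{f_0(\mathbf{x})} = \frac{1-\pi^*}{\pi^*}.

% Substituting this ratio back with the true prior pi_1 gives the posterior:
P(Y=1 \mid \mathbf{x})
  = \frac{\pi_1 (1-\pi^*)/\pi^*}{\pi_1 (1-\pi^*)/\pi^* + (1-\pi_1)}.
```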
2004
Pairwise coupling is a popular multi-class classification method that combines all comparisons for each pair of classes. This paper presents two approaches for obtaining class probabilities. Both methods can be reduced to linear systems and are easy to implement. We show conceptually and experimentally that the proposed approaches are more stable than the two existing popular methods: voting and the method of Hastie and Tibshirani (1998).
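A sketch of the second of these approaches: class probabilities are recovered from pairwise estimates $r_{ij} \approx P(y=i \mid y \in \{i,j\}, \mathbf{x})$ by solving the KKT system of a constrained least-squares problem. The matrix construction follows the published method of Wu, Lin, and Weng (2004); the example values are invented for illustration:

```python
import numpy as np

def pairwise_coupling(R):
    """Solve  min_p sum_{i != j} (R[j,i] p_i - R[i,j] p_j)^2
    subject to sum_i p_i = 1, via its KKT linear system."""
    k = R.shape[0]
    Q = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            if i == j:
                Q[i, i] = sum(R[m, i] ** 2 for m in range(k) if m != i)
            else:
                Q[i, j] = -R[j, i] * R[i, j]
    # Augment with the equality constraint via a Lagrange multiplier.
    A = np.zeros((k + 1, k + 1))
    A[:k, :k] = Q
    A[:k, k] = 1.0
    A[k, :k] = 1.0
    b = np.zeros(k + 1)
    b[k] = 1.0
    return np.linalg.solve(A, b)[:k]

# Example with three classes and invented pairwise estimates:
R = np.array([[0.0, 0.7, 0.8],
              [0.3, 0.0, 0.6],
              [0.2, 0.4, 0.0]])
print(pairwise_coupling(R))   # class probabilities summing to 1
```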
Analytica Chimica Acta, 2010
This work describes multi-classification based on binary probabilistic discriminant partial least squares (p-DPLS) models, developed with the one-against-one strategy and the winner-takes-all principle. The multi-classification problem is split into binary classification problems that are solved with p-DPLS models, and the results of these models are combined to obtain the final classification. The classification criterion uses the specific characteristics of an object (its position in the multivariate space and its prediction uncertainty) to estimate the reliability of the classification, so that the object is assigned to the class with the highest reliability. This new methodology is tested on the well-known Iris data set and a data set of Italian olive oils. Compared with CART and SIMCA, the proposed method shows better average classification performance, besides providing a statistic that evaluates the reliability of each classification. For the olive oil set, the average percentage of correct classification on the training set was close to 84% with p-DPLS, against 75% with CART and 100% with SIMCA, while on the test set the average was close to 94% with p-DPLS, against 50% with CART and 62% with SIMCA.
Estimating class membership probabilities is an important step in many automated speech recognition systems. Since binary classifiers are usually easier to train, one common approach to this problem is to construct pairwise binary classifiers. Pairwise models yield an overdetermined system of equations for the class membership probabilities. Motivated by probabilistic arguments, we propose a new way of estimating the individual class membership probabilities which reduces to solving a linear system of equations. A solution of this system is obtained by finding the unique non-zero eigenvector of total probability one, corresponding to eigenvalue one of a positive Markov matrix. This is a property shared with another algorithm previously proposed by Wu, Lin, and Weng. We compare the properties of these methods in two settings: a theoretical three-way classification problem, and the classification of English monophthongs from the TIMIT corpus.
Index Terms: binary classifiers; multiclass classification; phoneme recognition; English vowels; TIMIT
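A sketch of the eigenvector computation by power iteration. The specific construction of the Markov matrix below (transitions proportional to the pairwise estimates, with a self-loop absorbing the remainder) is an illustrative assumption, not necessarily the paper's matrix; its stationary distribution does reproduce the class probabilities when the pairwise estimates are consistent:

```python
import numpy as np

def markov_class_probs(R, tol=1e-10):
    """Return the stationary distribution (the eigenvalue-one eigenvector of
    total probability one) of a Markov matrix built from pairwise estimates
    R[i, j] ~ P(y = i | y in {i, j}, x)."""
    k = R.shape[0]
    M = R.T / (k - 1)                  # transition i -> j proportional to r_ji
    np.fill_diagonal(M, 0.0)
    M[np.diag_indices(k)] = 1.0 - M.sum(axis=1)   # rows sum to one
    p = np.full(k, 1.0 / k)            # start from the uniform distribution
    while True:
        p_next = p @ M                 # one step of the chain
        if np.abs(p_next - p).max() < tol:
            return p_next
        p = p_next

# Consistent estimates generated from p = (0.5, 0.3, 0.2):
R = np.array([[0.0,   0.625, 0.714],
              [0.375, 0.0,   0.6],
              [0.286, 0.4,   0.0]])
print(markov_class_probs(R))   # ≈ [0.5, 0.3, 0.2]
```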
Sensors, 2021
This paper presents a novel approach to the assessment of decision confidence in multi-class recognition. In many classification problems, eliminating human interaction with the system is one possible goal, but not the only one: lessening the workload of human experts can also bring huge improvements to the production process. The presented approach focuses on providing a tool that significantly decreases the amount of work a human expert needs to conduct while evaluating samples. Instead of hard classification, which assigns a single label to each sample, the described solution evaluates each case in terms of decision confidence: checking how confident the classifier is about the currently processed example, and deciding whether the final classification should be performed automatically or the sample should instead be evaluated manually by a human expert. The method can be easily adjusted to any number of classes. […]
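A minimal sketch of the confidence-routing idea, assuming any model that exposes class probabilities; the acceptance threshold is an illustrative parameter:

```python
import numpy as np

def route_samples(proba, threshold=0.8):
    """Accept the classifier's decision only when its top class probability
    clears the threshold; otherwise defer the sample to a human expert."""
    top = proba.max(axis=1)         # confidence of the most likely class
    labels = proba.argmax(axis=1)   # the would-be hard classification
    auto = top >= threshold         # True -> classify automatically
    return labels, auto

# Usage with any probabilistic multiclass model:
# labels, auto = route_samples(model.predict_proba(X_new))
# X_new[~auto] goes to manual review; labels[auto] are accepted as-is.
```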