2016
It is standard to perform classification tasks under the assumption that class labels are deterministic. In this context, the F-measure is an increasingly popular measure of performance for a classifier, and expresses a flexible trade-off between precision and recall. However, it may just as easily be advisable to remove this assumption and consider instances as belonging to each class with given probabilities. The presence of uncertainty in a training set may be due to subjectivity of a classification task or noise introduced during data collection. In this paper, we adapt the classical F-measure to the uncertain context and present an efficient, easy-to-implement algorithm for the optimization of this new “noisy” F-measure within the maximum entropy modeling framework. We provide comprehensive theoretical justification along with numerical experiments that demonstrate the novelty and effectiveness of this approach.
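For reference, the classical F-measure is the harmonic mean of precision and recall, and one natural way to adapt it to probabilistic labels (a hedged reading of the abstract, not necessarily the paper's exact construction) is to replace the true-positive and positive counts by their expectations under the label probabilities:

\[
F_1 = \frac{2PR}{P+R} = \frac{2\,\mathrm{TP}}{2\,\mathrm{TP} + \mathrm{FP} + \mathrm{FN}},
\qquad
\widetilde{F}_1 = \frac{2\sum_i p_i \hat{y}_i}{\sum_i p_i + \sum_i \hat{y}_i},
\]

where \(p_i\) is the probability that instance \(i\) is positive and \(\hat{y}_i \in \{0,1\}\) is the predicted label; \(\widetilde{F}_1\) reduces to \(F_1\) when every \(p_i\) is 0 or 1.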
We define a generalized likelihood function based on uncertainty measures and show that maximizing such a likelihood function for different measures induces different types of classifiers. In the probabilistic framework, we obtain classifiers that optimize the cross-entropy function. In the possibilistic framework, we obtain classifiers that maximize the interclass margin. Furthermore, we show that the support vector machine is a sub-class of these maximum-margin classifiers.
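For orientation (the notation is mine, not the paper's), the two special cases correspond to two familiar objectives: the cross-entropy loss minimized by probabilistic classifiers and the hinge loss whose minimization yields maximum-margin classifiers such as the SVM:

\[
L_{\mathrm{CE}}(\theta) = -\sum_i \sum_c y_{ic} \log p_\theta(c \mid x_i),
\qquad
L_{\mathrm{hinge}}(\theta) = \sum_i \max\bigl(0,\, 1 - y_i f_\theta(x_i)\bigr), \quad y_i \in \{-1, +1\}.
\]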
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Measuring the uncertainty of classification results is especially important in areas where limited human resources must be allocated to achieve higher accuracy. For instance, data-driven algorithms that diagnose diseases need accurate uncertainty scores to decide whether a limited pool of additional experts should be called in for rectification. However, few uncertainty models focus on improving the performance of text classification when human resources are involved. To this end, we aim to generate accurate uncertainty scores by improving the confidence of winning scores. We propose a model called MSD, comprising three independent components ("mix-up", "self-ensembling", and "distinctiveness score"), that improves the accuracy of the uncertainty score by reducing the overconfidence of winning scores while simultaneously accounting for different categories of uncertainty. MSD can be applied with different Deep Neural Networks. Extensive experiments with ablation settings are conducted on four real-world datasets, on which competitive results are obtained.
The increasing number of scientific publications on the Web and the absence of efficient tools for classifying and searching documents are the two most important factors affecting the speed of search and the quality of the results. Previous studies have shown that the use of ontologies makes it possible to process document and query information at the semantic level, which greatly improves the search for relevant information and takes a step towards the Semantic Web. A fundamental step in these approaches is the annotation of documents with ontology concepts, which can also be seen as a classification task. In this paper we address this issue for the biomedical domain and present a new automated and robust method, based on a Maximum Entropy approach, for annotating biomedical literature documents with terms from the Medical Subject Headings (MeSH). The experimental evaluation shows that the suggested Maximum Entropy approach for annotating biomedical documents with MeSH terms is highly accurate, robust to the ambiguity of terms, and can provide very good performance even when a very small number of training documents is used. More precisely, we show that the proposed algorithm obtained an average F-measure of 92.4% (precision 99.41%, recall 86.77%) over the full range of the explored terms (4,078 MeSH terms), and that the algorithm's performance is resilient to term ambiguity, achieving an average F-measure of 92.42% (precision 99.32%, recall 86.87%) on the explored MeSH terms found to be ambiguous according to the Unified Medical Language System (UMLS) thesaurus. Finally, we compared the results of the suggested methodology with Naive Bayes and Decision Tree classification approaches, and we show that the Maximum Entropy based approach achieved a higher F-measure on both ambiguous and monosemous MeSH terms.
Annotating a dataset is one of the major bottlenecks in supervised learning tasks, as it can be expensive and time-consuming. With the development of crowdsourcing services, it has instead become easy and fast to collect labels from multiple annotators. Our contribution in this paper is a Bayesian probabilistic approach that integrates annotators' uncertainty into the task of learning from multiple noisy annotators (annotators who generate errors). Furthermore, unlike previous work, our proposed approach is directly formulated to handle categorical labels, an important point since real-world datasets often have more than two classes. Extensive experiments on several datasets validate the effectiveness of our approach against previously proposed efficient algorithms.
IEEE Transactions on Knowledge and Data Engineering, 2019
We introduce a framework for the evaluation of multiclass classifiers by exploring their confusion matrices. Instead of using error-counting measures of performance, we concentrate on quantifying the information transfer from true to estimated labels using information-theoretic measures. First, the Entropy Triangle allows us to visualize the balance of mutual information, variation of information, and deviation from uniformity in the true and estimated label distributions. Next, the Entropy-Modified Accuracy allows us to rank classifiers by performance, while the Normalized Information Transfer rate allows us to evaluate classifiers by the amount of information accrued during learning. Finally, if the question arises of which errors are systematically committed by the classifier, we use a generalization of Formal Concept Analysis to elicit such knowledge. All these techniques can be applied either to artificially or biologically embodied classifiers, e.g., human performance on perceptual tasks. We instantiate the framework in a number of examples to provide guidelines for the use of these tools when assessing single classifiers or populations of them (whether induced with the same technique or not), either on single tasks or on a set of them. These include well-known UCI tasks and the more complex KDD Cup 99 competition on intrusion detection.
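To make the information-theoretic viewpoint concrete, here is a minimal sketch (my own illustration, not the authors' Entropy Triangle code) of how the mutual information between true and estimated labels can be estimated directly from a confusion matrix:

```python
# Estimate the mutual information (in bits) transferred from true to predicted
# labels, using only the empirical confusion matrix.
import numpy as np

def mutual_information_from_confusion(cm):
    """cm[i, j] = number of items with true class i predicted as class j."""
    joint = cm / cm.sum()                      # empirical joint distribution P(true, pred)
    p_true = joint.sum(axis=1, keepdims=True)  # marginal P(true)
    p_pred = joint.sum(axis=0, keepdims=True)  # marginal P(pred)
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(joint > 0, joint / (p_true * p_pred), 1.0)
    return float(np.sum(joint * np.log2(ratio)))

cm = np.array([[50, 5, 0],
               [4, 40, 6],
               [1, 2, 47]])
print(mutual_information_from_confusion(cm))
```

A perfectly uninformative classifier transfers zero bits regardless of its accuracy on the majority class, which is the intuition behind ranking classifiers by information transfer rather than by error counts.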
2012 IEEE 12th International Conference on Data Mining Workshops, 2012
Crowdsourcing services have become popular, making it easy and fast to have datasets labeled by multiple annotators for supervised learning tasks. Unfortunately, in this context annotators are not reliable, as they may have different levels of experience or knowledge. Furthermore, the data to be labeled may also vary in their level of difficulty. How do we deal with hard-to-label data and unreliable annotators? In this paper, we present a probabilistic model to learn from multiple naive annotators, considering that annotators may decline to label an instance when they are unsure. Both errors and ignorance of annotators are integrated separately into the proposed Bayesian model. Experiments on several datasets show that our method achieves superior performance compared to other efficient learning algorithms.
In supervised learning, many measures are based on the concept of entropy. A major characteristic of the entropies is that they take their maximal value when the distribution of the modalities of the class variable is uniform. To deal with the case where the a priori frequencies of the class variable modalities are very imbalanced, we propose an off-centered entropy which takes its maximum value for a distribution fixed by the user. This distribution can be the a priori distribution of the class variable modalities or a distribution taking into account the costs of misclassification.
This paper proposes the use of maximum entropy techniques for text classification. Maximum entropy is a probability distribution estimation technique widely used for a variety of natural language tasks, such as language modeling, part-of-speech tagging, and text segmentation. The underlying principle of maximum entropy is that without external knowledge, one should prefer distributions that are uniform. Constraints on the distribution, derived from labeled training data, inform the technique where to be minimally non-uniform. The maximum entropy formulation has a unique solution which can be found by the improved iterative scaling algorithm. In this paper, maximum entropy is used for text classification by estimating the conditional distribution of the class variable given the document. In experiments on several text datasets we compare accuracy to naive Bayes and show that maximum entropy is sometimes significantly better, but also sometimes worse. Much future work remains, but the results indicate that maximum entropy is a promising technique for text classification.
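For readers unfamiliar with the model class: the conditional maximum entropy distribution subject to feature-expectation constraints has the familiar exponential (multinomial logistic) form

\[
p_\Lambda(c \mid d) = \frac{1}{Z_\Lambda(d)} \exp\Bigl(\sum_i \lambda_i f_i(d, c)\Bigr),
\qquad
Z_\Lambda(d) = \sum_{c'} \exp\Bigl(\sum_i \lambda_i f_i(d, c')\Bigr),
\]

where the \(f_i(d, c)\) are feature functions (e.g., word-class co-occurrence indicators or counts) and the weights \(\lambda_i\) are fit by improved iterative scaling or, equivalently, by gradient-based maximum-likelihood training.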
Lecture Notes in Computer Science, 2023
This full-day tutorial introduces modern techniques for practical uncertainty quantification specifically in the context of multi-class and multi-label text classification. First, we explain the usefulness of estimating aleatoric uncertainty and epistemic uncertainty for text classification models. Then, we describe several state-of-the-art approaches to uncertainty quantification and analyze their scalability to big text data: Virtual Ensemble in GBDT, Bayesian Deep Learning (including Deep Ensemble, Monte-Carlo Dropout, Bayes by Backprop, and their generalization Epistemic Neural Networks), Evidential Deep Learning (including Prior Networks and Posterior Networks), as well as Distance Awareness (including Spectral-normalized Neural Gaussian Process and Deep Deterministic Uncertainty). Next, we talk about the latest advances in uncertainty quantification for pre-trained language models (including asking language models to express their uncertainty, interpreting uncertainties of text classifiers built on large-scale language models, uncertainty estimation in text generation, calibration of language models, and calibration for in-context learning). After that, we discuss typical application scenarios of uncertainty quantification in text classification (including in-domain calibration, cross-domain robustness, and novel class detection). Finally, we list popular performance metrics for the evaluation of uncertainty quantification effectiveness in text classification. Practical hands-on examples/exercises are provided to the attendees for them to experiment with different uncertainty quantification methods on a few real-world text classification datasets such as CLINC150.
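As a small, self-contained illustration of the aleatoric/epistemic split used by the sampling-based methods above (my own sketch, assuming softmax outputs from T stochastic forward passes, e.g. Monte-Carlo Dropout, are already available; this is not code from the tutorial):

```python
# Decompose predictive uncertainty from T stochastic forward passes.
# `probs` is assumed to be a (T, N, C) array of softmax outputs.
import numpy as np

def decompose_uncertainty(probs, eps=1e-12):
    mean_p = probs.mean(axis=0)                                           # (N, C) predictive distribution
    total = -(mean_p * np.log(mean_p + eps)).sum(axis=-1)                 # total (predictive) entropy
    aleatoric = -(probs * np.log(probs + eps)).sum(axis=-1).mean(axis=0)  # expected entropy
    epistemic = total - aleatoric                                         # mutual information (BALD score)
    return total, aleatoric, epistemic

rng = np.random.default_rng(0)
logits = rng.normal(size=(20, 4, 3))                                      # 20 passes, 4 texts, 3 classes
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
print(decompose_uncertainty(probs))
```

The same decomposition applies to Deep Ensembles by treating each ensemble member as one of the T passes.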
2018
In this review we explore the uncertainty of common supervised classification models. We explain some of their prior assumptions, how these assumptions bias their predicted probabilities, and how to interpret their confidence values in different situations. We also describe our proposed method to make common classifiers more reliable and versatile, and how these can be used in fluctuating scenarios in which unexpected classes and anomalies may appear during deployment. Furthermore, we show an extension of proper loss functions that allows classifiers that minimize an empirical loss to be trained with weak labels (labels that may be wrong). Finally, we discuss two future directions of our current work: (1) how to get better probability estimates in Deep Neural Networks, and (2) new methods to reuse old datasets whose labels may be outdated and weak. Keywords: supervised learning, semi-supervised learning, classifier calibration, classification with confidence, cautious classification...
Information Sciences, 2020
This paper provides new insight into the relationship between the uncertainty and the misclassification rate of a classifier. We formulate the relationship explicitly by taking entropy as a measure of uncertainty and by analyzing the misclassification rate based on the membership degree difference. Focusing on binary classification problems, this study theoretically and experimentally validates that the misclassification rate necessarily increases with uncertainty if two conditions are satisfied: (1) the distributions of the two classes over the membership degree difference are unimodal, and (2) these two distributions attain their peaks where the membership degree difference is, respectively, less than and greater than zero. This work aims to provide practical guidelines for improving classifier performance by clearly expressing and understanding the relationship between the uncertainty and misclassification of a classifier.
Studies in Computational Intelligence, 2010
Many machine learning algorithms use an entropy measure as an optimization criterion. Among the widely used entropy measures, Shannon's is one of the most popular. In some real-world applications, using such entropy measures without precautions can lead to inconsistent results, because the measures are built on assumptions that are not fulfilled in many real cases. For instance, in supervised learning such as decision trees, the classification cost of the classes is not explicitly taken into account in the tree-growing process: the misclassification costs are assumed to be the same for all classes. When those costs are not equal across classes, the maximum of the entropy should lie elsewhere than at the uniform probability distribution. Likewise, when the classes do not have the same a priori probability distribution, the worst case (the maximum of the entropy) should lie elsewhere than at the uniform distribution. In this paper, starting from real-world problems, we show that classical entropy measures are not suitable for building a predictive model. We then examine the main axioms that define an entropy and discuss their inadequacy for machine learning. This leads us to propose a new entropy measure that possesses more suitable properties. Finally, we carry out evaluations on datasets that illustrate the performance of the new entropy measure.
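One common way to obtain such an off-centered entropy for a binary class variable (a sketch of the general idea using my own piecewise-linear recentering; the paper's exact construction and axioms may differ) is to rescale the class probability so that the user-chosen centre w maps to 1/2 before applying Shannon entropy:

```python
# Off-centered entropy sketch: rescale p so that the chosen centre w maps to 1/2,
# then apply Shannon entropy; the maximum is attained at p = w instead of p = 1/2.
import numpy as np

def off_centered_entropy(p, w=0.1):
    p = np.asarray(p, dtype=float)
    # piecewise-linear map sending [0, w] -> [0, 1/2] and [w, 1] -> [1/2, 1]
    pi = np.where(p <= w, p / (2.0 * w), 0.5 + (p - w) / (2.0 * (1.0 - w)))
    terms = np.stack([pi, 1.0 - pi])
    safe = np.where(terms > 0, terms, 1.0)   # avoid log(0); the 0 * log term vanishes
    return -(terms * np.log2(safe)).sum(axis=0)

print(off_centered_entropy([0.05, 0.1, 0.5, 0.9], w=0.1))  # maximal at p = w = 0.1
```

Setting w to the a priori class frequency (or to a cost-derived distribution) makes the uniform distribution no longer the point of maximal entropy, which is exactly the behaviour motivated in the abstract.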
Lecture Notes in Computer Science, 2014
This paper provides new insight into maximizing F1 measures in the context of binary classification and also in the context of multilabel classification. The harmonic mean of precision and recall, the F1 measure is widely used to evaluate the success of a binary classifier when one class is rare. Micro average, macro average, and per instance average F1 measures are used in multilabel classification. For any classifier that produces a real-valued output, we derive the relationship between the best achievable F1 value and the decision-making threshold that achieves this optimum. As a special case, if the classifier outputs are well-calibrated conditional probabilities, then the optimal threshold is half the optimal F1 value. As another special case, if the classifier is completely uninformative, then the optimal behavior is to classify all examples as positive. When the actual prevalence of positive examples is low, this behavior can be undesirable. As a case study, we discuss the results, which can be surprising, of maximizing F1 when predicting 26,853 labels for Medline documents.
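The calibrated-probability special case is easy to check empirically. Below is a small sketch on synthetic data of my own (not the paper's Medline experiments): for scores that are calibrated by construction, the F1-maximising threshold comes out close to half the best achievable F1.

```python
# Empirical check: with calibrated scores, the F1-optimal threshold ~= best F1 / 2.
import numpy as np

rng = np.random.default_rng(1)
p = rng.beta(0.5, 4.0, size=200_000)     # calibrated scores; positives are rare
y = rng.random(p.shape) < p              # labels drawn so that P(y = 1 | p) = p

def f1_at(threshold):
    pred = p >= threshold
    tp = np.sum(pred & y)
    if tp == 0:
        return 0.0
    precision = tp / pred.sum()
    recall = tp / y.sum()
    return 2 * precision * recall / (precision + recall)

thresholds = np.linspace(0.01, 0.99, 99)
f1s = np.array([f1_at(t) for t in thresholds])
best = f1s.argmax()
print(f"best F1 = {f1s[best]:.3f} at threshold {thresholds[best]:.2f} (F1/2 = {f1s[best] / 2:.3f})")
```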
Proceedings of the 2019 Conference of the North, 2019
The uncertainty measurement of classifiers' predictions is especially important in applications such as medical diagnosis that need to ensure limited human resources can focus on the most uncertain predictions returned by machine learning models. However, few existing uncertainty models attempt to improve overall prediction accuracy where human resources are involved in the text classification task. In this paper, we propose a novel neural-network-based model that applies a new dropout-entropy method for uncertainty measurement. We also design a metric learning method on feature representations, which can boost the performance of dropout-based uncertainty methods with smaller prediction variance in accurate prediction trials. Extensive experiments on real-world data sets demonstrate that our method can achieve a considerable improvement in overall prediction accuracy compared to existing approaches. In particular, our model improved the accuracy from 0.78 to 0.92 when 30% of the most uncertain predictions were handed over to human experts on the "20NewsGroup" data.
2019
In classification with a reject option, the classifier is allowed to abstain from prediction in uncertain cases. The classical cost-based model of an optimal classifier with a reject option requires the cost of rejection to be defined explicitly. An alternative, the bounded-improvement model, avoids the notion of a reject cost and instead seeks a classifier with a guaranteed selective risk and maximal coverage. We prove that both models share the same class of optimal strategies, and we provide an explicit relation between the reject cost and the target risk, which are the parameters of the two models. An optimal rejection strategy for both models is based on thresholding the conditional risk defined by posterior probabilities, which are usually unavailable. We propose a discriminative algorithm that learns an uncertainty function preserving the ordering of the input space induced by the conditional risk, and which can hence be used to construct optimal rejection strategies.
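For intuition about the quantities involved, here is a minimal sketch (my own, using the classical Chow-style rule of thresholding the maximum posterior rather than the authors' learned uncertainty function) that reports coverage and selective risk, i.e. the error rate on the accepted inputs, at a few thresholds:

```python
# Reject option by thresholding the maximum posterior; report coverage and
# selective risk (error rate among accepted inputs) for each threshold.
import numpy as np

def risk_coverage(posteriors, labels, thresholds):
    confidence = posteriors.max(axis=1)
    predictions = posteriors.argmax(axis=1)
    results = []
    for t in thresholds:
        accepted = confidence >= t
        coverage = accepted.mean()
        if coverage == 0:
            results.append((t, 0.0, 0.0))
            continue
        selective_risk = (predictions[accepted] != labels[accepted]).mean()
        results.append((t, coverage, selective_risk))
    return results

rng = np.random.default_rng(2)
posteriors = rng.dirichlet(np.ones(3) * 0.7, size=1000)          # synthetic 3-class posteriors
labels = np.array([rng.choice(3, p=row) for row in posteriors])  # labels consistent with posteriors
for t, cov, risk in risk_coverage(posteriors, labels, [0.5, 0.7, 0.9]):
    print(f"threshold={t:.1f}  coverage={cov:.2f}  selective risk={risk:.2f}")
```

Raising the threshold trades coverage for lower selective risk, which is the trade-off both models in the abstract parameterize in different ways.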
Generalized expectation (GE) criteria are terms in objective functions that assign scores to values of model expectations. In this paper we introduce GE-FL, a method that uses GE to train a probabilistic model using associations between input features and classes rather than complete labeled instances. Specifically, here the expectations are model predicted class distributions on unlabeled instances that contain selected input features. The score function is the KL divergence from reference distributions estimated using feature-class associations. We show that a multinomial logistic regression model trained with GE-FL outperforms several baseline methods that use feature-class associations. Next, we compare with a method that incorporates feature-class associations into Boosting and find that it requires 400 labeled instances to attain the same accuracy as GE-FL, which uses no labeled instances. In human annotation experiments, we show that labeling features is on average 3.7 times faster than labeling documents, a result that supports similar findings in previous work. Additionally, using GE-FL provides a 1.0% absolute improvement in final accuracy over semi-supervised training with labeled documents. The accuracy difference is often much more pronounced with only a few minutes of annotation, where we see absolute accuracy improvements as high as 40%.
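Schematically (a hedged paraphrase in my own notation, not the paper's), the GE term used here scores, for each labeled feature k, how far the model's average predicted class distribution over unlabeled instances containing that feature is from the reference distribution derived from the feature-class associations:

\[
\mathrm{GE}(\theta) = -\sum_{k} D_{\mathrm{KL}}\!\Bigl(\hat{p}_k \,\Big\|\, \tfrac{1}{|U_k|} \sum_{x \in U_k} p_\theta(y \mid x)\Bigr),
\]

where \(U_k\) is the set of unlabeled instances containing feature \(k\) and \(\hat{p}_k\) is the reference class distribution for that feature; training maximizes this score, typically together with a regularizer on \(\theta\).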
1999
Real-world data can be difficult to classify due to overlapping classes and ambiguous data. One solution to this problem is to leave out data before classifying, while another is to classify the data first and then prune those results that are ambiguous. However, a problem remains in determining which data are ambiguous. In this paper we propose a performance criterion that gives a precise basis for characterizing the performance of any classifier applied to ambiguous data. Further, we demonstrate that there is an optimal region for withholding classifications, which depends on the performance criterion. We test our method on some benchmark classification problems to show the effectiveness of the approach.
ACM Computing Surveys, 2023
Methods to classify objects into two or more classes are at the core of various disciplines. When a set of objects with their true classes is available, a supervised classifier can be trained and employed to decide if, for example, a new patient has cancer or not. The choice of performance measure is critical in deciding which supervised method to use in any particular classification problem. Different measures can lead to very different choices, so the measure should match the objectives. Many performance measures have been developed, and one of them is the F-measure, the harmonic mean of precision and recall. Originally proposed in information retrieval, the F-measure has gained increasing interest in the context of classification. However, the rationale underlying this measure appears weak, and unlike other measures it does not have a representational meaning. The use of the harmonic mean also has little theoretical justification. The F-measure also stresses one class, which seems inappropriate for general classification problems. We provide a history of the F-measure and its use in computational disciplines, describe its properties, and discuss criticism about the F-measure. We conclude with alternatives to the F-measure, and recommendations of how to use it effectively.