Generalized expectation (GE) criteria are terms in objective functions that assign scores to values of model expectations. In this paper we introduce GE-FL, a method that uses GE to train a probabilistic model using associations between input features and classes rather than complete labeled instances. Specifically, here the expectations are model-predicted class distributions on unlabeled instances that contain selected input features. The score function is the KL divergence from reference distributions estimated using feature-class associations. We show that a multinomial logistic regression model trained with GE-FL outperforms several baseline methods that use feature-class associations. Next, we compare with a method that incorporates feature-class associations into Boosting and find that it requires 400 labeled instances to attain the same accuracy as GE-FL, which uses no labeled instances. In human annotation experiments, we show that labeling features is on average 3.7 times faster than labeling documents, a result that supports similar findings in previous work. Additionally, using GE-FL provides a 1.0% absolute improvement in final accuracy over semi-supervised training with labeled documents. The accuracy difference is often much more pronounced with only a few minutes of annotation, where we see absolute accuracy improvements as high as 40%.
It is difficult to apply machine learning to new domains because often we lack labeled problem instances. In this paper, we provide a solution to this problem that leverages domain knowledge in the form of affinities between input features and classes. For example, in a baseball vs. hockey text classification problem, even without any labeled data, we know that the presence of the word puck is a strong indicator of hockey. We refer to this type of domain knowledge as a labeled feature. In this paper, we propose a method for training discriminative probabilistic models with labeled features and unlabeled instances. Unlike previous approaches that use labeled features to create labeled pseudo-instances, we use labeled features directly to constrain the model's predictions on unlabeled instances. We express these soft constraints using generalized expectation (GE) criteria: terms in a parameter estimation objective function that express preferences on values of a model expectation. In this paper we train multinomial logistic regression models using GE criteria, but the method we develop is applicable to other discriminative probabilistic models. The complete objective function also includes a Gaussian prior on parameters, which encourages generalization by spreading parameter weight to unlabeled features. Experimental results on text classification data sets show that this method outperforms heuristic approaches to training classifiers with labeled features. Experiments with human annotators show that it is more beneficial to spend limited annotation time labeling features rather than labeling instances. For example, after only one minute of labeling features, we can achieve 80% accuracy on the ibm vs. mac text classification problem using GE-FL, whereas ten minutes of labeling documents results in an accuracy of only 77%.
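A minimal numpy sketch of a GE-FL-style objective as described in the two abstracts above. The data layout (a bag-of-words count matrix), the function names, and the small smoothing constant are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def ge_fl_objective(theta, X, labeled_features, sigma2=10.0):
    """One GE term per labeled feature: KL divergence from the reference
    class distribution to the mean predicted class distribution over the
    unlabeled documents that contain that feature, plus a Gaussian (L2)
    prior on the parameters.  Minimizing this is the sketch's analogue of
    maximizing the GE-FL objective described above.

    theta            : (n_classes, n_features) weight matrix
    X                : (n_docs, n_features) unlabeled feature counts
    labeled_features : dict mapping feature index -> reference distribution
    """
    probs = softmax(X @ theta.T)              # p_theta(y | x) for every doc
    eps = 1e-12                               # smoothing keeps logs finite
    total = 0.0
    for k, ref in labeled_features.items():
        mask = X[:, k] > 0                    # docs containing feature k
        if not mask.any():
            continue
        expect = probs[mask].mean(axis=0)     # model expectation for feature k
        total += np.sum(ref * np.log((ref + eps) / (expect + eps)))
    return total + (theta ** 2).sum() / (2 * sigma2)
```

In practice the gradient of such an objective would be handed to a numerical optimizer such as L-BFGS; that step is omitted here.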
It is difficult to apply machine learning to many real-world tasks because there are no existing labeled instances. In one solution to this problem, a human expert provides instance labels that are used in traditional supervised or semi-supervised training. Instead, we want a solution that allows us to leverage existing resources other than complete labeled instances. We propose the use of generalized expectation (GE) criteria [8] to achieve this goal. A GE criterion is a term in a training objective function that assigns a score to values of a model expectation. In this paper, the expectations are model-predicted class distributions conditioned on the presence of selected features, and the score function is the Kullback-Leibler divergence from reference distributions that are estimated using existing resources. We apply this method to the problem of named-entity recognition, leveraging available lexicons. Using no conventionally labeled instances, we learn a sliding-window multinomial logistic regression model that obtains an F1 score of 0.692 on the CoNLL 2003 data. To attain the same accuracy a supervised classifier requires 4,000 labeled instances.
We review two novel methods for text categorization, based on a new framework that utilizes richer annotations that we call annotator rationales. A human annotator provides hints to a machine learner by highlighting contextual "rationales" in support of each of his or her annotations. We have collected such rationales, in the form of substrings, for an existing document sentiment classification dataset [1]. We have developed two methods, one discriminative [2] and one generative, that use these rationales during training to obtain significant accuracy improvements over two strong baselines. Our generative model in particular could be adapted to help learn other kinds of probabilistic classifiers for quite different tasks. Based on a small study of annotation speed, we posit that for some tasks, providing rationales can be a more fruitful use of an annotator's time than annotating more examples.
Data annotation in modern practice often involves multiple, imperfect human annotators. Multiple annotations can be used to infer estimates of the ground-truth labels and to estimate individual annotator error characteristics (or reliability). We introduce MOMRESP, a model that improves upon item response models by incorporating information from both natural data clusters and annotations from multiple annotators to infer ground-truth labels for the document classification task. We implement this model and show that MOMRESP can use unlabeled data to dramatically improve estimates of the ground-truth labels over a majority-vote baseline in situations where annotations are both scarce and of low quality, as well as in situations where annotators disagree consistently. Correspondingly, in those same situations, estimates of annotator reliability are also stronger than the majority-vote baseline. Because MOMRESP predictions are subject to label switching, we introduce a so...
Proceedings of the 1st …, 2006
In this paper we describe a method which combines semi-supervised and active learning for the classification task. In particular, we propose a semi-supervised PLSA (Probabilistic Latent Semantic Analysis) algorithm [4] combined with a certainty-based active learning method, in order to classify text documents.
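A hedged sketch of the certainty-based selection step described above: pick the documents the current model is least confident about. The exact certainty measure used alongside the semi-supervised PLSA model may differ; the function name and criterion here are a standard illustration, not the paper's.

```python
import numpy as np

def least_confident(proba, k):
    """Certainty-based selection: return the indices of the k documents
    whose most probable class has the lowest predicted probability, i.e.
    the documents the current model is least certain about.

    proba : (n_docs, n_classes) class probabilities from the current model.
    """
    confidence = proba.max(axis=1)
    return np.argsort(confidence)[:k]
```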
1996
Corpus-based methods for natural language processing often use supervised training, requiring expensive manual annotation of training corpora. This paper investigates methods for reducing annotation cost by sample selection. In this approach, during training the learning program examines many unlabeled examples and selects for labeling (annotation) only those that are most informative at each stage. This avoids redundantly annotating examples that contribute little new information. This paper extends our previous work on committee-based sample selection for probabilistic classifiers. We describe a family of methods for committee-based sample selection, and report experimental results for the task of stochastic part-of-speech tagging. We find that all variants achieve a significant reduction in annotation cost, though their computational efficiency differs. In particular, the simplest method, which has no parameters to tune, gives excellent results. We also show that sample selection yields a significant reduction in the size of the model used by the tagger.
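One standard disagreement measure used in committee-based selection is vote entropy; the short sketch below is a generic illustration of that family of methods (the paper evaluates several variants), and the function names are assumptions.

```python
import numpy as np

def vote_entropy(committee_predictions):
    """Disagreement score per unlabeled example for committee-based
    selection.  Higher entropy means the committee members disagree more,
    so labeling that example is expected to be more informative.

    committee_predictions : (n_members, n_examples) array of predicted labels.
    """
    n_members, n_examples = committee_predictions.shape
    scores = np.zeros(n_examples)
    for j in range(n_examples):
        _, counts = np.unique(committee_predictions[:, j], return_counts=True)
        p = counts / n_members
        scores[j] = -np.sum(p * np.log(p))
    return scores

def select_for_labeling(committee_predictions, k):
    """Pick the k examples the committee disagrees on most."""
    return np.argsort(-vote_entropy(committee_predictions))[:k]
```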
2003
We consider the problem of modeling annotated data: data with multiple types where the instance of one type (such as a caption) serves as a description of the other type (such as an image). We describe three hierarchical probabilistic mixture models which aim to describe such data, culminating in correspondence latent Dirichlet allocation, a latent variable model that is effective at modeling the joint distribution of both types and the conditional distribution of the annotation given the primary type.
With the increasing popularity of online crowdsourcing platforms such as Amazon Mechanical Turk (AMT), building supervised learning models for datasets with multiple annotators is receiving increasing attention from researchers. These platforms provide an inexpensive and accessible resource that can be used to obtain labeled data, and in many situations the quality of the labels competes directly with that of experts. For such reasons, much attention has recently been given to annotator-aware models. In this paper, we propose a new probabilistic model for supervised learning with multiple annotators where the reliability of the different annotators is treated as a latent variable. We empirically show that this model is able to achieve state-of-the-art performance, while reducing the number of model parameters, thus avoiding potential overfitting. Furthermore, the proposed model is easier to implement and extend to other classes of learning problems such as sequence labeling tasks.
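To make the "reliability as a latent variable" idea concrete, here is a toy EM loop for a Dawid-Skene-style simplification with one accuracy parameter per annotator and binary labels. This is not the paper's model (which is richer); the names, initialization, and clipping constants are assumptions.

```python
import numpy as np

def em_annotator_reliability(labels, n_iter=50):
    """Toy EM for a simplified multi-annotator model: binary labels {0, 1},
    missing annotations marked with np.nan, and a single latent accuracy
    per annotator (a Dawid-Skene-style simplification).

    labels : (n_items, n_annotators) float array
    Returns P(true label = 1) per item and the estimated accuracy per annotator.
    """
    n_items, n_annot = labels.shape
    observed = ~np.isnan(labels)
    acc = np.full(n_annot, 0.8)               # initial accuracy guess
    prior = 0.5
    for _ in range(n_iter):
        # E-step: posterior over the true label of each item
        log1 = np.full(n_items, np.log(prior))
        log0 = np.full(n_items, np.log(1 - prior))
        for j in range(n_annot):
            obs = observed[:, j]
            said_one = labels[obs, j] == 1
            log1[obs] += np.where(said_one, np.log(acc[j]), np.log(1 - acc[j]))
            log0[obs] += np.where(~said_one, np.log(acc[j]), np.log(1 - acc[j]))
        post = 1.0 / (1.0 + np.exp(log0 - log1))
        # M-step: re-estimate annotator accuracies and the class prior
        prior = np.clip(post.mean(), 1e-3, 1 - 1e-3)
        for j in range(n_annot):
            obs = observed[:, j]
            y = labels[obs, j]
            acc[j] = (post[obs] * (y == 1) + (1 - post[obs]) * (y == 0)).sum() / obs.sum()
        acc = np.clip(acc, 1e-3, 1 - 1e-3)
    return post, acc
```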
arXiv (Cornell University), 2022
This paper discusses a crowdsourcing-based method that we designed to quantify the importance of different attributes of a dataset in determining the outcome of a classification problem. This heuristic, provided by humans, acts as the initial weight seed for machine learning models and guides the model towards a better optimum during the gradient descent process. When dealing with data, it is not uncommon to encounter skewed datasets that overrepresent items of certain classes while underrepresenting the rest. Skewed datasets may lead to unforeseen issues with models, such as learning a biased function or overfitting. Traditional data augmentation techniques in supervised learning include oversampling and training with synthetic data. We introduce an experimental approach to dealing with such unbalanced datasets by including humans in the training process. We ask humans to rank the importance of features of the dataset and, through rank aggregation, determine the initial weight bias for the model. We show that collective human bias can allow ML models to learn insights about the true population instead of the biased sample. In this paper, we use two rank aggregation methods, Kemeny-Young and a Markov chain aggregator, to quantify human opinion on the importance of features. This work mainly tests the effectiveness of human knowledge on binary classification (Popular vs. Not-popular) problems on two ML models: Deep Neural Networks and Support Vector Machines. This approach treats humans as weak learners and relies on aggregation to offset individual biases and domain unfamiliarity.
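Kemeny-Young optimization is NP-hard, so for illustration the sketch below uses a plain Borda count, a much simpler aggregator than either method named in the abstract, just to show how aggregated human rankings could be turned into an initial weight seed; the function names and the scaling are assumptions.

```python
import numpy as np

def borda_aggregate(rankings):
    """Aggregate several human rankings of feature importance with a simple
    Borda count.  rankings[a] lists feature indices from most to least
    important for annotator a.  Returns one score per feature (higher =
    judged more important overall)."""
    n_annotators, n_features = rankings.shape
    scores = np.zeros(n_features)
    for r in rankings:
        for position, feature in enumerate(r):
            scores[feature] += n_features - 1 - position
    return scores

def initial_weights_from_ranks(rankings, scale=0.1):
    """Turn aggregated importance scores into a centred, scaled weight seed."""
    s = borda_aggregate(rankings)
    return scale * (s - s.mean()) / (s.std() + 1e-12)
```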
2011 10th International Conference on Machine Learning and Applications and Workshops, 2011
In this paper we study the problem of building document classifiers using labeled features and unlabeled documents, where not all the features are helpful for the process of learning. This is an important setting, since building classifiers using labeled words has recently been shown to require considerably less human labeling effort than building classifiers using labeled documents. We propose the use of Generalized Expectation (GE) criteria combined with an L1 regularization term for learning from labeled features. This lets the feature labels guide model expectation constraints, while approaching feature selection from a regularization perspective. We show that GE criteria combined with L1 regularization consistently outperform the best previously reported results in the literature under the same setting, obtained using L2 regularization, with up to a 12% increase in accuracy. Furthermore, the results obtained with GE criteria and the L1 regularizer are competitive with those obtained in the traditional instance-labeling setting, with the same labeling cost.
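A hedged sketch of the combined objective this setting implies, where \hat{p}_k is the reference (labeled-feature) class distribution, X_k is the set of unlabeled documents containing feature k, and \lambda controls the strength of the L1 penalty; the paper's exact formulation may differ.

```latex
% Assumed form of the objective: one KL-divergence GE term per labeled
% feature, plus an L1 penalty that drives weights of unhelpful features to zero.
\min_{\theta} \;\; \sum_{k}
  \mathrm{KL}\!\left( \hat{p}_k \,\Big\|\, \tilde{p}_{\theta,k} \right)
  \;+\; \lambda \,\lVert \theta \rVert_1 ,
\qquad
\tilde{p}_{\theta,k}(y) \;=\; \frac{1}{|X_k|} \sum_{x \in X_k} p_\theta(y \mid x)
```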
This paper shows that the accuracy of learned text classifiers can be improved by augmenting a small number of labeled training documents with a large pool of unlabeled documents. This is important because in many text classification problems obtaining training labels is expensive, while large quantities of unlabeled documents are readily available.
For several decades, statisticians have advocated using a combination of labeled and unlabeled data to train classifiers by estimating parameters of a generative model through iterative Expectation-Maximization (EM) techniques. This chapter explores the effectiveness of this approach when applied to the domain of text classification. Text documents are represented here with a bag-of-words model, which leads to a generative classification model based on a mixture of multinomials. This model is an extremely simplistic representation of the complexities of written text. This chapter explains and illustrates three key points about semi-supervised learning for text classification with generative models. First, despite the simplistic representation, some text domains have a high positive correlation between generative model probability and classification accuracy. In these domains, a straightforward application of EM with the naive Bayes text model works well. Second, some text domains do not have this correlation. Here we can adopt a more expressive and appropriate generative model that does have a positive correlation. In these domains, semi-supervised learning again improves classification accuracy. Finally, EM suffers from the problem of local maxima, especially in high-dimensional domains such as text classification. We demonstrate that deterministic annealing, a variant of EM, can help overcome the problem of local maxima and increase classification accuracy further when the generative model is appropriate.
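A bare-bones numpy sketch of the EM-with-naive-Bayes procedure described above: alternate between probabilistically labeling the unlabeled documents (E-step) and re-estimating multinomial naive Bayes parameters from the fractional counts (M-step). Variable names, the Laplace smoothing, and the fixed iteration count are assumptions; the chapter's deterministic-annealing variant is not shown.

```python
import numpy as np

def nb_em(X_lab, y_lab, X_unlab, n_classes, n_iter=10, alpha=1.0):
    """Semi-supervised multinomial naive Bayes via EM (illustrative sketch).

    X_lab, X_unlab : (n_docs, n_words) word-count matrices
    y_lab          : integer class labels for the labeled documents
    Returns log class priors and log word probabilities per class.
    """
    def m_step(resp_lab, resp_unlab):
        # Class priors and word probabilities from fractional counts.
        resp = np.vstack([resp_lab, resp_unlab])
        X = np.vstack([X_lab, X_unlab])
        prior = resp.sum(axis=0) / resp.sum()
        word = resp.T @ X + alpha                    # (n_classes, n_words)
        word /= word.sum(axis=1, keepdims=True)
        return np.log(prior), np.log(word)

    def e_step(log_prior, log_word, X):
        log_post = X @ log_word.T + log_prior        # unnormalised log P(c|d)
        log_post -= log_post.max(axis=1, keepdims=True)
        post = np.exp(log_post)
        return post / post.sum(axis=1, keepdims=True)

    resp_lab = np.eye(n_classes)[y_lab]              # hard labels, kept fixed
    resp_unlab = np.full((X_unlab.shape[0], n_classes), 1.0 / n_classes)
    for _ in range(n_iter):
        log_prior, log_word = m_step(resp_lab, resp_unlab)
        resp_unlab = e_step(log_prior, log_word, X_unlab)  # E-step on unlabeled only
    return log_prior, log_word
```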
2010
Supervised learning from multiple labeling sources is an increasingly important problem in machine learning and data mining. This paper develops a probabilistic approach to this problem when annotators may be unreliable (labels are noisy), but also their expertise varies depending on the data they observe (annotators may have knowledge about different parts of the input space). That is, an annotator may not be consistently accurate (or inaccurate) across the task domain. The presented approach produces classification and annotator models that allow us to provide estimates of the true labels and of variable annotator expertise. We provide an analysis of the proposed model under various scenarios and show experimentally that annotator expertise can indeed vary in real tasks and that the presented approach provides clear advantages over previously introduced multi-annotator methods, which only consider general annotator characteristics.
A common obstacle preventing the rapid deployment of supervised machine learning algorithms is the lack of labeled training data. This is particularly expensive to obtain for structured prediction tasks, where each training instance may have multiple, interacting labels, all of which must be correctly annotated for the instance to be of use to the learner. Traditional active learning addresses this problem by optimizing the order in which the examples are labeled to increase learning efficiency. However, this approach does not consider the difficulty of labeling each example, which can vary widely in structured prediction tasks. For example, the labeling predicted by a partially trained system may be easier to correct for some instances than for others. We propose a new active learning paradigm which reduces not only how many instances the annotator must label, but also how difficult each instance is to annotate. The system leverages information from partially correct predictions to efficiently solicit annotations from the user. We validate this active learning framework in an interactive information extraction system, reducing the total number of annotation actions by 22%.
Pattern Recognition, 2015
Automatic image annotation consists of automatically labeling images, or image regions, with a pre-defined set of keywords, which are regarded as descriptors of the high-level semantics of the image. In supervised learning, a set of previously annotated images is required to train a classifier. Annotating a large quantity of images by hand is a tedious and time-consuming process, so an alternative is to manually label a small subset of images and use the remaining ones under a semi-supervised approach. In this paper, a new semi-supervised ensemble of classifiers, called WSA, for automatic image annotation is proposed. WSA uses naive Bayes as its base classifier. A set of these is combined in a cascade based on the AdaBoost technique. However, when training the ensemble of Bayesian classifiers, it also considers the unlabeled images at each stage. These are annotated based on the classifier from the previous stage, and then used to train the next classifier. The unlabeled instances are weighted according to a confidence measure based on their predicted probability value, while the labeled instances are weighted according to the classifier error, as in standard AdaBoost. WSA has been evaluated with benchmark data sets and two sets of images, with promising results.
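An illustrative sketch of the per-stage weighting the abstract describes: labeled images are re-weighted from the previous classifier's errors as in standard AdaBoost, while pseudo-labeled images are weighted by the confidence of their predicted probabilities. The exact confidence measure and update used in WSA may differ; all names here are assumptions.

```python
import numpy as np

def next_stage_weights(w_labeled, y_true, y_pred, unlabeled_proba):
    """Instance weights for the next stage of the cascade (illustrative).

    w_labeled       : current weights of the labeled images
    y_true, y_pred  : true and predicted labels of the labeled images
    unlabeled_proba : (n_unlabeled, n_classes) predicted class probabilities
    """
    err = np.sum(w_labeled * (y_pred != y_true)) / np.sum(w_labeled)
    err = min(max(err, 1e-12), 1 - 1e-12)
    alpha = 0.5 * np.log((1 - err) / err)
    w_labeled = w_labeled * np.exp(alpha * (y_pred != y_true))  # up-weight mistakes
    w_labeled /= w_labeled.sum()
    pseudo_labels = unlabeled_proba.argmax(axis=1)
    w_unlabeled = unlabeled_proba.max(axis=1)                   # confidence weight
    return w_labeled, pseudo_labels, w_unlabeled
```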
ArXiv, 2022
LeQua 2022 is a new lab for the evaluation of methods for "learning to quantify" in textual datasets, i.e., for training predictors of the relative frequencies of the classes of interest in sets of unlabelled textual documents. While these predictions could be easily achieved by first classifying all documents via a text classifier and then counting the numbers of documents assigned to the classes, a growing body of literature has shown this approach to be suboptimal, and has proposed better methods. The goal of this lab is to provide a setting for the comparative evaluation of methods for learning to quantify, both in the binary setting and in the single-label multiclass setting. For each such setting we provide data either in ready-made vector form or in raw document form. In a number of applications involving classification, the final goal is not determining which class (or classes) individual unlabelled items belong to, but estimating the prevalence (or "r...
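The abstract contrasts plain classify-and-count with better quantification methods. One textbook example of such a method, sketched here for the binary case, is Adjusted Classify & Count (ACC); this is a standard formula rather than LeQua code, and the function name is illustrative.

```python
def adjusted_classify_and_count(predictions, tpr, fpr):
    """Adjusted Classify & Count (ACC) for binary quantification.

    predictions : iterable of 0/1 classifier decisions on the unlabelled set
    tpr, fpr    : the classifier's true/false positive rates, estimated on
                  held-out labelled data
    Returns an estimate of the prevalence of the positive class.
    """
    predictions = list(predictions)
    observed = sum(predictions) / len(predictions)   # plain classify & count
    if tpr == fpr:
        return observed                              # correction undefined
    adjusted = (observed - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, adjusted))              # clip to [0, 1]
```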
INTERSPEECH, 2019
Natural Language Understanding (NLU) models are typically trained in a supervised learning framework. In the case of intent classification, the predicted labels are predefined and based on the designed annotation schema, while the labeling process is a laborious task where annotators manually inspect each utterance and assign the corresponding label. We propose an Active Annotation (AA) approach where we combine an unsupervised learning method in the embedding space, a human-in-the-loop verification process, and linguistic insights to create lexicons that can be open categories and adapted over time. In particular, annotators define the y-label space on-the-fly during the annotation using an iterative process and without the need for prior knowledge about the input data. We evaluate the proposed annotation paradigm in a real use-case NLU scenario. Results show that our Active Annotation paradigm achieves accurate and higher-quality training data, with an annotation speed an order of magnitude higher with respect to the traditional human-only driven baseline annotation methodology.
2010
Automated image annotation consists of a number of techniques that aim to find the correlation between words and image features such as colour, shape, and texture to provide correct annotation words to images. In particular, approaches based on Bayesian theory use machine-learning techniques to learn statistical models from a training set of pre-annotated images and apply them to generate annotations for unseen images. The focus of this thesis lies in demonstrating that an approach which goes beyond learning the statistical correlation between words and visual features, and also exploits information about the actual semantics of the words used in the annotation process, is able to improve the performance of probabilistic annotation systems. Specifically, I present three experiments. Firstly, I introduce a novel approach that automatically refines the annotation words generated by a non-parametric density estimation model using semantic relatedness measures. Init...
Proceedings of the 18th Australasian Document Computing Symposium on - ADCS '13, 2013
Statistical methods for text classification are predominantly based on the paradigm of class-based learning that associates class variables with features, discarding the instances of data after model training. This results in efficient models, but neglects the fine-grained information present in individual documents. Instance-based learning uses this information, but suffers from data sparsity with text data. In this paper, we propose a generative model called Tied Document Mixture (TDM) for extending Multinomial Naive Bayes (MNB) with mixtures of hierarchically smoothed models for documents. Alternatively, TDM can be viewed as a Kernel Density Classifier using class-smoothed Multinomial kernels. TDM is evaluated for classification accuracy on 14 different datasets for multi-label, multi-class and binary-class text classification tasks and compared to instance- and class-based learning baselines. The comparisons to MNB demonstrate a substantial improvement in accuracy as a function of available training documents per class, ranging up to average error reductions of over 26% in sentiment classification and 65% in spam classification. On average TDM is as accurate as the best discriminative classifiers, but retains the linear time complexities of instance-based learning methods, with exact algorithms for both model estimation and inference.