2003, INTERNATIONAL JOINT CONFERENCE ON …
This paper investigates a new approach for training discriminant classifiers when only a small set of labeled data is available together with a large set of unlabeled data. The algorithm optimizes the classification maximum likelihood of a set of labeled and unlabeled data, using a variant of the Classification Expectation Maximization (CEM) algorithm. Its originality is that it makes use of both unlabeled data and a probabilistic misclassification model for these data. The parameters of the label-error model are learned together with the classifier parameters. We demonstrate the effectiveness of the approach on four data sets and show the advantages of this method over a previously developed semi-supervised algorithm which does not consider imperfections in the labeling process.
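A minimal sketch of such a CEM-style loop, with a label-error matrix re-estimated alongside the classifier, might look as follows; the choice of base classifier (Gaussian naive Bayes), the way the error matrix is estimated, and all names are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def cem_with_label_error(X_lab, y_lab, X_unl, n_classes, n_iter=20):
    # Initial classifier trained on the small labeled set
    # (assumes every class appears at least once in y_lab)
    clf = GaussianNB()
    clf.fit(X_lab, y_lab)
    # beta[k, h]: estimated probability that a point of true class k receives label h
    beta = np.full((n_classes, n_classes), 1.0 / n_classes)
    for _ in range(n_iter):
        # C-step: assign each unlabeled point to its most probable class
        y_unl = clf.predict(X_unl)
        # M-step: re-estimate classifier parameters on labeled + pseudo-labeled data
        clf.fit(np.vstack([X_lab, X_unl]), np.concatenate([y_lab, y_unl]))
        # Re-estimate the label-error probabilities from the classifier's own
        # posterior spread on the pseudo-labeled points (illustrative choice)
        proba = clf.predict_proba(X_unl)
        for k in range(n_classes):
            mask = (y_unl == k)
            if mask.any():
                beta[k] = proba[mask].mean(axis=0)
    return clf, beta
```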
… The 30th European …, 2008
This paper investigates a new extension of the Probabilistic Latent Semantic Analysis (PLSA) model for text classification where the training set is partially labeled. The proposed approach iteratively labels the unlabeled documents and estimates the probabilities of their labeling errors. These probabilities are then taken into account in the estimation of the new model parameters before the next round. Our approach outperforms an earlier semi-supervised extension of PLSA introduced in [9], which is based on the use of fake labels, while maintaining its simplicity and its ability to solve multiclass problems. In addition, it gives valuable information about the classes that are most uncertain and difficult to label. We perform experiments on the 20Newsgroups, WebKB and Reuters document collections and show the effectiveness of our approach over two other semi-supervised algorithms applied to these text classification problems.
Lecture Notes in Computer Science, 2002
A key difficulty in applying machine learning classification algorithms to many applications is that they require a lot of hand-labeled examples. Labeling large amounts of data is a costly process which in many cases is prohibitive. In this paper we show how the use of a small number of labeled data together with a large number of unlabeled data can produce high-accuracy classifiers. Our approach does not rely on any parametric assumptions about the data, as is usually the case with the generative methods widely used in semi-supervised learning. We propose new discriminant algorithms that handle both labeled and unlabeled data for training classification models, and we analyze their performance on different information access problems ranging from text-span classification for text summarization to e-mail spam detection and text classification.
2004
A graph-based prior is proposed for parametric semi-supervised classification. The prior utilizes both labelled and unlabelled data; it also integrates features from multiple views of a given sample (e.g., multiple sensors), thus implementing a Bayesian form of co-training. An EM algorithm for training the classifier automatically adjusts the tradeoff between the contributions of: (a) the labelled data; (b) the unlabelled data; and (c) the co-training information. Active label query selection is performed using a mutual information based criterion that explicitly uses the unlabelled data and the co-training information. Encouraging results are presented on public benchmarks and on measured data from single and multiple sensors.
arXiv (Cornell University), 2018
Semi-supervised learning deals with the problem of how, if possible, to take advantage of a huge amount of unclassified data to perform classification in situations where, typically, there is little labeled data. Even though this is not always possible (it depends on how useful knowing the distribution of the unlabeled data would be for inferring the labels), several algorithms have been proposed recently. A new algorithm is proposed that, under almost necessary conditions, asymptotically attains the performance of the best theoretical rule as the amount of unlabeled data tends to infinity. The set of necessary assumptions, although reasonable, shows that semi-supervised classification only works for very well conditioned problems. The focus is on understanding when and why semi-supervised learning works when the size of the initial training sample remains fixed and the asymptotics are in the size of the unlabeled data. The performance of the algorithm is assessed on the well-known "Isolet" real data set of phonemes, where a strong dependence on the choice of the initial training sample is shown. Keywords: semi-supervised learning; small training sample; consistency.
As the amount of online documents increases, the demand for document classification to aid their analysis and management is increasing. Text is cheap, but information, in the form of knowing what classes a document belongs to, is expensive. The main purpose of this paper is to explain the expectation-maximization technique of data mining for classifying documents and to show how accuracy improves with a semi-supervised approach. The expectation-maximization algorithm is applied in both a supervised and a semi-supervised setting, and the semi-supervised approach is found to be more accurate and effective. Its main advantage is the dynamic generation of new classes. The algorithm first trains a classifier using the labeled documents and then probabilistically classifies the unlabeled documents. The car data set used for evaluation is taken from the UCI repository, with some changes made on our side.
In many important text classification problems, acquiring class labels for training documents is costly, while gathering large quantities of unlabeled data is cheap. This paper shows that the accuracy of text classifiers trained with a small number of labeled documents can be improved by augmenting this small training set with a large pool of unlabeled documents. We present a theoretical argument showing that, under common assumptions, unlabeled data contain information about the target function. We then introduce an algorithm for learning from labeled and unlabeled text based on the combination of Expectation-Maximization with a naive Bayes classifier. The algorithm first trains a classifier using the available labeled documents, and probabilistically labels the unlabeled documents; it then trains a new classifier using the labels for all the documents, and iterates to convergence. Experimental results, obtained using text from three different real-world tasks, show that the use of unlabeled data reduces classification error by up to 33%.
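The iteration described here can be sketched compactly. The following is a minimal, illustrative re-implementation of EM with multinomial naive Bayes, assuming dense bag-of-words count matrices (rows are documents, columns are terms) and integer class labels; it is not the authors' code.

```python
import numpy as np

def nb_m_step(X, R, alpha=1.0):
    """Estimate class priors and word probabilities from responsibilities R (N x K)."""
    class_counts = R.sum(axis=0)                              # (K,)
    priors = class_counts / class_counts.sum()
    word_counts = R.T @ X                                     # (K, V)
    word_probs = (word_counts + alpha) / (
        word_counts.sum(axis=1, keepdims=True) + alpha * X.shape[1])
    return priors, word_probs

def nb_e_step(X, priors, word_probs):
    """Posterior class probabilities under a multinomial naive Bayes model."""
    log_post = np.log(priors) + X @ np.log(word_probs).T      # (N, K)
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)

def em_naive_bayes(X_lab, y_lab, X_unl, n_classes, n_iter=10):
    R_lab = np.eye(n_classes)[y_lab]                          # fixed hard labels
    priors, word_probs = nb_m_step(X_lab, R_lab)              # train on labeled docs
    for _ in range(n_iter):
        R_unl = nb_e_step(X_unl, priors, word_probs)          # E-step: soft labels
        priors, word_probs = nb_m_step(np.vstack([X_lab, X_unl]),
                                       np.vstack([R_lab, R_unl]))  # M-step: retrain on all
    return priors, word_probs
```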
This paper shows that the accuracy of learned text classifiers can be improved by augmenting a small number of labeled training documents with a large pool of unlabeled documents. This is important because in many text classification problems obtaining training labels is expensive, while large quantities of unlabeled documents are readily available.
2008 8th IEEE International Conference on Automatic Face & Gesture Recognition, 2008
Linear Discriminant Analysis (LDA) has been a popular method for feature extraction and face recognition. As a supervised method, it requires manually labeled samples for training, and producing labeled samples is time-consuming and exhausting work. A semi-supervised LDA (SDA [3]) has been proposed recently to enable training of LDA with partially labeled samples. In this paper, we first reformulate supervised LDA from a normalized perspective of LDA. We then show that this reformulation is powerful for semi-supervised learning of LDA. We call this approach Normalized LDA; it uses total diversity to normalize intra-class diversity and aims to find projection directions that minimize the normalized intra-class diversity. Although the Normalized LDA is identical to LDA in the supervised setting, a semi-supervised approach can easily be incorporated into its framework to make use of unlabeled samples and improve performance in the learned subspace. Moreover, unlike SDA, which uses unlabeled samples to preserve neighboring relations, unlabeled samples in the Normalized LDA are used for a more accurate estimation of the data space. Experiments on face recognition with the FRGC version 2 database and the CMU PIE database demonstrate that the Normalized LDA outperforms SDA.
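One way to read the normalization idea is that intra-class scatter is estimated from the labeled samples, total scatter from labeled and unlabeled samples together, and the projections minimize the ratio of the two. The sketch below is an interpretation under those assumptions, not the paper's exact formulation.

```python
import numpy as np
from scipy.linalg import eigh

def normalized_lda(X_lab, y_lab, X_unl, n_components, reg=1e-6):
    d = X_lab.shape[1]
    # Intra-class scatter S_w from the labeled samples only
    S_w = np.zeros((d, d))
    for c in np.unique(y_lab):
        Xc = X_lab[y_lab == c] - X_lab[y_lab == c].mean(axis=0)
        S_w += Xc.T @ Xc
    # Total scatter S_t estimated from labeled + unlabeled samples
    X_all = np.vstack([X_lab, X_unl])
    Xc_all = X_all - X_all.mean(axis=0)
    S_t = Xc_all.T @ Xc_all
    # Generalized eigenproblem S_w w = lambda S_t w; keep the directions with the
    # smallest normalized intra-class diversity w^T S_w w / w^T S_t w
    vals, vecs = eigh(S_w + reg * np.eye(d), S_t + reg * np.eye(d))
    return vecs[:, :n_components]
```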
2010
This lecture focused on methods of combining labeled and unlabeled data to learn a classifier. As a motivating example, suppose we would like to classify web pages as either fraudulent or not fraudulent. In this case, obtaining unlabeled data (i.e., webpages) is easy. However, labeling such data can be very costly since it requires humans to manually look at each webpage and determine whether or not it is a scam. We might hope that by making use of the unlabeled data in a clever way, we could learn a classifier without requiring as much labeled data as we would normally need. In this lecture, we consider two learning models: semi-supervised learning, and active learning.
Principles of Data Mining and Knowledge Discovery, 2000
Supervised learning algorithms usually require large amounts of training data to learn reasonably accurate classifiers. Yet, for many text classification tasks, providing labeled training documents is expensive, while unlabeled documents are readily available in large quantities. Learning from both labeled and unlabeled documents in a semi-supervised framework is a promising approach to reduce the need for labeled training documents. This paper compares three commonly applied text classifiers in the light of semi-supervised learning, namely a linear support vector machine, a similarity-based tf-idf classifier and a Naïve Bayes classifier. Results on real-world text data sets show that these learners may substantially benefit from using a large amount of unlabeled documents in addition to some labeled documents.
ArXiv, 2021
The recent research in semi-supervised learning (SSL) is mostly dominated by consistency-regularization-based methods, which achieve strong performance. However, they heavily rely on domain-specific data augmentations, which are not easy to generate for all data modalities. Pseudo-labeling (PL) is a general SSL approach that does not have this constraint but performs relatively poorly in its original formulation. We argue that PL underperforms due to erroneous high-confidence predictions from poorly calibrated models; these predictions generate many incorrect pseudo-labels, leading to noisy training. We propose an uncertainty-aware pseudo-label selection (UPS) framework which improves pseudo-labeling accuracy by drastically reducing the amount of noise encountered in the training process. Furthermore, UPS generalizes the pseudo-labeling process, allowing for the creation of negative pseudo-labels; these negative pseudo-labels can be used for multi-label classification as well as ...
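A hedged sketch of uncertainty-aware pseudo-label selection in this spirit is shown below: a pseudo-label is kept only when the model is both confident and has low predictive uncertainty, here estimated with Monte Carlo dropout. The thresholds, uncertainty estimator, and function names are illustrative assumptions, not the UPS paper's exact procedure.

```python
import torch

@torch.no_grad()
def select_pseudo_labels(model, x_unl, n_samples=10,
                         conf_thresh=0.9, unc_thresh=0.05):
    model.train()                                   # keep dropout active for MC sampling
    probs = torch.stack([torch.softmax(model(x_unl), dim=1)
                         for _ in range(n_samples)])    # (T, N, K)
    mean_p = probs.mean(dim=0)                      # predictive mean over samples
    unc = probs.std(dim=0)                          # per-class predictive uncertainty
    conf, pseudo = mean_p.max(dim=1)                # confidence and tentative label
    sel_unc = unc.gather(1, pseudo.unsqueeze(1)).squeeze(1)
    keep = (conf >= conf_thresh) & (sel_unc <= unc_thresh)
    return pseudo[keep], keep                       # selected pseudo-labels and mask
```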
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, 2008
In this paper, we address the problem of learning when some cases are fully labeled while other cases are only partially labeled, in the form of partial labels. Partial labels are represented as a set of possible labels for each training example, one of which is the correct label. We introduce a discriminative learning approach that incorporates partial label information into the conventional margin-based learning framework. The partial label learning problem is formulated as a convex quadratic optimization minimizing the L2-norm regularized empirical risk using hinge loss. We also present an efficient algorithm for classification in the presence of partial labels. Experiments with different data sets show that partial label information improves the performance of classification when there is traditional fully-labeled data, and also yields reasonable performance in the absence of any fully labeled data.
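One way to write a margin-based objective of this kind, shown here as a convex variant that averages the scores of the candidate labels (the paper's exact QP formulation may aggregate the candidate set differently), is

$$\min_{\mathbf{w}_1,\dots,\mathbf{w}_K}\;\; \frac{\lambda}{2}\sum_{k=1}^{K}\|\mathbf{w}_k\|^2 \;+\; \sum_{i}\Bigl[\,1 \;-\; \frac{1}{|Y_i|}\sum_{y\in Y_i}\mathbf{w}_y^\top \mathbf{x}_i \;+\; \max_{y'\notin Y_i}\mathbf{w}_{y'}^\top \mathbf{x}_i\Bigr]_+$$

where $Y_i$ is the set of candidate labels for example $i$, $[\,\cdot\,]_+ = \max(0,\cdot)$ is the hinge loss, and the squared-norm regularizer corresponds to the L2-norm regularized empirical risk mentioned above.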
2005
In this paper we present an approach that trains Gaussian classifiers using labeled and unlabeled data. Training with unlabeled data saves the time and effort that would otherwise be spent labeling the data. We present experiments on different data sets to illustrate the effect of unlabeled data on the performance of the classifiers, and we try to show that, under specific conditions, unlabeled data contain valuable information about the target function. The algorithm we use first trains the classifier with the limited number of labeled data, then labels the unlabeled data with the obtained classifier, re-trains the classifier with the newly labeled data, and iterates.
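The iteration described above is essentially self-training; a minimal sketch with a Gaussian (quadratic discriminant) classifier from scikit-learn is given below as an illustration under those assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

def self_train_gaussian(X_lab, y_lab, X_unl, n_iter=10):
    clf = QuadraticDiscriminantAnalysis()
    clf.fit(X_lab, y_lab)                          # train on the small labeled set
    for _ in range(n_iter):
        y_unl = clf.predict(X_unl)                 # label the unlabeled data
        clf.fit(np.vstack([X_lab, X_unl]),         # re-train on everything
                np.concatenate([y_lab, y_unl]))
    return clf
```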
Applied Optimization, 2001
We examine mathematical models for semi-supervised support vector machines (S3VM). Given a training set of labeled data and a working set of unlabeled data, S3VM constructs a support vector machine using both the training and working sets. We use S3VM to solve the transductive inference problem posed by Vapnik. In transduction, the task is to estimate the value of a classification function at the given points in the working set. This contrasts with inductive inference, which estimates the classification function at all possible values. We propose a general S3VM model that minimizes both the misclassification error and the function capacity based on all the available data. Depending on how poorly estimated unlabeled data are penalized, different mathematical models result. We examine several practical algorithms for solving these models. The first approach uses the S3VM model for 1-norm linear support vector machines converted to a mixed-integer program (MIP). A global solution of the MIP is found using a commercial integer programming solver. The second approach uses a nonconvex quadratic program. Variations of block-coordinate-descent algorithms are used to find local solutions of this problem. Using the MIP within a local learning algorithm produced the best results. Our experimental study of these statistical learning methods indicates that incorporating working data can improve generalization.
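One common way to write such an objective for a linear classifier, shown here in its 2-norm form (the MIP formulation described above uses the 1-norm and introduces a binary class variable per working-set point), is

$$\min_{\mathbf{w},\,b}\;\; \frac{1}{2}\|\mathbf{w}\|^2 \;+\; C\sum_{i=1}^{\ell}\bigl[1 - y_i(\mathbf{w}^\top \mathbf{x}_i + b)\bigr]_+ \;+\; C^{*}\sum_{j=\ell+1}^{\ell+u}\bigl[1 - \lvert \mathbf{w}^\top \mathbf{x}_j + b\rvert\bigr]_+$$

where the first sum runs over the $\ell$ labeled (training) points, the second over the $u$ unlabeled (working-set) points, $[\,\cdot\,]_+ = \max(0,\cdot)$ is the hinge loss, and the last term penalizes unlabeled points that fall inside the margin whichever class they are assigned to.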
We compare two recently proposed frameworks for combining generative and discriminative probabilistic classifiers and apply them to semi-supervised classification. In both cases we explore the tradeoff between maximizing a discriminative likelihood of labeled data and a generative likelihood of labeled and unlabeled data. While prominent semi-supervised learning methods assume low density regions between classes or are subject to generative modeling assumptions, we conjecture that hybrid generative/discriminative methods allow semi-supervised learning in the presence of strongly overlapping classes and reduce the risk of modeling structure in the unlabeled data that is irrelevant for the specific classification task of interest. We apply both hybrid approaches within naively structured Markov random field models and provide a thorough empirical comparison with two well-known semi-supervised learning methods on six text classification tasks. A semi-supervised hybrid generative/discriminative method provides the best accuracy in 75% of the experiments, and the multi-conditional learning hybrid approach achieves the highest overall mean accuracy across all tasks.
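A schematic way to express the tradeoff both frameworks explore, writing $\mathcal{D}_L$ for the labeled set and $\mathcal{D}_U$ for the unlabeled set (the exact weighting and parameter coupling differ between the two compared frameworks), is

$$\max_{\theta}\;\; \alpha \sum_{(x,y)\in\mathcal{D}_L} \log p_\theta(y \mid x) \;+\; (1-\alpha)\Bigl(\sum_{(x,y)\in\mathcal{D}_L} \log p_\theta(x, y) \;+\; \sum_{x\in\mathcal{D}_U} \log p_\theta(x)\Bigr)$$

with $\alpha \in [0,1]$ controlling how much weight the discriminative likelihood of the labeled data receives relative to the generative likelihood of all the data.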
2021
Semi-supervised learning is the class of machine learning that combines supervised and unsupervised learning to implement the learning process; conceptually, it sits between learning from labeled data and learning from unlabeled data. In certain cases, it enables the large amounts of available unlabeled data to be utilized alongside the usually limited collections of labeled data. In standard classification methods in machine learning, only a labeled collection is used to train the classifier, and labeled instances are difficult to acquire since they require the assistance of human annotators. A completely unsupervised treatment of the data is fairly easy to carry out, but it represents a significant risk, since there is little opportunity to safely verify what has been learned. By utilizing a large number of unsupervised inputs along with the supervised inputs, semi-supervised learning addresses this issue, to create a good ...