2009
In many image and video collections, we have access only to partially labeled data. For example, personal photo collections often contain several faces per image and a caption that only specifies who is in the picture, but not which name matches which face. Similarly, movie screenplays can tell us who is in the scene, but not when and where they are on the screen. We formulate the learning problem in this setting as partially-supervised multiclass classification where each instance is labeled ambiguously with more than one label. We show theoretically that effective learning is possible under reasonable assumptions even when all the data is weakly labeled. Motivated by the analysis, we propose a general convex learning formulation based on minimization of a surrogate loss appropriate for the ambiguous label setting. We apply our framework to identifying faces culled from web news sources and to naming characters in TV series and movies. We experiment on a very large dataset consisting of 100 hours of video, and in particular achieve 6% error for character naming on 16 episodes of LOST.
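The surrogate loss idea above can be illustrated with a small sketch. As an assumption (this is not the paper's exact formulation), we treat the average score over an example's candidate label set as its "positive" score and apply a hinge against every non-candidate label:

```python
import numpy as np

def ambiguous_hinge_loss(scores, candidates):
    """Convex surrogate for ambiguously labeled examples (illustrative).

    scores: (n, k) array of classifier scores per class.
    candidates: list of candidate-label index sets, one per example.
    For each example, the average score over its candidate set should
    exceed every non-candidate score by a margin of 1 (hinge-penalized).
    """
    n, k = scores.shape
    total = 0.0
    for i, cand in enumerate(candidates):
        cand = sorted(cand)
        pos = scores[i, cand].mean()          # average candidate score
        rest = [j for j in range(k) if j not in cand]
        for j in rest:                        # hinge on each non-candidate
            total += max(0.0, 1.0 - (pos - scores[i, j]))
    return total / n
```

Because the candidate average is linear in the scores, the loss stays convex in the model parameters, which is what makes efficient minimization possible in this setting.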
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, 2008
In this paper, we address the problem of learning when some cases are fully labeled while other cases are only partially labeled, in the form of partial labels. Partial labels are represented as a set of possible labels for each training example, one of which is the correct label. We introduce a discriminative learning approach that incorporates partial label information into the conventional margin-based learning framework. The partial label learning problem is formulated as a convex quadratic optimization minimizing the L2-norm regularized empirical risk using hinge loss. We also present an efficient algorithm for classification in the presence of partial labels. Experiments with different data sets show that partial label information improves classification performance when traditional fully labeled data is also available, and yields reasonable performance in the absence of any fully labeled data.
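A simplified subgradient-descent sketch of margin-based learning with partial labels (an illustration of the idea, not the paper's exact QP): each example carries a candidate label set (a singleton for fully labeled data), and any non-candidate class whose score comes within the margin of the best candidate score incurs a hinge penalty.

```python
import numpy as np

def train_partial_label_svm(X, label_sets, n_classes,
                            lam=0.1, lr=0.01, epochs=200, seed=0):
    """Subgradient descent on an L2-regularized hinge-style risk.

    X: (n, d) features; label_sets: one candidate-label set per example.
    We penalize, with a margin of 1, every non-candidate class whose
    score exceeds the best candidate score minus the margin.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(scale=0.01, size=(n_classes, d))
    for _ in range(epochs):
        grad = lam * W                       # L2 regularization term
        scores = X @ W.T                     # (n, n_classes)
        for i, cand in enumerate(label_sets):
            cand = list(cand)
            best = cand[int(np.argmax(scores[i, cand]))]
            for j in range(n_classes):
                if j in cand:
                    continue
                if scores[i, j] + 1.0 > scores[i, best]:  # margin violated
                    grad[j] += X[i] / n
                    grad[best] -= X[i] / n
        W -= lr * grad
    return W
```

On a toy problem mixing singleton and ambiguous label sets, the learned weights separate well-spread classes correctly, which mirrors the abstract's point that partial labels alone can still yield a usable classifier.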
CVPR 2011, 2011
We consider a special type of multi-label learning where class assignments of training examples are incomplete. As an example, an instance whose true class assignment is (c1, c2, c3) is only assigned to class c1 when it is used as a training sample. We refer to this problem as multi-label learning with incomplete class assignment. Incompletely labeled data is frequently encountered when the number of classes is very large (hundreds, as in the MIR Flickr dataset) or when there is a large ambiguity between classes (e.g., jet vs. plane). In both cases, it is difficult for users to provide complete class assignments for objects. We propose a ranking-based multi-label learning framework that explicitly addresses the challenge of learning from incompletely labeled data by exploiting the group lasso technique to combine the ranking errors. We present a learning algorithm that is empirically shown to be efficient for solving the related optimization problem. Our empirical study shows that the proposed framework is more effective than state-of-the-art algorithms for multi-label learning in dealing with incompletely labeled data.
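The group lasso penalty mentioned above sums the L2 norms of predefined coefficient groups; its proximal operator is block soft-thresholding, which zeroes out whole groups at once. A generic sketch of that operator (not the paper's ranking-error formulation):

```python
import numpy as np

def group_soft_threshold(w, groups, tau):
    """Proximal operator of tau * sum_g ||w_g||_2 (block soft-thresholding).

    w: coefficient vector; groups: lists of index lists; tau: threshold.
    Shrinks each group toward zero and zeroes out entire groups whose
    norm falls below tau -- the mechanism by which group lasso selects
    or discards groups of terms jointly.
    """
    out = w.copy()
    for g in groups:
        norm = np.linalg.norm(w[g])
        out[g] = 0.0 if norm <= tau else (1.0 - tau / norm) * w[g]
    return out
```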
ArXiv, 2021
Large-scale multi-label classification datasets are commonly, and perhaps inevitably, partially annotated. That is, only a small subset of labels are annotated per sample. Different methods for handling the missing labels induce different properties on the model and impact its accuracy. In this work, we analyze the partial labeling problem, then propose a solution based on two key ideas. First, un-annotated labels should be treated selectively according to two probability quantities: the class distribution in the overall dataset and the specific label likelihood for a given data sample. We propose to estimate the class distribution using a dedicated temporary model, and we show its improved efficiency over a naïve estimation computed using the dataset’s partial annotations. Second, during the training of the target model, we emphasize the contribution of annotated labels over originally un-annotated labels by using a dedicated asymmetric loss. With our novel approach, we achieve st...
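Both ideas above can be illustrated in one small sketch (an assumption-laden illustration, not the paper's exact asymmetric loss): confirmed annotations get full-weight binary cross-entropy, while un-annotated labels get a down-weighted term whose soft target is the estimated class prior.

```python
import numpy as np

def partial_bce(probs, annotated_pos, annotated_neg, class_prior, neg_weight=0.2):
    """Binary cross-entropy over a partially annotated label vector.

    probs:         (k,) predicted probabilities for each of k labels.
    annotated_pos: indices confirmed present; annotated_neg: confirmed absent.
    Un-annotated labels are treated as soft negatives: their target is the
    estimated class prior rather than a hard 0, and their contribution is
    down-weighted so confirmed annotations dominate the gradient.
    """
    eps = 1e-12
    loss = 0.0
    for j in range(probs.shape[0]):
        if j in annotated_pos:
            loss += -np.log(probs[j] + eps)
        elif j in annotated_neg:
            loss += -np.log(1.0 - probs[j] + eps)
        else:
            p = class_prior[j]               # soft target from estimated prior
            ce = -(p * np.log(probs[j] + eps)
                   + (1 - p) * np.log(1.0 - probs[j] + eps))
            loss += neg_weight * ce
    return loss
```

The `neg_weight` and prior-as-soft-target choices here are illustrative hyperparameters, standing in for the selective treatment the abstract describes.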
Figure 1: Face recognition in videos. Often valuable information cannot be unambiguously assigned to exactly one person; furthermore, not all information is completely reliable.
ArXiv, 2021
Transfer learning from large-scale pre-trained models has become essential for many computer vision tasks. Recent studies have shown that datasets like ImageNet are weakly labeled since images with multiple object classes present are assigned a single label. This ambiguity biases models towards a single prediction, which could result in the suppression of classes that tend to co-occur in the data. Inspired by language emergence literature, we propose multi-label iterated learning (MILe) to incorporate the inductive biases of multi-label learning from single labels using the framework of iterated learning. MILe is a simple yet effective procedure that builds a multi-label description of the image by propagating binary predictions through successive generations of teacher and student networks with a learning bottleneck. Experiments show that our approach exhibits systematic benefits on ImageNet accuracy as well as ReaL F1 score, which indicates that MILe deals better with label ambigu...
IEEE Transactions on Neural Networks and Learning Systems
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2007
We develop an approach to reduce correspondence ambiguity in training data where data items are associated with sets of plausible labels. Our domain is images annotated with keywords, where it is not known which part of the image a keyword refers to. In contrast to earlier approaches that build predictive models or classifiers despite the ambiguity, we argue that it is better to first address the correspondence ambiguity, and then build more complex models from the improved training data. This addresses difficulties of fitting complex models in the face of ambiguity while exploiting all the constraints available from the training data. We contribute a simple and flexible formulation of the problem, and show results validated by a recently developed comprehensive evaluation data set and corresponding evaluation methodology.
Procedings of the British Machine Vision Conference 2011, 2011
We present a large scale database of images and captions, designed for supporting research on how to use captioned images from the Web for training visual classifiers. It consists of more than 125,000 images of celebrities from different fields downloaded from the Web. Each image is associated to its original text caption, extracted from the html page the image comes from. We coin it FAN-Large, for Face And Names Large scale database. Its size and deliberately high level of noise make it, to our knowledge, the largest and most realistic database supporting this type of research. The dataset and its annotations are publicly available and can be obtained from http://www.vision.ee.ethz.ch/~calvin/fanlarge/. We report results on a thorough assessment of FAN-Large using several existing approaches for name-face association, and present and evaluate new contextual features derived from the caption. Our findings provide important cues on the strengths and limitations of existing approaches.
Automatic video tagging systems are targeted at assigning semantic concepts ("tags") to videos by linking textual descriptions with the audio-visual video content. To train such systems, we investigate online video from portals such as YouTube™ as a large-scale, freely available knowledge source. Tags provided by video owners serve as weak annotations indicating that a target concept appears in a video, but not when it appears. This situation resembles the multiple instance learning (MIL) scenario, in which classifiers are trained on labeled bags (videos) of unlabeled samples (the frames of a video). We study MIL in quantitative experiments on real-world online videos. Our key findings are: (1) conventional MIL tends to neglect valuable information in the training data and thus performs poorly. (2) By relaxing the MIL assumption, a tagging system can be built that performs comparably to or better than its supervised counterpart. (3) Improvements by MIL are minor compared to a kernel-based model we proposed recently.
Proceedings 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003) (Cat. No.03CH37453), 2003
Deployed vision systems often encounter image variations poorly represented in their training data. While observing their environment, such vision systems obtain unlabeled data that could be used to compensate for incomplete training. In order to exploit these relatively cheap and abundant unlabeled data we present a family of algorithms called λMEEM. Using these algorithms, we train an appearance-based people detection model. In contrast to approaches that rely on a large number of manually labeled training points, we use a partially labeled data set to capture appearance variation. One can both avoid the tedium of additional manual labeling and obtain improved detection performance by augmenting a labeled training set with unlabeled data. Further, enlarging the original training set with new unlabeled points enables the update of detection models after deployment without human intervention. To support these claims we show people detection results, and compare our performance to a purely generative Expectation Maximization-based approach to learning over partially labeled data.
In this paper, we introduce the imprecise label learning (ILL) framework, a unified approach to handle various imprecise label configurations, which are commonplace challenges in machine learning tasks. ILL leverages an expectation-maximization (EM) algorithm for the maximum likelihood estimation (MLE) of the imprecise label information, treating the precise labels as latent variables. Compared to previous versatile methods attempting to infer correct labels from the imprecise label information, our ILL framework considers all possible labelings consistent with the imprecise label information, allowing a unified solution to deal with any imprecise labels. With comprehensive experimental results, we demonstrate that ILL can seamlessly adapt to various situations, including partial label learning, semi-supervised learning, noisy label learning, and a mixture of these settings. Notably, our simple method surpasses the existing techniques for handling imprecise labels, marking the first unified framework with robust and effective performance across various types of imprecise labels. We believe that our approach has the potential to significantly enhance the performance of machine learning models on tasks where obtaining precise labels is expensive and complicated. We hope our work will inspire further research on this topic with an open-source codebase release.
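The EM idea above, with the precise label as a latent variable constrained to an example's candidate set, can be sketched with a deliberately simple model (a spherical-Gaussian class model chosen for illustration, not any specific paper's model): the E-step computes responsibilities masked to the candidates, and the M-step refits class means.

```python
import numpy as np

def em_partial_labels(X, candidate_sets, n_classes, n_iter=50):
    """EM treating the true label as latent, restricted to candidate sets.

    X: (n, d) features; candidate_sets: one set of allowed labels per row.
    Uses unit-variance Gaussian class-conditionals as a toy model.
    """
    n, d = X.shape
    # Init responsibilities: uniform over each example's candidates.
    R = np.zeros((n, n_classes))
    for i, cand in enumerate(candidate_sets):
        for c in cand:
            R[i, c] = 1.0 / len(cand)
    for _ in range(n_iter):
        # M-step: class means as responsibility-weighted averages.
        mu = (R.T @ X) / (R.sum(axis=0, keepdims=True).T + 1e-12)
        # E-step: Gaussian likelihoods, masked to the candidate set.
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
        lik = np.exp(-0.5 * d2)
        mask = np.zeros_like(lik)
        for i, cand in enumerate(candidate_sets):
            mask[i, list(cand)] = 1.0
        R = lik * mask
        R /= R.sum(axis=1, keepdims=True) + 1e-12
    return mu, R
```

The masking step is what distinguishes this from ordinary clustering: impossible labels get zero posterior, so the imprecise supervision constrains, rather than dictates, the latent assignment.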
2010
In many real world applications we do not have access to fully-labeled training data, but only to a list of possible labels. This is the case, e.g., when learning visual classifiers from images downloaded from the web, using just their text captions or tags as learning oracles. In general, these problems can be very difficult. However, most of the time there exist different implicit sources of information, coming from the relations between instances and labels, which are usually dismissed. In this paper, we propose a semi-supervised framework to model this kind of problem. Each training sample is a bag containing multiple instances, associated with a set of candidate labeling vectors. Each labeling vector encodes the possible labels for the instances in the bag, with only one being fully correct. The use of the labeling vectors provides a principled way not to exclude any information. We propose a large margin discriminative formulation, and an efficient algorithm to solve it. Experiments conducted on artificial datasets and a real-world dataset of images and captions show that our approach achieves performance comparable to an SVM trained with the ground-truth labels, and outperforms other baselines.
Proceedings of the 13th annual ACM international conference on Multimedia - MULTIMEDIA '05, 2005
Labeling faces in news video with their names is an interesting research problem which was previously solved using supervised methods that demand significant user effort on labeling training data. In this paper, we investigate a more challenging setting of the problem where there is no complete information on data labels. Specifically, by exploiting the uniqueness of a face's name, we formulate the problem as a special multi-instance learning (MIL) problem, namely exclusive MIL or eMIL, so that it can be tackled by a model trained with partial labeling information in the form of anonymity judgments of faces, which require less user effort to collect. We propose two discriminative probabilistic learning methods named Exclusive Density (ED) and Iterative ED for eMIL problems. Experiments on the face labeling problem show that the proposed approaches are superior to traditional MIL algorithms and close to the performance achieved by supervised methods trained with complete data labels.
Pattern Recognition, 2004
In classic pattern recognition problems, classes are mutually exclusive by definition. Classification errors occur when the classes overlap in the feature space. We examine a different situation, occurring when the classes are, by definition, not mutually exclusive. Such problems arise in semantic scene and document classification and in medical diagnosis. We present a framework to handle such problems and apply it to the problem of semantic scene classification, where a natural scene may contain multiple objects such that the scene can be described by multiple class labels (e.g., a field scene with a mountain in the background). Such a problem poses challenges to the classic pattern recognition paradigm and demands a different treatment. We discuss approaches for training and testing in this scenario and introduce new metrics for evaluating individual examples, class recall and precision, and overall accuracy. Experiments show that our methods are suitable for scene classification; furthermore, our work appears to generalize to other classification problems of the same nature.
Advanced Video and …, 2011
Videos are often associated with additional information that could be valuable for interpretation of their content. This especially applies to the recognition of faces within video streams, where cues such as transcripts and subtitles are often available. However, this data is not completely reliable and might be ambiguously labeled. To overcome these limitations, we take advantage of semi-supervised learning (SSL) and multiple instance learning (MIL) and propose a new semi-supervised multiple instance learning (SSMIL) algorithm. Thus, during training we can weaken the prerequisite of knowing the label for each instance and can integrate unlabeled data, given only probabilistic information in the form of priors. The benefits of the approach are demonstrated for face recognition in videos on a publicly available benchmark dataset. In fact, we show that exploring new information sources can considerably improve the classification results.
Procedings of the British Machine Vision Conference 2011, 2011
We propose an approach for improving unconstrained face recognition based on leveraging weakly labeled web videos. It is easy to obtain videos that are likely to contain a face of interest from sites such as YouTube by issuing queries with a person's name; however, many examples of faces not belonging to the person of interest will be present. We propose a new technique capable of learning from weakly or noisily labeled faces obtained in this setting. In particular, we present a novel method for semi-supervised learning using noisy labels which incorporates a margin or null-category-like property within a fully probabilistic framework. We outline general properties of the approach, showing how the choice of an exponential hyperprior results in an L1 penalty, which leads to sparse models capable of explicitly accounting for label uncertainty, producing state-of-the-art performance. We then illustrate how the margin approach provides robustness and significant performance gains when faces within YouTube search results are combined with the unconstrained face images from the Labeled Faces in the Wild dataset.
Convolutional neural networks (CNNs) have shown great performance as general feature representations for object recognition applications. However, for multi-label images that contain multiple objects from different categories, scales and locations, global CNN features are not optimal. In this paper, we incorporate local information to enhance the feature discriminative power. In particular, we first extract object proposals from each image. With each image treated as a bag and the object proposals extracted from it treated as instances, we transform the multi-label recognition problem into a multi-class multi-instance learning problem. Then, in addition to extracting the typical CNN feature representation from each proposal, we propose to make use of ground-truth bounding box annotations (strong labels) to add another level of local information, using nearest-neighbor relationships of local regions to form a multi-view pipeline. The proposed multi-view multi-instance framework utilizes both weak and strong labels effectively, and more importantly it has the generalization ability to boost performance even on unseen categories using partial strong labels from other categories. Our framework is extensively compared with state-of-the-art handcrafted feature based methods and CNN based methods on two multi-label benchmark datasets. The experimental results validate the discriminative power and the generalization ability of the proposed framework. With strong labels, our framework is able to achieve state-of-the-art results on both datasets.
International Conference on Multimedia Computing and Systems/International Conference on Multimedia and Expo, 2010
Labeling persons appearing in video frames with names detected from the video transcript helps improve video content identification and search. We develop a face naming method that learns from labeled and unlabeled examples using iterative label propagation in a graph of connected faces or name-face pairs. The advantage of this method is that it can use very few labeled data points and incorporate the unlabeled data points during the learning process. Anchor detection and metric learning for face classification are incorporated into the label propagation process to help boost the face naming performance. On BBC News videos, the label propagation algorithm yields better results than a Support Vector Machine classifier trained on the same labeled data.
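The iterative label propagation described above can be sketched in generic form (the anchor-detection and metric-learning components from the abstract are omitted; this is the standard spreading iteration on a similarity graph, not the paper's full system):

```python
import numpy as np

def propagate_labels(W, Y, alpha=0.9, n_iter=100):
    """Iterative label propagation on a similarity graph.

    W: (n, n) symmetric nonnegative similarity matrix.
    Y: (n, k) one-hot rows for labeled nodes, zero rows for unlabeled.
    Iterates F <- alpha * S @ F + (1 - alpha) * Y, where S is the
    symmetrically normalized affinity D^{-1/2} W D^{-1/2}, then labels
    each node by the argmax of its accumulated class scores.
    """
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d) + 1e-12)
    F = Y.astype(float).copy()
    for _ in range(n_iter):
        F = alpha * (S @ F) + (1 - alpha) * Y
    return F.argmax(axis=1)
```

On a four-node chain with the two endpoints labeled with different names, the two interior nodes each inherit the label of their nearer endpoint, which is exactly the behavior that lets very few labeled faces name a whole graph.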
Signal, Image and Video Processing, 2017
The objective of this work is to correctly detect and recognize faces in an image collection using a database of known faces. This has applications in photo-tagging, video indexing, surveillance and recognition in wearable computers. We propose a two-stage approach for both the detection and recognition tasks. In the first stage, we generate a seed set from the given image collection using off-the-shelf face detection and recognition algorithms. In the second stage, the obtained seed set is used to improve the performance of these algorithms by adapting them to the domain at hand. We propose an exemplar-based semi-supervised framework for improving the detections. For recognition, we use a sparse representation classifier and generate seed images based on a confidence measure. The labels of the seed set are then propagated to other faces using a label propagation framework with appropriate constraints. Unlike traditional approaches, our approach exploits the similarities among the faces in the collection to obtain improved performance. We conduct extensive experiments on two real-world photo-album and video collections. Our approach consistently provides an improvement of ∼4% for detection and 5–9% for recognition on all these datasets.
Proceedings of the 2014 SIAM International Conference on Data Mining, 2014
Multi-label learning deals with classification problems where each instance can be assigned multiple labels simultaneously. Conventional multi-label learning approaches mainly focus on exploiting label correlations. It is usually assumed, explicitly or implicitly, that the label sets for training instances are fully labeled without any missing labels. However, in many real-world multi-label datasets, the label assignments for training instances can be incomplete: some ground-truth labels can be missed by the labeler. This problem is especially typical when the number of instances is very large and the labeling cost is very high, which makes it almost impossible to get a fully labeled training set. In this paper, we study the problem of large-scale multi-label learning with incomplete label assignments. We propose an approach, called Mpu, based upon positive and unlabeled stochastic gradient descent and stacked models. Unlike prior works, our method can effectively and efficiently consider missing labels and label correlations simultaneously, and is very scalable, with linear time complexity in the size of the data. Extensive experiments on two real-world multi-label datasets show that our Mpu model consistently outperforms other commonly-used baselines.