Reducing Annotation Effort using Generalized Expectation Criteria

Abstract

Generalized expectation (GE) criteria are terms in objective functions that assign scores to values of model expectations. In this paper we introduce GE-FL, a method that uses GE to train a probabilistic model using associations between input features and classes rather than complete labeled instances. Specifically, here the expectations are model-predicted class distributions on unlabeled instances that contain selected input features. The score function is the KL divergence from reference distributions estimated using feature-class associations. We show that a multinomial logistic regression model trained with GE-FL outperforms several baseline methods that use feature-class associations. Next, we compare GE-FL with a method that incorporates feature-class associations into Boosting, and find that the latter requires 400 labeled instances to attain the same accuracy as GE-FL, which uses no labeled instances. In human annotation experiments, we show that labeling features is on average 3.7 times faster than labeling documents, a result that supports similar findings in previous work. Additionally, using GE-FL provides a 1.0% absolute improvement in final accuracy over semi-supervised training with labeled documents. The accuracy difference is often much more pronounced with only a few minutes of annotation, where we see absolute accuracy improvements as high as 40%.
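To make the objective concrete, the following is a minimal sketch of a single GE-FL term, assuming binary input features and a multinomial logistic regression model; the function and variable names are illustrative, not from the paper. For one labeled feature, the model expectation is the predicted class distribution averaged over the unlabeled instances containing that feature, and the term scores it by KL divergence from a reference distribution derived from the feature-class association.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax for a multinomial logistic regression model."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def ge_fl_term(X, W, feature_idx, ref_dist, eps=1e-12):
    """One GE-FL objective term for a single labeled feature.

    X           : (n, d) binary feature matrix of unlabeled instances
    W           : (d, c) weights of the multinomial logistic model
    feature_idx : column of X corresponding to the labeled feature
    ref_dist    : (c,) reference class distribution for that feature

    Returns KL(ref_dist || expectation), where the expectation is the
    model's predicted class distribution averaged over the unlabeled
    instances that contain the feature.
    """
    mask = X[:, feature_idx] > 0
    probs = softmax(X[mask] @ W)       # p(y|x) for instances with the feature
    expectation = probs.mean(axis=0)   # model expectation for this feature
    # eps guards against log(0); 0 * log(0) terms contribute ~0
    return float(np.sum(ref_dist * (np.log(ref_dist + eps)
                                    - np.log(expectation + eps))))
```

In the full method, the training objective sums such terms over all labeled features (plus a regularizer) and minimizes the divergences by gradient descent; with zero weights the model predicts a uniform distribution, so a uniform reference yields a KL of zero.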