2005, Proceedings of the Ninth Conference on Computational Natural Language Learning - CoNLL '05
…
In this paper we propose and evaluate a technique to perform semi-supervised learning for text categorization.
Principles of Data Mining and Knowledge Discovery, 2000
Supervised learning algorithms usually require large amounts of training data to learn reasonably accurate classifiers. Yet, for many text classification tasks, providing labeled training documents is expensive, while unlabeled documents are readily available in large quantities. Learning from both labeled and unlabeled documents in a semi-supervised framework is a promising approach to reducing the need for labeled training documents. This paper compares three commonly applied text classifiers in the light of semi-supervised learning: a linear support vector machine, a similarity-based tf-idf classifier, and a Naïve Bayes classifier. Results on real-world text datasets show that these learners can benefit substantially from using a large amount of unlabeled documents in addition to some labeled documents.
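As a concrete illustration of the semi-supervised setup compared above, the following sketch wraps one of the three learners (Naïve Bayes) in a generic self-training loop using scikit-learn. The toy documents and the SelfTrainingClassifier wrapper are illustrative stand-ins under stated assumptions, not the paper's exact procedure.

```python
# Hedged sketch: a Naive Bayes text classifier inside a generic self-training
# loop. scikit-learn marks unlabeled samples with -1; high-confidence
# predictions on them are folded back into the training set.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.semi_supervised import SelfTrainingClassifier

labeled_docs = ["cheap meds online", "meeting agenda attached"]   # toy data
labels = [1, 0]                                                   # 1 = spam
unlabeled_docs = ["free offer inside", "see you at the meeting"]

vec = TfidfVectorizer()
X = vec.fit_transform(labeled_docs + unlabeled_docs)
y = np.array(labels + [-1] * len(unlabeled_docs))  # -1 marks unlabeled

clf = SelfTrainingClassifier(MultinomialNB(), threshold=0.8)
clf.fit(X, y)
print(clf.predict(vec.transform(["free meds offer"])))
```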
Artificial Intelligence Review
A huge amount of data is generated daily, leading to big-data challenges. One of them relates to text mining, especially text classification. To perform this task we usually need a large set of labeled data, which can be expensive, time-consuming, or difficult to obtain. In this scenario, semi-supervised learning (SSL), the branch of machine learning concerned with using both labeled and unlabeled data, has expanded in volume and scope. Since no recent survey overviews how SSL has been used in text classification, we aim to fill this gap and present an up-to-date review of SSL for text classification. We retrieved 1794 works from the last 5 years from IEEE Xplore, ACM Digital Library, Science Direct, and Springer; 157 articles were then selected for inclusion in this review. We present the application domains, datasets, and languages employed in the works, as well as the text representations and machine learning algorithms. We also summarize and organize the works following a recent taxonomy of SSL, and analyze the percentage of labeled data used, the evaluation metrics, and the obtained results. Lastly, we present some limitations and future trends in the area. We aim to provide researchers and practitioners with an outline of the area as well as useful information for their current research.
Information Retrieval, 2009
Most current methods for automatic text categorization are based on supervised learning techniques and therefore face the problem of requiring a great number of training instances to construct an accurate classifier. To tackle this problem, this paper proposes a new semi-supervised method for text categorization that considers the automatic extraction of unlabeled examples from the Web and the application of an enriched self-training approach for the construction of the classifier. This method, although language-independent, is most pertinent for scenarios where large sets of labeled resources do not exist; that could be the case, for instance, for several application domains in non-English languages such as Spanish. The experimental evaluation of the method was carried out on three different tasks and in two different languages. The achieved results demonstrate the applicability and usefulness of the proposed method.
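A minimal self-training loop in the spirit of the method described above might look as follows. The Web-extraction and enrichment steps are abstracted away (the unlabeled pool is simply passed in), and LogisticRegression is an arbitrary stand-in for the paper's base classifier.

```python
# Hedged sketch of plain self-training: each round, unlabeled documents whose
# predicted class probability clears a threshold are pseudo-labeled and moved
# into the training set. Not the paper's enriched variant.
import numpy as np
from scipy.sparse import vstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def self_train(labeled, y, unlabeled, rounds=5, threshold=0.9):
    vec = TfidfVectorizer().fit(labeled + unlabeled)
    X, y = vec.transform(labeled), np.asarray(y)
    pool = list(unlabeled)
    clf = LogisticRegression(max_iter=1000)
    for _ in range(rounds):
        clf.fit(X, y)
        if not pool:
            break
        proba = clf.predict_proba(vec.transform(pool))
        conf, pred = proba.max(axis=1), proba.argmax(axis=1)
        keep = conf >= threshold          # accept only confident pseudo-labels
        if not keep.any():
            break
        X = vstack([X, vec.transform([d for d, k in zip(pool, keep) if k])])
        y = np.concatenate([y, pred[keep]])
        pool = [d for d, k in zip(pool, keep) if not k]
    return clf, vec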
2012
Text categorization automatically assigns a document to its underlying topics. Documents are typically represented as bags of words, and machine-learning-based approaches have been shown to provide effective and scalable solutions by learning from examples. However, a limiting factor in the application of these approaches is the large number of examples required to train a classifier working on large taxonomies of classes. This paper presents a method to integrate prior knowledge that is typically available on the learning task into a text classifier based on kernel machines. The presented solution deals with any prior knowledge represented as first-order logic (FOL) and, thanks to the generality of this formulation, can be used to express relations among the input patterns, known semantic relationships among the output categories, and input-output rules. The kernel machine mathematical apparatus is reused to cast the learning problem into a primal optimization of a function composed of the loss on the supervised examples, the regularization term, and a penalty term derived from converting the knowledge into a set of continuous constraints. Experimental results on the popular CORA dataset show that the proposed approach outperforms both SVMs and state-of-the-art semi-supervised techniques on the multi-label text classification problem.
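The penalty construction can be illustrated on a single implication rule. In the numpy sketch below (a simplification, not the paper's formulation), the FOL rule A(x) ⇒ B(x) is relaxed into the continuous penalty max(0, f_A(x) − f_B(x)) on unlabeled points and minimized together with a regularized hinge loss by subgradient descent; all data and weights are synthetic.

```python
# Illustrative sketch: two linear scoring functions f_A, f_B trained with
# hinge loss + L2, plus a penalty on unlabeled points where f_A > f_B,
# i.e. where the relaxed rule A(x) => B(x) is violated.
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 20
X_lab = rng.normal(size=(n, d))                         # labeled examples
y_A = np.sign(X_lab[:, 0] + 1e-9)                       # toy +/-1 targets
y_B = np.sign(X_lab[:, 0] + 0.2 * X_lab[:, 1] + 1e-9)   # A mostly implies B
X_un = rng.normal(size=(200, d))                        # unlabeled points

w_A, w_B = np.zeros(d), np.zeros(d)
lr, lam, mu = 0.05, 0.01, 0.1   # step size, L2 weight, rule-penalty weight
for _ in range(300):
    for w, y in ((w_A, y_A), (w_B, y_B)):               # supervised part
        viol = y * (X_lab @ w) < 1                      # hinge subgradient
        w -= lr * (lam * w - (y[viol, None] * X_lab[viol]).sum(0) / n)
    # continuous relaxation of A(x) => B(x): penalize f_A(x) > f_B(x)
    active = X_un @ (w_A - w_B) > 0
    g = X_un[active].sum(0) / len(X_un)
    w_A -= lr * mu * g
    w_B += lr * mu * g
print("rule violations left:", int((X_un @ (w_A - w_B) > 0).sum()))
```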
Knowledge Based Systems, 2016
Vector space models (VSMs) are commonly used in language processing to represent certain aspects of natural language semantics. The semantics of VSMs comes from the distributional hypothesis, which states that words occurring in similar contexts usually have similar meanings. In our previous work, we proposed novel semantic smoothing kernels based on class-specific transformations. These kernels use class-term matrices, which can be considered a new type of VSM. By using the class as the context, these methods can extract class-specific semantics by making use of word distributions both in documents and in different classes. In this study, we adapt two of these semantic classification approaches to build a novel, high-performance semi-supervised text classification algorithm. These approaches include Helmholtz-principle-based calculation of term meanings in the context of classes for the initial classification, and a supervised-term-weighting-based semantic kernel with Support Vector Machines (SVMs) for the final classification model. The approach used in the first phase is especially good at learning from very small datasets, while the approach in the second phase is specifically good at eliminating noise in relatively large and noisy training sets when building a classification model. Overall, as a semantic semi-supervised learning algorithm, our approach can effectively utilize an abundant source of unlabeled instances to improve classification accuracy significantly, especially when the amount of labeled instances is limited.
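The class-term-matrix idea can be sketched as follows. Here S holds plain per-class term frequencies, whereas the paper derives term meanings from the Helmholtz principle and uses supervised term weighting, so this is only a structural illustration of the kernel, not the authors' method.

```python
# Simplified class-based semantic kernel: documents are projected through a
# class-term matrix S, and the resulting Gram matrix is fed to an SVM via
# kernel='precomputed'.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import SVC

docs = ["stock market prices fall", "team wins the final match",
        "shares rally on earnings", "player scores in the derby"]
y = np.array([0, 1, 0, 1])

X = CountVectorizer().fit_transform(docs).toarray().astype(float)
# class-term matrix: row c = summed term counts of documents in class c
S = np.vstack([X[y == c].sum(axis=0) for c in np.unique(y)])
P = X @ S.T                      # documents represented in class space
K = P @ P.T                      # semantic kernel (positive semi-definite)

svm = SVC(kernel="precomputed").fit(K, y)
print(svm.predict(K))            # on training data, for illustration only
```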
Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks. IJCNN 2000. Neural Computing: New Challenges and Perspectives for the New Millennium, 2000
We propose to solve a text categorization task using a new metric between documents based on a priori semantic knowledge about words. This metric can be incorporated into the definition of radial basis kernels for Support Vector Machines or used directly in a k-nearest-neighbors algorithm. Both SVM and KNN are tested and compared on the 20 Newsgroups dataset. Support Vector Machines provide the best accuracy on test data.
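A toy version of such a semantically informed metric is sketched below: a positive semi-definite word-similarity matrix M (here built from random stand-in embeddings rather than real prior knowledge) induces document inner products and hence a distance usable inside a radial-basis kernel.

```python
# Hedged sketch: semantic Gram matrix X M X^T over term-count vectors, turned
# into squared distances and then into a Gaussian kernel for a precomputed SVM.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
n_docs, n_words = 6, 10
X = rng.integers(0, 3, size=(n_docs, n_words)).astype(float)  # term counts
y = np.array([0, 0, 0, 1, 1, 1])

E = rng.normal(size=(n_words, 4))     # stand-in word embeddings (assumption)
M = E @ E.T                           # PSD word-similarity matrix
G = X @ M @ X.T                       # semantic Gram matrix
d2 = np.diag(G)[:, None] + np.diag(G)[None, :] - 2 * G   # squared distances
K = np.exp(-0.01 * d2)                # radial-basis kernel on the new metric

clf = SVC(kernel="precomputed").fit(K, y)
print(clf.predict(K))
```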
2006
The traditional bag-of-words model and the more recent word-sequence kernel are two well-known techniques in the field of text categorization. The bag-of-words representation neglects word order, which can result in lower classification accuracy for some types of documents. The word-sequence kernel takes word order into account, but does not include all of the word-frequency information. A weighted kernel model that combines these two models was proposed by the authors [1]. This paper focuses on the optimization of the weighting parameters, which are functions of word frequency. Experiments conducted on the Reuters database show that the new weighted kernel achieves better classification accuracy.
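The combination itself can be illustrated with a simple convex mixture of Gram matrices. In this sketch a word-bigram kernel stands in for the full word-sequence kernel, and a single scalar lambda replaces the paper's frequency-dependent weighting functions.

```python
# Hedged sketch: convex combination of a bag-of-words kernel and a crude
# order-sensitive (word-bigram) kernel; a mixture of PSD kernels is PSD.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import SVC

docs = ["the cat sat on the mat", "the dog barked at the cat",
        "stock prices fell sharply", "markets fell as prices dropped"]
y = np.array([0, 0, 1, 1])

def gram(ngram_range):
    F = CountVectorizer(ngram_range=ngram_range).fit_transform(docs)
    return (F @ F.T).toarray().astype(float)

K_bow, K_seq = gram((1, 1)), gram((2, 2))
lam = 0.6                                   # illustrative weighting parameter
K = lam * K_bow + (1 - lam) * K_seq         # weighted kernel combination

clf = SVC(kernel="precomputed").fit(K, y)
print(clf.predict(K))
```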
International Journal of Data Mining & Knowledge Management Process, 2012
Automatic text categorization (ATC) is a prominent research area within information retrieval. This paper discusses a classification model for ATC in the multi-label domain. We propose a new multi-label text classification model for assigning a more relevant set of categories to each input text document. Our model is greatly influenced by graph-based frameworks and semi-supervised learning. We demonstrate the effectiveness of our model on the Enron, Slashdot, Bibtex, and RCV1 datasets. Our experimental results indicate that the use of semi-supervised learning in multi-label text classification (MLTC) greatly improves the decision-making capability of the classifier.
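As a rough illustration of graph-based semi-supervised multi-label classification (not the paper's exact model), the sketch below runs one scikit-learn LabelSpreading model per label in a binary-relevance decomposition; the documents and label matrix are toy assumptions.

```python
# Hedged sketch: graph-based SSL for multi-label text, one label-propagation
# model per label column; -1 marks unlabeled entries.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.semi_supervised import LabelSpreading

docs = ["oil prices surge", "election results announced",
        "energy markets react to vote", "parliament debates oil tax"]
# label matrix: rows = docs, cols = (economy, politics); -1 = unlabeled
Y = np.array([[1, 0], [0, 1], [-1, -1], [-1, -1]])

X = TfidfVectorizer().fit_transform(docs).toarray()
pred = np.zeros_like(Y)
for j in range(Y.shape[1]):                 # binary relevance per label
    model = LabelSpreading(kernel="knn", n_neighbors=2).fit(X, Y[:, j])
    pred[:, j] = model.transduction_        # labels inferred over the graph
print(pred)
```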
A generic system for text categorization is presented which uses a representative text corpus to adapt its processing steps: feature extraction, dimension reduction, and classification. Feature extraction automatically learns features from the corpus by reducing actual word forms using statistical information from the corpus and general linguistic knowledge. The dimension of the feature vector is then reduced by a linear transformation that keeps the essential information. The classification principle is a minimum-least-squares approach based on polynomials. The described system can be readily adapted to new domains or new languages, and in application it is reliable, fast, and operates completely automatically. It is shown that the text categorizer works successfully both on text generated by document image analysis (DIA) and on ground-truth data.
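The minimum-least-squares polynomial classifier can be sketched in a few lines: expand the (already dimension-reduced) feature vectors polynomially, then solve for a linear map onto one-hot class targets in the least-squares sense. The data below is synthetic and the degree-2 expansion is an assumption.

```python
# Hedged sketch of a least-squares polynomial classifier.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 3))                    # reduced feature vectors
y = (X[:, 0] + X[:, 1] ** 2 > 0.5).astype(int)  # toy 2-class target

def poly_expand(X):
    # degree-2 polynomial terms: bias, x_i, and pairwise products x_i * x_j
    feats = [np.ones(len(X)), *X.T]
    feats += [X[:, i] * X[:, j] for i in range(X.shape[1])
              for j in range(i, X.shape[1])]
    return np.column_stack(feats)

P = poly_expand(X)
T = np.eye(2)[y]                                # one-hot class targets
W, *_ = np.linalg.lstsq(P, T, rcond=None)       # minimum-least-squares fit
pred = (P @ W).argmax(axis=1)
print("train accuracy:", (pred == y).mean())
```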
Neural Information Processing, 2018
For many text classification tasks, the lack of labeled data in a target domain poses a major problem. Although classifiers for a target domain can be trained on labeled text data from a related source domain, the accuracy of such classifiers is usually lower in the cross-domain setting. Recently, string kernels have obtained state-of-the-art results in various text classification tasks such as native language identification and automatic essay scoring. Moreover, classifiers based on string kernels have been found to be robust to the distribution gap between different domains. In this paper, we formally describe an algorithm composed of two simple yet effective transductive learning approaches to further improve the results of string kernels in cross-domain settings. By adapting string kernels to the test set without using the ground-truth test labels, we report significantly better accuracy rates in cross-domain English polarity classification.
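One simple way to make a string kernel transductive, in the spirit of (but much simpler than) the two-step algorithm above, is to build and normalize the kernel jointly over source and target documents, so the representation adapts to the test set without touching its labels. The character 4-gram presence kernel and toy polarity data below are assumptions for illustration.

```python
# Hedged sketch: a character n-gram intersection kernel computed jointly over
# train and test documents, normalized, then used by a precomputed-kernel SVM.
import numpy as np
from sklearn.svm import SVC

def char_ngrams(doc, n=4):
    return {doc[i:i + n] for i in range(len(doc) - n + 1)}

train = ["the movie was wonderful", "a dull and boring film"]
y_train = np.array([1, 0])
test = ["wonderful acting, great film", "boring plot, dull pacing"]

sets = [char_ngrams(d) for d in train + test]
K = np.array([[len(a & b) for b in sets] for a in sets], dtype=float)
K /= np.sqrt(np.outer(np.diag(K), np.diag(K)))   # kernel normalization

n = len(train)
clf = SVC(kernel="precomputed").fit(K[:n, :n], y_train)
print(clf.predict(K[n:, :n]))                    # rows: test vs. train docs
```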