An Empirical Comparison of Text Categorization Methods

Cardoso-Cachopo, Ana; Oliveira, Arlindo L.


14 pages


Abstract

In this paper we present a comprehensive comparison of the performance of a number of text categorization methods on two different data sets. In particular, we evaluate the Vector and Latent Semantic Analysis (LSA) methods, a classifier based on Support Vector Machines (SVM), and the k-Nearest Neighbor variations of the Vector and LSA models. We report the results using the Mean Reciprocal Rank, an evaluation measure commonly used for question answering tasks, as a measure of overall performance. We argue that this evaluation measure is also very well suited to text categorization tasks. Our results show that, overall, SVMs and k-NN LSA perform better than the other methods, and that the difference is statistically significant.
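As an illustration of the evaluation measure named in the abstract, the following is a minimal sketch of Mean Reciprocal Rank applied to text categorization. It assumes the usual definition (the average over documents of 1/rank of the correct class, counting 0 when the correct class does not appear in the ranking); the exact variant used in the paper, such as any cutoff on the number of ranked classes, is not stated in this excerpt. The function name `mean_reciprocal_rank` and the sample class labels are hypothetical.

```python
from typing import Hashable, Sequence


def mean_reciprocal_rank(
    rankings: Sequence[Sequence[Hashable]],
    true_labels: Sequence[Hashable],
) -> float:
    """Average of 1/rank of the correct class over all documents.

    rankings[i] is the classifier's ordered list of candidate classes for
    document i (best guess first); true_labels[i] is its correct class.
    A document whose correct class is missing from its ranking adds 0.
    """
    if len(rankings) != len(true_labels):
        raise ValueError("rankings and true_labels must have the same length")
    if not rankings:
        raise ValueError("at least one document is required")

    total = 0.0
    for ranked, truth in zip(rankings, true_labels):
        # Ranks are 1-based: a correct first guess scores 1, second 1/2, ...
        for position, label in enumerate(ranked, start=1):
            if label == truth:
                total += 1.0 / position
                break
    return total / len(rankings)


# Hypothetical example: three documents whose correct class is "sports",
# ranked first, second, and third respectively by some classifier.
example_rankings = [
    ["sports", "politics", "tech"],
    ["tech", "sports", "politics"],
    ["politics", "tech", "sports"],
]
example_truth = ["sports", "sports", "sports"]
example_mrr = mean_reciprocal_rank(example_rankings, example_truth)
```

Unlike plain accuracy, this measure gives partial credit when the correct class is ranked second or third, which is one reason a ranking-based measure can suit categorization tasks where a classifier returns an ordered list of candidate classes.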

Andrew McCallum

This paper explores the use of Support Vector Machines (SVMs) for learning text classifiers from examples. It analyzes the particular properties of learning with text data and identifies why SVMs are appropriate for this task. Empirical results support the theoretical findings. SVMs achieve substantial improvements over the currently best performing methods and behave robustly over a variety of different learning tasks. Furthermore, they are fully automatic, eliminating the need for manual parameter tuning.
