MMR-based feature selection for text categorization

2004

Abstract

We introduce a new feature selection method for text categorization. Our MMR-based method aims to reduce redundancy among the selected features while preserving their information gain. Empirical results show that MMR-based feature selection is more effective than both Koller & Sahami's method, which is a greedy feature selection method, and conventional information gain, which is commonly used for feature selection in text categorization. Moreover, MMR-based feature selection sometimes allows conventional machine learning algorithms to improve on SVM, which is known to give the best classification accuracy.
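The trade-off described in the abstract can be illustrated with a small sketch. It assumes that MMR refers to maximal marginal relevance, that a feature's relevance is its information gain with respect to the class label, and that redundancy is the largest mutual information between a candidate and any already selected feature. The function names, the weight `lam`, and the toy data are hypothetical; the abstract does not give the paper's exact scoring formula, so this is not necessarily the authors' formulation.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information in bits between two equal-length discrete sequences.

    When ys holds class labels, this equals the information gain of xs.
    """
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def mmr_select(features, labels, k, lam=0.5):
    """Greedily pick k feature names by an MMR-style score (assumed form).

    features: dict mapping feature name -> list of per-document values.
    lam: hypothetical weight; lam=1.0 reduces to ranking by information gain.
    """
    relevance = {f: mutual_information(v, labels) for f, v in features.items()}
    selected = []
    candidates = set(features)
    while candidates and len(selected) < k:

        def score(f):
            # Redundancy: strongest overlap with any feature chosen so far.
            redundancy = max(
                (mutual_information(features[f], features[s]) for s in selected),
                default=0.0,
            )
            return lam * relevance[f] - (1 - lam) * redundancy

        # Sorting makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy corpus of 8 documents; 1 means the term occurs in the document.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "goal":     [1, 1, 1, 0, 0, 0, 0, 0],
    "goal_dup": [1, 1, 1, 0, 0, 0, 0, 0],  # exact copy of "goal"
    "vote":     [0, 0, 0, 1, 0, 0, 0, 0],  # covers the document "goal" misses
}

print(mmr_select(features, labels, k=2, lam=1.0))  # information gain only
print(mmr_select(features, labels, k=2, lam=0.5))  # with redundancy penalty
```

On this toy data, ranking by information gain alone (`lam=1.0`) keeps both copies of the same term, while the redundancy penalty (`lam=0.5`) replaces the duplicate with the weaker but complementary term.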