HTPI: Hadoop Text Processing Interface

2014, International Journal for Scientific Research and Development

Abstract

Text mining is regarded as one of the supporting pillars of Information Retrieval. This paper is devoted to text mining, with a primary focus on mining academic papers. It presents an architecture proposed by the authors, which they have named HTPI. The framework is built in Java, using Eclipse, on top of Apache Hadoop. The problem under consideration is reference metamorphosis: re-ordering the entries in the references section of a scientific paper according to the similarity score computed between each referenced paper and the paper whose reference list is being re-ordered. The techniques used include stemming, skipping, and similarity calculation using the Jaccard coefficient.
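As a rough illustration of the re-ordering idea described in the abstract, the sketch below scores each reference against the citing paper with the Jaccard coefficient, |A ∩ B| / |A ∪ B|, over sets of stemmed terms, then sorts the references by that score. It is not the authors' Java/Hadoop implementation. The function names, the stop-word list, and the suffix-stripping stemmer are all hypothetical stand-ins, and "skipping" is assumed here to mean dropping stop words.

```python
import re

# Hypothetical stop-word list; the paper's actual "skipping" list is not given.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "is", "are", "with"}


def crude_stem(word):
    """Strip a few common suffixes. A stand-in for a real stemmer, not Porter's algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokens(text):
    """Lower-case the text, skip stop words, and return the set of stemmed terms."""
    words = re.findall(r"[a-z]+", text.lower())
    return {crude_stem(w) for w in words if w not in STOP_WORDS}


def jaccard(a, b):
    """Jaccard coefficient of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def reorder_references(paper_text, references):
    """Return the references sorted by descending similarity to the paper text."""
    paper_terms = tokens(paper_text)
    return sorted(
        references,
        key=lambda ref: jaccard(paper_terms, tokens(ref)),
        reverse=True,
    )
```

Calling `reorder_references(paper_text, list_of_reference_texts)` returns the same reference strings with the most similar ones first. In a Hadoop setting the per-reference scoring would presumably be distributed across map tasks, but that detail is not stated in the abstract.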