Keyword Extraction Performance Analysis

abhishek kumbhar

Keyword Extraction Performance Analysis

abhishek kumbhar

2019 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR)

Sign up for access to the world's latest research

checkGet notified about relevant papers

checkSave papers to use in your research

checkJoin the discussion with peers

checkTrack your impact

Abstract

This paper presents a survey-cum-evaluation of methods for the comprehensive comparison of the task of keyword extraction using datasets of various sizes, forms, and genre. We use four different datasets which includes Amazon product data-Automotive, SemEval 2010, TMDB and Stack Exchange. Moreover, a subset of 100 Amazon product reviews is annotated and utilized for evaluation in this paper, to our knowledge, for the first time. Datasets are evaluated by five Natural Language Processing approaches (3 unsupervised and 2 supervised), which include TF-IDF, RAKE, TextRank, LDA and Shallow Neural Network. We use a tenfold cross-validation scheme and evaluate the performance of the aforementioned approaches using recall, precision and F-score. Our analysis and results provide guidelines on the proper approaches to use for different types of datasets. Furthermore, our results indicate that certain approaches achieve improved performance with certain datasets due to inherent characteristics of the data.

Tadashi Nomoto

SN Computer Science

The goal of keyword extraction is to extract from a text, words, or phrases indicative of what it is talking about. In this work, we look at keyword extraction from a number of different perspectives: Statistics, Automatic Term Indexing, Information Retrieval (IR), Natural Language Processing (NLP), and the emerging Neural paradigm. The 1990s have seen some early attempts to tackle the issue primarily based on text statistics [13, 17]. Meanwhile, in IR, efforts were largely led by DARPA’s Topic Detection and Tracking (TDT) project [2]. In this contribution, we discuss how past innovations paved a way for more recent developments, such as LDA, PageRank, and Neural Networks. We walk through the history of keyword extraction over the last 50 years, noting differences and similarities among methods that emerged during the time. We conduct a large meta-analysis of the past literature using datasets from news media, science, and medicine to business and bureaucracy, to draw a general pict...

Log In

Keyword Extraction Performance Analysis

Sign up for access to the world's latest research

Abstract

Related papers

Related topics