2012
…
8 pages
This paper proposes some modest improvements to Extractor, a state-of-the-art keyphrase extraction system, by using a terabyte-sized corpus to estimate the informativeness and semantic similarity of keyphrases. We present two techniques to improve the organization of keyphrase lists and to remove outliers from them. The first is a simple ordering by the keyphrases' number of occurrences in the corpus; the second is clustering by semantic similarity. Evaluation issues are discussed. We present a novel technique for comparing extracted keyphrases to a gold standard that relies on semantic similarity rather than on string matching or on evaluation by human judges.
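The two ideas in this abstract, ordering keyphrases by their corpus occurrences and scoring extracted keyphrases against a gold standard by similarity instead of exact string match, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: `corpus_counts` is a hypothetical precomputed mapping from phrase to corpus occurrences, `token_jaccard` is a crude token-overlap stand-in for the corpus-based semantic similarity the paper actually uses, and the 0.5 threshold is arbitrary.

```python
def order_by_corpus_frequency(keyphrases, corpus_counts):
    """Sort keyphrases by their occurrence count in a reference corpus, most frequent first.

    corpus_counts is a hypothetical precomputed {phrase: count} mapping; phrases
    missing from it get a count of 0 and sink to the end of the list.
    """
    return sorted(keyphrases, key=lambda kp: corpus_counts.get(kp.lower(), 0), reverse=True)


def token_jaccard(a, b):
    """Stand-in similarity: Jaccard overlap of lower-cased token sets.
    (Not the paper's corpus-based semantic similarity.)"""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def soft_recall(extracted, gold, sim=token_jaccard, threshold=0.5):
    """Fraction of gold keyphrases matched by some extracted phrase with sim >= threshold."""
    if not gold:
        return 0.0
    hits = sum(1 for g in gold if any(sim(e, g) >= threshold for e in extracted))
    return hits / len(gold)


print(order_by_corpus_frequency(["neural network", "xyzzy", "machine learning"],
                                {"machine learning": 900, "neural network": 500}))
# ['machine learning', 'neural network', 'xyzzy']
print(soft_recall(["keyphrase extraction system", "corpus"],
                  ["keyphrase extraction", "semantic similarity"]))  # 0.5
```

With exact string matching the second example would score 0, because no extracted phrase equals a gold phrase; the similarity-based comparison credits the near-match "keyphrase extraction system".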
1999
Keyphrases provide semantic metadata that summarize and characterize documents. This paper describes Kea, an algorithm for automatically extracting keyphrases from text. Kea identifies candidate keyphrases using lexical methods, calculates feature values for each candidate, and uses a machine-learning algorithm to predict which candidates are good keyphrases. The machine learning scheme first builds a prediction model using training documents with known keyphrases, and then uses the model to find keyphrases in new documents. We use a large test corpus to evaluate Kea's effectiveness in terms of how many author-assigned keyphrases are correctly identified. The system is simple, robust, and publicly available.
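Kea's candidate identification and feature computation can be illustrated with a small Python sketch. It follows the published Kea design as we understand it (candidates are word sequences of up to three words that neither start nor end with a stop word; the features are TF×IDF and the relative position of first occurrence), but it is simplified: there is no stemming, case-folding is plain lower-casing, punctuation is not treated as a phrase boundary, the stop-word list is a tiny illustrative one, the add-one smoothing in the IDF is our choice, and the machine-learning step is omitted.

```python
import math
import re

# Tiny illustrative stop-word list; a real system would use a much larger one.
STOPWORDS = {"a", "an", "and", "are", "for", "from", "in", "is", "of", "the", "to", "we"}


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens (punctuation is dropped)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def candidates(tokens, max_len=3):
    """Yield (phrase, start_index) for every n-gram of 1..max_len tokens that
    neither starts nor ends with a stop word."""
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            yield " ".join(gram), i


def kea_style_features(doc, corpus):
    """Return {phrase: (tf_idf, first_occurrence)} for the candidates in `doc`.

    `corpus` is a list of background documents used for document frequencies.
    first_occurrence = tokens before the first appearance / document length.
    """
    tokens = tokenize(doc)
    counts, first = {}, {}
    for phrase, pos in candidates(tokens):
        counts[phrase] = counts.get(phrase, 0) + 1
        first[phrase] = min(first.get(phrase, pos), pos)
    # Pad with spaces so substring tests only match whole tokens.
    padded = [" " + " ".join(tokenize(d)) + " " for d in corpus]
    features = {}
    for phrase, count in counts.items():
        df = sum(1 for text in padded if f" {phrase} " in text)
        idf = -math.log2((df + 1) / (len(corpus) + 1))  # add-one smoothing (our choice)
        features[phrase] = ((count / len(tokens)) * idf, first[phrase] / len(tokens))
    return features


features = kea_style_features("Keyphrase extraction is useful. We study keyphrase extraction.",
                              ["keyphrase extraction", "cats"])
print(features["keyphrase extraction"])
```

Here "keyphrase extraction" gets a first-occurrence value of 0.0 because it opens the document; in Kea, feature vectors like these are discretized and fed to the learned model.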
International Journal of Interactive Mobile Technologies (http://www.i-jim.org), 2022
A Systematic Literature Review of Keyphrases Extraction Approaches
The keyphrases of a document are the textual units that characterize its content, such as the topics it addresses, its ideas, and its field. Thousands of books, articles, and web pages are published every day, and extracting keyphrases manually is tedious and time-consuming. Automatic keyphrase extraction is an area of text mining that aims to identify the most useful and important phrases that convey the meaning of a document's content. Keyphrases can be used in many Natural Language Processing (NLP) applications, such as text summarization, text clustering, and text classification. This article provides a Systematic Literature Review (SLR) to investigate, analyze, and discuss existing contributions that use new concepts and tools to improve keyphrase extraction. We studied the supervised and unsupervised keyphrase extraction approaches published between 2015 and 2022. We also identified the steps most commonly used across these approaches and examined the criteria that should be evaluated to improve keyphrase extraction accuracy. Each selected approach was evaluated for its ability to extract keyphrases. Our findings highlight the importance of keyphrase extraction and give researchers and practitioners information about proposed solutions and their limitations, helping them extract keyphrases more effectively and meaningfully.
Keywords: keyphrase extraction, systematic literature review, text mining, natural language processing
2.1 Applications
Automatic keyphrase extraction is used in many domains dealing with textual data, such as text classification [5], document clustering [6], document summarization [7], and search engines [8]. Although some studies, such as [9], restrict its use to five domains, the information that keyphrases provide is valuable enough that automatic keyphrase extraction (AKE) can also be exploited in many other domains, such as recommender systems [10], web mining [11], bibliometric analysis [12], and sentiment analysis [13].
2.2 Keyphrase extraction process
The keyphrase extraction process consists of a sequence of steps. Merrouni et al. [3] define five main steps, as shown in Figure 1. The text first goes through a preprocessing step that removes unnecessary textual units in order to eliminate noise in the raw text; common techniques include tokenization, stop-word removal, stemming, and normalization. According to [14] and [15], candidate keyphrases are terms that contain no punctuation or stop words and match the morphosyntactic pattern "adjective* noun+" (for example, "Big data" or "Computer engineering"). Many techniques are used to select candidate keyphrases, such as Part-of-Speech (POS) patterns, N-grams [16], and noun-phrase chunks [17].
Electronic Data Sources. To collect related work, we used a strategy based on multiple electronic data sources (EDS), searching five of them online (see Table 2). These EDS cover the high-quality journals and conference proceedings relevant to automatic keyphrase extraction approaches. We also applied a snowball search strategy, analyzing the bibliographies of the selected articles to find further related articles.
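The candidate-selection rule attributed to [14] and [15] (an "adjective* noun+" pattern with no punctuation or stop words) can be sketched as a scan over POS-tagged tokens. This is a minimal illustration, not code from the reviewed papers: it assumes the text has already been tokenized and tagged with Penn Treebank tags by some tagger (not shown), so adjectives start with `JJ` and nouns with `NN`. Because determiners, prepositions, and punctuation carry other tags, the pattern excludes most stop words and punctuation on its own.

```python
def pos_pattern_candidates(tagged):
    """Return maximal spans matching adjective* noun+ from a list of (word, tag) pairs.

    Tags are assumed to follow Penn Treebank conventions: JJ, JJR, JJS for
    adjectives and NN, NNS, NNP, NNPS for nouns.
    """
    results, i, n = [], 0, len(tagged)
    while i < n:
        j = i
        while j < n and tagged[j][1].startswith("JJ"):  # zero or more adjectives
            j += 1
        k = j
        while k < n and tagged[k][1].startswith("NN"):  # one or more nouns
            k += 1
        if k > j:
            results.append(" ".join(word for word, _ in tagged[i:k]))
            i = k
        else:
            i = max(j, i + 1)  # no noun followed: skip this token or adjective run
    return results


tagged = [("Big", "JJ"), ("data", "NNS"), ("drives", "VBZ"),
          ("computer", "NN"), ("engineering", "NN"), ("research", "NN"), (".", ".")]
print(pos_pattern_candidates(tagged))  # ['Big data', 'computer engineering research']
```

Taking maximal spans is one design choice; an n-gram approach would instead also emit sub-spans such as "computer engineering".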
2003
To tackle information overload, we present an Information Gain-based keyphrase extraction system called KPSpotter. KPSpotter is a flexible, web-enabled keyphrase extraction system that can process various input formats, including web data, and outputs both the extraction model and the list of keyphrases in XML. KPSpotter uses two features for training and extraction: 1) TF*IDF and 2) distance from first occurrence. The training and testing collections are processed in three stages: 1) data cleaning, 2) data tokenizing, and 3) data discretizing. To measure system performance, the keyphrases extracted by KPSpotter are compared with those assigned by the authors. Our experiments show that KPSpotter performs on par with KEA, a well-known keyphrase extraction system. KPSpotter, however, differs from other extraction systems in the following ways: First, KPSpotter employ...
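The abstract names information gain, a discretizing stage, and the distance-from-first-occurrence feature, but does not spell out how they fit together. One common reading, sketched below in Python under that assumption, is to discretize a feature and measure how much knowing its bin reduces uncertainty about whether a candidate is a keyphrase. The bin edges and the toy data are hypothetical, not taken from KPSpotter.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def discretize(values, edges):
    """Map each value to a bin index: the first edge it does not exceed, else len(edges)."""
    return [next((b for b, edge in enumerate(edges) if v <= edge), len(edges)) for v in values]


def information_gain(feature_bins, labels):
    """IG(Y; X) = H(Y) - sum over v of P(X = v) * H(Y | X = v)."""
    groups = {}
    for x, y in zip(feature_bins, labels):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(ys) / len(labels) * entropy(ys) for ys in groups.values())
    return entropy(labels) - conditional


# Hypothetical candidates: distance from first occurrence (0 = start of document)
# and whether the author listed the phrase as a keyphrase (1) or not (0).
first_occurrence = [0.01, 0.05, 0.08, 0.60, 0.75, 0.90]
is_keyphrase = [1, 1, 0, 0, 0, 0]
bins = discretize(first_occurrence, edges=[0.1, 0.5])
print(bins)  # [0, 0, 0, 2, 2, 2]
print(round(information_gain(bins, is_keyphrase), 3))  # 0.459
```

A feature whose bins perfectly separate keyphrases from non-keyphrases would reach the full label entropy H(Y); an uninformative feature scores close to 0.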
Keyphrases, also called keywords, are semantic metadata that play an important role in capturing the main themes of a large text collection. Authors of scientific publications typically provide about five to ten keywords that map their papers to the relevant domains, but non-scientific documents on the World Wide Web and in textual databases are growing exponentially, so an automatic mechanism is needed to identify the keyphrases embedded within them. In this paper, we propose a lightweight machine learning approach to identifying suitable keyphrases in text documents. The proposed method mines various lexical and semantic features from texts to learn a classification model. The efficacy of the proposed system is demonstrated through experiments on datasets from three different domains.
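The abstract does not say which classifier or which features the lightweight approach uses. As a hedged illustration of the general idea (learn a classification model from per-candidate features, then score new candidates), the Python sketch below trains a plain logistic-regression classifier by batch gradient descent. The two features (normalized frequency and relative first position), the learning rate, and the toy training data are hypothetical stand-ins for the paper's lexical and semantic features.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict_proba(model, x):
    """Probability that a candidate with feature vector x is a keyphrase."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical features per candidate: (normalized frequency, relative first position).
X = [(0.9, 0.00), (0.8, 0.10), (0.7, 0.05), (0.1, 0.90), (0.2, 0.80), (0.05, 0.70)]
y = [1, 1, 1, 0, 0, 0]
model = train_logistic(X, y)
print(predict_proba(model, (0.85, 0.02)) > 0.5)  # True
```

Logistic regression is chosen here only because it is small and dependency-free; any classifier that outputs a score per candidate fits the same pipeline.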
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2014
While automatic keyphrase extraction has been examined extensively, state-of-the-art performance on this task is still much lower than that on many core natural language processing tasks. We present a survey of the state of the art in automatic keyphrase extraction, examining the major sources of errors made by existing systems and discussing the challenges ahead.
International Journal of Intelligent Systems and Applications, 2017
Keyphrases are sets of words that reflect the main topics of a document. They play vital roles in document summarization, text mining, and the retrieval of web content. Because keyphrases are closely tied to a document, they reflect its contents and act as indices for it. Extracting the ideal keyphrases is therefore important for understanding a document's main contents. In this work, we present a keyphrase extraction method that efficiently finds keywords in English documents. The method uses document features such as TF, TF*IDF, GF, GF*IDF, and TF*GF*IDF. Finally, the method's performance is evaluated on a well-known document corpus.
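The five scores listed in the abstract can be computed directly from token counts. The abstract does not define GF; the Python sketch below assumes it is a term's global frequency, i.e., its total count across the whole collection, and it uses the natural-log IDF ln(N/df). Both choices are assumptions, so the paper's exact formulas may differ.

```python
import math


def term_scores(term, doc_tokens, collection):
    """Compute TF, TF*IDF, GF, GF*IDF and TF*GF*IDF for `term` in one document.

    doc_tokens: tokens of the target document.
    collection: list of token lists for every document (including the target).
    """
    tf = doc_tokens.count(term)
    gf = sum(tokens.count(term) for tokens in collection)  # ASSUMPTION: GF = collection-wide count
    df = sum(1 for tokens in collection if term in tokens)
    idf = math.log(len(collection) / df) if df else 0.0
    return {"TF": tf, "TF*IDF": tf * idf, "GF": gf, "GF*IDF": gf * idf, "TF*GF*IDF": tf * gf * idf}


collection = [["graph", "keyphrase", "graph"], ["keyphrase", "text"], ["text", "mining"]]
scores = term_scores("graph", collection[0], collection)
print(scores["TF"], scores["GF"])  # 2 2
```

A term that appears in every document gets IDF 0, so all of its IDF-weighted scores vanish regardless of how frequent it is.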
Journal of Advances in Information Technology
The graph-based approach has proven to be the most effective method of extracting keyphrases. However, existing graph-based extraction methods do not include nouns as an explicit component, so the keyphrases they produce are not noun-centric and tend to be of low quality. In addition, the clustering approaches used in most keyphrase extraction methods have not yielded good results. This study proposes an improved keyphrase extraction model that combines a graph-based model with noun phrase identifiers and effective clustering techniques. Relevant data was collected from selected English-language documents. The graph-based model was formulated by integrating the TextRank algorithm for node ranking, a noun phrase identifier for noun phrase scoring, an affinity propagation algorithm for selecting cluster groups, and k-means for clustering. The model was implemented and benchmarked against an existing model using precision, recall, and F-measure as performance metrics. The developed model improved on the existing model by 5.5% in precision, 5.3% in recall, and 5.5% in F-measure, which suggests that noun-centric keyphrase extraction yields higher-quality keyphrases.
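Of the components listed in this abstract, the TextRank node-ranking step is the easiest to show in isolation. The Python sketch below builds an undirected co-occurrence graph over noun tokens only (a crude stand-in for the paper's noun phrase identifier) and runs the TextRank update with damping factor 0.85, the value used in the original TextRank paper. It assumes Penn Treebank tags from some tagger and omits the affinity-propagation and k-means clustering stages entirely.

```python
def textrank(tagged, window=2, d=0.85, iters=50, keep=("NN",)):
    """Rank words with TextRank over an undirected co-occurrence graph.

    tagged: list of (word, Penn Treebank tag) pairs.
    Only words whose tag starts with a prefix in `keep` become nodes (nouns only here);
    two nodes are linked if they fall inside a window of `window` consecutive tokens.
    """
    words = [(w.lower(), t) for w, t in tagged]
    graph = {}
    for i, (w, t) in enumerate(words):
        if not t.startswith(keep):
            continue
        graph.setdefault(w, set())
        for u, tu in words[i + 1:i + window]:
            if tu.startswith(keep) and u != w:
                graph[w].add(u)
                graph.setdefault(u, set()).add(w)
    scores = {w: 1.0 for w in graph}
    for _ in range(iters):
        # Synchronous update: S(w) = (1 - d) + d * sum over neighbours u of S(u) / deg(u)
        scores = {w: (1 - d) + d * sum(scores[u] / len(graph[u]) for u in graph[w]) for w in graph}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


tagged = [("Graph", "NN"), ("model", "NN"), ("uses", "VBZ"), ("graph", "NN"),
          ("ranking", "NN"), ("of", "IN"), ("graph", "NN"), ("nodes", "NNS")]
ranked = textrank(tagged)
print(ranked[0][0])  # graph
```

In full TextRank, top-ranked words are then merged back into multi-word keyphrases when they appear adjacent in the text; that post-processing step is not shown.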
2012
Abstract: Keyphrases are added to documents to help identify the areas of interest they contain. However, in a significant proportion of papers, author-selected keyphrases are not appropriate for the document they accompany: for instance, they can be classificatory rather than explanatory, or they are not updated when the focus of the paper changes. As such, automated methods for improving the use of keyphrases are needed, and various methods have been published.
Traditionally, keyphrases (or keywords) have been manually assigned to documents by their authors or by human indexers. This has become impractical because of the massive daily growth of documents on the Internet, particularly short articles (e.g., microblogs, abstracts, snippets), creating a need for systems that automatically extract keyphrases from documents. Automatic keyphrase extraction methods have generally taken either a supervised or an unsupervised approach. Supervised methods extract keyphrases using a set of training documents, thus acquiring knowledge from a global collection of texts. Unsupervised methods, by contrast, extract keyphrases by determining their relevance within a single-document context, without prior learning. We present HybridRank, a hybrid keyphrase extraction method for short articles that leverages the benefits of both approaches. Our system implements modified versions of TextRank (Mihalcea and Tarau, 2004), an unsupervised method, and KEA (Witten et a...
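The abstract is cut off before it explains how HybridRank combines its TextRank and KEA components. Purely as a hypothetical illustration of hybrid ranking, the Python sketch below min-max normalizes the scores of an unsupervised ranker and a supervised ranker and mixes them with a weight `alpha`. The function names, the fusion rule, and the example scores are all our assumptions, not HybridRank's actual method.

```python
def min_max(scores):
    """Rescale a non-empty {phrase: score} mapping to [0, 1]; constant scores map to 1.0."""
    lo, hi = min(scores.values()), max(scores.values())
    return {k: (v - lo) / (hi - lo) if hi > lo else 1.0 for k, v in scores.items()}


def hybrid_rank(unsupervised, supervised, alpha=0.5, top_k=5):
    """Hypothetical fusion: alpha * normalized unsupervised score
    + (1 - alpha) * normalized supervised score. A phrase missing
    from one ranker receives 0 from that ranker."""
    u, s = min_max(unsupervised), min_max(supervised)
    combined = {p: alpha * u.get(p, 0.0) + (1 - alpha) * s.get(p, 0.0) for p in set(u) | set(s)}
    # Sort by descending combined score; ties broken alphabetically for determinism.
    return sorted(combined, key=lambda p: (-combined[p], p))[:top_k]


unsupervised = {"twitter": 0.9, "hashtag": 0.5, "news": 0.1}  # e.g. TextRank-style scores
supervised = {"hashtag": 0.8, "news": 0.6, "twitter": 0.2}    # e.g. classifier probabilities
print(hybrid_rank(unsupervised, supervised))  # ['hashtag', 'twitter', 'news']
```

Normalizing first matters because the two rankers produce scores on unrelated scales; `alpha` then controls how much each approach contributes.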
2004
Designing and developing a system that helps users digest and understand the available information has been a difficult challenge. In this paper, we discuss the design and development of an automatic interactive keyphrase extraction system called KPSpotter, which can process various data formats, such as XML, HTML, and plain text, over the Internet. KPSpotter combines the Information Gain data mining measure with several Natural Language Processing (NLP) techniques, such as Part-of-Speech (POS) tagging and the first occurrence of a term. To improve extraction accuracy, WordNet is incorporated into KPSpotter. We used the Unified Modeling Language (UML) to design and develop KPSpotter; UML modeling helps formalize the preliminary analysis model and supports iterative system design and development. We also tested system performance experimentally by comparing the keyphrases extracted by KPSpotter with those extracted by KEA, a well-known naive Bayes-based keyphrase extraction system. The experiments show that KPSpotter outperforms KEA in most test cases.