Clustering Techniques on Text Mining: A Review

Innovative Research Publications

Abstract

Rapid advances in smart technologies permit individuals and organizations to store large numbers of documents in repositories, but retrieving the relevant documents from these massive collections is difficult. Document clustering is the process of organizing such collections into meaningful clusters. Finding relevant documents is simpler and less tedious when documents are clustered by topic or category. Various document clustering algorithms are available for organizing documents so that each document lies close to its related documents. This paper reviews the clustering techniques used in text mining.

International Journal of Electrical and Computer Engineering (IJECE), 2021

Progress in numerous research fields and information technologies has led to an increase in the publication of research papers. As a result, researchers spend a lot of time finding research papers close to their field of specialization. In this paper we propose a document classification approach that clusters the text of research papers into meaningful categories, each covering a similar scientific field. The approach is based on the focus and scope of the target categories, where each category includes many topics. We extract word tokens from the topics of each category separately. The frequency of word tokens in a document determines the document's weight, which is calculated using the term frequency-inverse document frequency (TF-IDF) statistic. The approach uses the title, abstract, and keywords of each paper, in addition to the category topics, to perform the classification. Documents are then classified and clustered into the primary categories according to the highest cosine similarity between the category weights and the document weights.
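The pipeline described in this abstract (tokenize, weight with TF-IDF, assign each document to the category with the highest cosine similarity) can be illustrated with a minimal sketch. This is not the authors' implementation: the tokenizer, the smoothed IDF formula, the function names, and the sample category topics and documents are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase alphabetic word tokens only.
    return re.findall(r"[a-z]+", text.lower())


def tf_idf_vectors(texts):
    """Return one sparse TF-IDF vector (dict term -> weight) per text."""
    tokenized = [tokenize(t) for t in texts]
    n = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    # Assumed smoothed IDF, so terms found in every text keep a nonzero weight.
    idf = {term: math.log((1 + n) / (1 + df)) + 1.0 for term, df in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({term: (c / total) * idf[term] for term, c in counts.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify(documents, categories):
    """Assign each document to the category with the highest cosine similarity."""
    names = list(categories)
    texts = [categories[name] for name in names] + list(documents)
    vectors = tf_idf_vectors(texts)
    category_vectors = vectors[: len(names)]
    document_vectors = vectors[len(names):]
    labels = []
    for doc_vec in document_vectors:
        scores = [cosine(doc_vec, cat_vec) for cat_vec in category_vectors]
        labels.append(names[scores.index(max(scores))])
    return labels


# Hypothetical category topics and paper texts (title, abstract, keywords joined).
categories = {
    "machine learning": "neural network classification clustering learning",
    "power systems": "voltage grid power converter inverter",
}
documents = [
    "A neural network for text clustering",
    "Inverter control for grid voltage stability",
]
labels = classify(documents, categories)
print(labels)
```

In this sketch each category is represented by a single string of topic words, and the IDF is computed over categories and documents together; a real system might weight titles and keywords differently or fit the IDF on a larger corpus.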
