Semantic based Document Clustering: A Detailed Review

Neepa Shah; Sunita Mahajan

2012, International Journal of Computer Applications

Abstract

Document clustering, one of the traditional data mining techniques, is an unsupervised learning paradigm in which clustering methods try to identify inherent groupings of text documents, producing a set of clusters that exhibit high intra-cluster similarity and low inter-cluster similarity. The importance of document clustering stems from the massive volume of textual documents being created. Although numerous document clustering methods have been studied extensively in recent years, several challenges remain in improving clustering quality. In particular, most current document clustering algorithms do not consider the semantic relationships between terms, which leads to unsatisfactory clustering results. Over the last three to four years, efforts have been made to apply semantics to document clustering. Here, an exhaustive and detailed review of more than thirty semantics-driven document clustering methods is presented. After an introduction to document clustering and its basic requirements for improvement, traditional algorithms are overviewed and semantic similarity measures are explained. The article then discusses algorithms that interpret documents semantically for clustering. For each of these approaches, the semantic approach applied, the datasets used, the evaluation parameters, the limitations, and the future work are presented in tabular format for easy and quick interpretation.
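As a rough illustration of the clustering objective stated in the abstract, the sketch below computes the mean intra-cluster and inter-cluster cosine similarity for a toy corpus using plain term-frequency vectors. It is not taken from the paper: the function names (`tf_vector`, `cosine`, `intra_inter_similarity`), the sample documents, and the cluster labels are all assumptions made for this example. The last lines show why purely term-based similarity can be unsatisfactory: two texts that use synonyms share no terms and score zero.

```python
import math
from collections import Counter
from itertools import combinations


def tf_vector(text):
    """Hypothetical helper: bag-of-words term-frequency vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def intra_inter_similarity(docs, labels):
    """Mean pairwise similarity within clusters and across clusters."""
    vecs = [tf_vector(d) for d in docs]
    intra, inter = [], []
    for i, j in combinations(range(len(docs)), 2):
        bucket = intra if labels[i] == labels[j] else inter
        bucket.append(cosine(vecs[i], vecs[j]))

    def mean(xs):
        return sum(xs) / len(xs) if xs else 0.0

    return mean(intra), mean(inter)


# Toy corpus and cluster assignment (assumed for illustration only).
docs = [
    "stock market prices fall",
    "market prices rise on stock news",
    "team wins football match",
    "football team loses match",
]
labels = [0, 0, 1, 1]

intra, inter = intra_inter_similarity(docs, labels)
print(f"intra-cluster: {intra:.3f}, inter-cluster: {inter:.3f}")

# Synonyms share no surface terms, so term-based similarity is zero.
synonym_sim = cosine(tf_vector("car engine"), tf_vector("automobile motor"))
print(f"synonym similarity: {synonym_sim:.3f}")
```

A good clustering should give a clearly higher intra-cluster value than inter-cluster value. The zero score for the synonym pair is the kind of gap that semantic similarity measures aim to close.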

Related papers

Review on Semantic Document Clustering

Wael Yafooz, SK Ahammad Fahad

In the age of information technology, the volume of textual documents, both online and offline, is growing rapidly. These documents range from product information to company profiles. Many sources generate valuable information as text: medical reports, economic analyses, scientific journals, news, blogs, and so on. Maintaining and accessing these documents is very difficult without proper classification, and proper document classification can overcome these problems. However, only a few documents are classified; all of them need classification, and most are unlabeled. In this context, clustering is the only solution. Traditional clustering techniques and textual clustering differ in some respects, and the relations between words are very important for clustering text. Semantic clustering has proven to be a more appropriate clustering technique for texts. This review paper surveys clustering techniques up to semantic document clustering and discusses the advantages and disadvantages of various clustering methods.
