A general framework for XML Document Clustering

2003

Abstract

A novel methodology for clustering XML documents is discussed. The underlying idea is to group documents that exhibit structural similarities. To this end, a suitable technique for identifying meaningful matchings among the nodes of two XML document trees is investigated. The proposed technique also makes it possible to associate each set of related documents with a single prototype XML document, i.e., a representative subsuming the most relevant features of the documents in the set. Suitable techniques for both building and refining cluster-specific representatives are analyzed. Some initial experimental results show the effectiveness of our approach.
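
To make the idea of structural similarity concrete, the following is a minimal sketch and not the matching technique of the paper, which the abstract does not specify. It assumes a much simpler stand-in: each document is reduced to its set of root-to-node tag paths, two documents are compared by the Jaccard coefficient of those sets, and a naive cluster representative is taken to be the paths shared by every document in the cluster. The function names `tag_paths`, `structural_similarity`, and `naive_representative` are hypothetical and chosen for illustration only.

```python
import xml.etree.ElementTree as ET


def tag_paths(xml_text):
    """Return the set of root-to-node tag paths of an XML document.

    Assumption: structure is summarized by tag paths only; text content,
    attributes, and sibling order are ignored.
    """
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, root.tag)]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return paths


def structural_similarity(xml_a, xml_b):
    """Jaccard coefficient of the two documents' tag-path sets (illustrative)."""
    a, b = tag_paths(xml_a), tag_paths(xml_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def naive_representative(docs):
    """Tag paths common to all documents in a cluster (illustrative stand-in
    for a prototype document; not the paper's construction)."""
    path_sets = [tag_paths(d) for d in docs]
    return set.intersection(*path_sets)


if __name__ == "__main__":
    doc_a = "<book><title/><author/></book>"
    doc_b = "<book><title/><year/></book>"
    doc_c = "<article><title/></article>"
    print(structural_similarity(doc_a, doc_b))
    print(structural_similarity(doc_a, doc_c))
    print(sorted(naive_representative([doc_a, doc_b])))
```

Under this stand-in measure, the two `book` documents share two of their four distinct paths, while the `article` document shares none with them, so a threshold-based or hierarchical clustering would group the first two together. A tree-matching approach such as the one the abstract describes would capture finer structure than path sets can.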