Mining Subspace Correlations

2007, 2007 IEEE Symposium on Computational Intelligence and Data Mining

Abstract

In recent applications of clustering, such as gene expression microarray analysis, collaborative filtering, and web mining, object similarity is no longer measured by physical distance, but rather by the behavior patterns objects manifest or the magnitude of the correlations they induce. Current state-of-the-art algorithms for this type of clustering typically postulate specific cluster models that capture only specific behavior patterns or correlations, and ignore the possibility that other information-carrying patterns or correlations may coexist in the data. We cast the problem of searching for pattern clusters, or clusters that induce large correlations in some subset of features, as the problem of searching for groups of points embedded in lines. The advantage of this approach is that it allows different patterns or correlations to be clustered simultaneously. It also allows the clustering of patterns and correlations that are overlooked by existing methods. A formal stochastic line cluster model is presented, and its connection to correlation is established. Based on this model, an algorithm that uses feature selection to search for line clusters embedded in subspaces of the data is presented.
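
The link between points embedded in a line and large correlations in a subset of features can be illustrated with a small sketch. This is not the paper's stochastic model or its algorithm; it is a minimal illustration under assumed parameters. The helper names (`pearson`, `make_line_cluster`), the slope, intercept, noise level, and the choice of three features are all hypothetical. Points are generated near a line in the subspace spanned by features f0 and f1, while f2 is unrelated noise; the Pearson correlation is then strong within the line's subspace and weak outside it.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def make_line_cluster(n, slope, intercept, noise, rng):
    """Hypothetical generator: points near a line in (f0, f1); f2 is irrelevant.

    f1 follows slope * f0 + intercept plus Gaussian noise, so the points are
    embedded in a line within the (f0, f1) subspace. f2 is drawn independently
    and carries no information about the line.
    """
    points = []
    for _ in range(n):
        f0 = rng.uniform(0.0, 10.0)
        f1 = slope * f0 + intercept + rng.gauss(0.0, noise)
        f2 = rng.uniform(0.0, 10.0)
        points.append((f0, f1, f2))
    return points


rng = random.Random(0)  # fixed seed so the run is reproducible
cluster = make_line_cluster(200, slope=-2.0, intercept=5.0, noise=0.3, rng=rng)

# Split the points into per-feature columns.
f0, f1, f2 = (list(col) for col in zip(*cluster))

r01 = pearson(f0, f1)  # features spanning the line's subspace
r02 = pearson(f0, f2)  # one relevant feature, one irrelevant feature

print(f"corr(f0, f1) = {r01:.3f}")
print(f"corr(f0, f2) = {r02:.3f}")
```

Because the assumed slope is negative, the correlation between f0 and f1 is strongly negative, which shows that a line model covers negatively correlated patterns as well as positively correlated ones. Restricting attention to the features in which the line is embedded is what makes the correlation visible, which is the role the abstract assigns to feature selection.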