Clustering by pattern similarity in large data sets

Wei Wang

2002

12 pages


Abstract

Clustering is the process of grouping a set of objects into classes of similar objects. Although definitions of similarity vary from one clustering model to another, most of these models base similarity on a distance measure, e.g., Euclidean distance or cosine distance. In other words, similar objects are required to have close values on at least some set of dimensions. In this paper, we explore a more general type of similarity. Under the pCluster model we proposed, two objects are similar if they exhibit a coherent pattern on a subset of dimensions. For instance, in DNA microarray analysis, the expression levels of two genes may rise and fall synchronously in response to a set of environmental stimuli. Although the magnitudes of their expression levels may not be close, the patterns they exhibit can be very much alike. Discovery of such clusters of genes is essential in revealing significant connections in gene regulatory networks. E-commerce applications, such as collaborative filtering, can also benefit from the new model, which captures not only the closeness of values of certain leading indicators but also the closeness of (purchasing, browsing, etc.) patterns exhibited by the customers. Our paper introduces an effective algorithm to detect such clusters, and we perform tests on several real and synthetic data sets to show its effectiveness.
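The contrast between distance-based similarity and pattern-based similarity can be illustrated with a small sketch. It is not the paper's algorithm: it assumes that coherence between two objects on a chosen set of dimensions can be measured as the spread (maximum minus minimum) of their per-dimension differences, and that a pair counts as coherent when this spread stays within a tolerance. The names `pattern_spread`, `is_coherent`, and `delta`, as well as the sample values, are hypothetical and chosen only for illustration.

```python
import math


def pattern_spread(x, y, dims):
    """Spread of the per-dimension differences between x and y on dims.

    A spread of 0 means the two objects differ by a constant offset,
    i.e. they rise and fall together. (Illustrative measure, assumed here.)
    """
    diffs = [x[d] - y[d] for d in dims]
    return max(diffs) - min(diffs)


def is_coherent(x, y, dims, delta):
    """True if x and y follow the same pattern on dims within tolerance delta."""
    return pattern_spread(x, y, dims) <= delta


# Hypothetical expression levels of three genes under four conditions.
gene_a = [10.0, 50.0, 30.0, 70.0]
gene_b = [110.0, 150.0, 130.0, 170.0]  # same shape as gene_a, shifted by 100
gene_c = [70.0, 30.0, 50.0, 10.0]      # closer in value to gene_a, opposite shape

dims = range(4)
delta = 5.0  # hypothetical tolerance

# Euclidean distance ranks gene_c as the nearer neighbour of gene_a ...
print(math.dist(gene_a, gene_b), math.dist(gene_a, gene_c))

# ... while the pattern-based test pairs gene_a with gene_b instead.
print(is_coherent(gene_a, gene_b, dims, delta))
print(is_coherent(gene_a, gene_c, dims, delta))
```

In this toy data, gene_a and gene_b are far apart in magnitude yet move in lockstep, which is the situation the abstract describes for co-regulated genes. Restricting `dims` to a subset of conditions would express the "subset of dimensions" part of the model.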

Related papers

Grant E Daggard

2005

In this paper we propose a clustering algorithm called s-Cluster for the analysis of gene expression data based on pattern similarity. The algorithm captures tight clusters exhibiting strongly similar expression patterns in microarray data, and allows a high level of overlap among discovered clusters without forcing every gene into a group, as other algorithms do. This reflects the biological fact that not all functions are turned on in an experiment, and that many genes are co-expressed in multiple groups in response to different stimuli. The experiments demonstrate that the proposed algorithm successfully groups genes with strongly similar expression patterns and that the discovered clusters are interpretable.
