2002
…
12 pages
Clustering is the process of grouping a set of objects into classes of similar objects. Although definitions of similarity vary from one clustering model to another, in most of these models the concept of similarity is based on distances, e.g., Euclidean distance or cosine distance. In other words, similar objects are required to have close values on at least a set of dimensions. In this paper, we explore a more general type of similarity. Under the pCluster model we proposed, two objects are similar if they exhibit a coherent pattern on a subset of dimensions. For instance, in DNA microarray analysis, the expression levels of two genes may rise and fall synchronously in response to a set of environmental stimuli. Although the magnitude of their expression levels may not be close, the patterns they exhibit can be very much alike. Discovery of such clusters of genes is essential in revealing significant connections in gene regulatory networks. E-commerce applications, such as collaborative filtering, can also benefit from the new model, which captures not only the closeness of values of certain leading indicators but also the closeness of (purchasing, browsing, etc.) patterns exhibited by the customers. Our paper introduces an effective algorithm to detect such clusters, and we perform tests on several real and synthetic data sets to show its effectiveness.
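To make the contrast between distance-based and pattern-based similarity concrete, the following minimal Python sketch compares Euclidean distance with a shift-coherence score on hypothetical expression profiles. The score is the largest absolute "difference of differences" over all pairs of dimensions, in the spirit of the pScore used in the pCluster literature. The function names and sample values are assumptions made for illustration; this is not the detection algorithm the abstract refers to.

```python
from itertools import combinations
import math


def euclidean(x, y):
    """Ordinary Euclidean distance between two profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def max_pair_score(x, y, dims):
    """Largest |(x[a] - x[b]) - (y[a] - y[b])| over all pairs of dimensions.

    A value of 0 means the two profiles differ only by a constant shift
    on the chosen dimensions (hypothetical helper, illustrative only).
    """
    return max(
        abs((x[a] - x[b]) - (y[a] - y[b]))
        for a, b in combinations(dims, 2)
    )


# Hypothetical expression levels under four conditions.
gene_1 = [10.0, 40.0, 20.0, 35.0]
gene_2 = [110.0, 140.0, 120.0, 135.0]  # gene_1 shifted up by 100
gene_3 = [12.0, 11.0, 41.0, 20.0]      # close in magnitude, different shape

dims = range(4)
for name, other in (("gene_2", gene_2), ("gene_3", gene_3)):
    print(name,
          "distance:", round(euclidean(gene_1, other), 2),
          "pattern score:", max_pair_score(gene_1, other, dims))
```

In this toy data, gene_2 is far from gene_1 in Euclidean terms yet perfectly coherent with it, while gene_3 is closer in magnitude but follows a different pattern. A pattern-based model would group gene_1 with gene_2 whenever the score stays below a user-chosen threshold.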
2005
In this paper we propose a clustering algorithm called s-Cluster for the analysis of gene expression data based on pattern similarity. The algorithm captures tight clusters exhibiting strongly similar expression patterns in microarray data, and allows a high level of overlap among discovered clusters without forcing every gene into a group, as other algorithms do. This reflects the biological fact that not all functions are turned on in an experiment, and that many genes are co-expressed in multiple groups in response to different stimuli. The experiments demonstrate that the proposed algorithm successfully groups genes with strongly similar expression patterns and that the clusters found are interpretable.
IJRCAR, 2014
Clustering can be used effectively to identify classes of similar objects within a set of objects. The definition of similarity differs from one clustering model to another. It is often based on metrics such as Manhattan distance, Euclidean distance, the Pearson correlation coefficient, or other measures, depending on the clustering model used. Similar objects must have close values on at least a subset of the dimensions. Clustering is used as an unsupervised data analysis approach in machine learning and data mining. Many existing clustering methods require pairwise distances to be computed in advance, which makes them computationally expensive and difficult to apply to the huge data sets common in bioinformatics. In the pattern-similarity cluster model, two objects are considered similar if they show a robust pattern on a subset of the dimensions. This similarity concept models a wide variety of applications, for example in bioinformatics. In DNA microarray analysis, the expression levels of two or more genes may increase and decrease synchronously in response to environmental stimuli. The magnitudes of their expression levels may not be close, but the patterns they exhibit can be very much alike. Discovering such clusters of genes is important in revealing significant information about gene regulatory networks. Other applications can also benefit from the new model, because it captures not only the closeness of values but also the closeness of the patterns exhibited by the objects. Clustering methods have been applied to gene expression data sets in order to group genes sharing common or similar expression profiles into separate groups. In such analyses, designing an appropriate (dis)similarity measure is critical.
It is expected to be especially efficient when the shape of the expression profile is vital in determining the gene relationship, yet the expression magnitude should also be taken into account to some extent.
International Journal of Computer Mathematics, 2007
Identification of co-expressed genes sharing similar biological behaviors is an essential step in functional genomics. Traditional clustering techniques are generally based on overall similarity of expression levels and often generate clusters with mixed profile patterns. This paper proposes a novel pattern recognition method for selecting co-expressed genes based on rate of change and modulation status of gene expression at each time interval. The proposed method is capable of identifying gene clusters consisting of highly similar shapes of expression profiles and modulation patterns. Furthermore, we develop a quality index based on the semantic similarity in gene annotations to assess the likelihood of a cluster being a co-regulated group. The effectiveness of the proposed methodology is demonstrated through its application to the well-known yeast sporulation dataset and an in-house cancer genomics dataset.
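As a rough illustration of grouping genes by rate of change and modulation status at each time interval, the sketch below encodes every interval of a profile as up, down, or unchanged and groups genes whose encodings match. The abstract does not specify the encoding, so the symbols, the `tol` threshold, and both function names are hypothetical choices made only for this example.

```python
def modulation_pattern(profile, tol=0.1):
    """Encode each time interval as 'U' (up), 'D' (down), or 'N' (no change).

    `tol` is an assumed threshold on the change between consecutive
    time points; changes within +/- tol count as unchanged.
    """
    symbols = []
    for prev, curr in zip(profile, profile[1:]):
        rate = curr - prev
        if rate > tol:
            symbols.append("U")
        elif rate < -tol:
            symbols.append("D")
        else:
            symbols.append("N")
    return "".join(symbols)


def group_by_pattern(profiles, tol=0.1):
    """Group gene names that share the same modulation string."""
    groups = {}
    for name, profile in profiles.items():
        groups.setdefault(modulation_pattern(profile, tol), []).append(name)
    return groups


# Hypothetical time-course profiles (four time points each).
profiles = {
    "g1": [1.0, 2.0, 1.5, 1.5],
    "g2": [5.0, 7.5, 6.0, 6.05],
    "g3": [3.0, 2.0, 2.5, 4.0],
}
print(group_by_pattern(profiles))
```

Here g1 and g2 fall into the same group despite very different expression magnitudes, because only the direction of change in each interval is compared.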
International Journal of Bioinformatics Research, 2011
DNA microarray technology is a fundamental tool in gene expression data analysis. The collection of datasets from the technology has underscored the need for quantitative analytical tools to examine such data. Due to the large number of genes and complex gene regulation networks, clustering is a useful exploratory technique for analyzing these data. Many clustering algorithms have been proposed to analyze microarray gene expression data, but very few of them evaluate the quality of the clusters. In this paper, a novel cluster analysis technique is proposed that does not require the number of clusters to be known a priori. The method computes a similarity measure on the basis of which clusters are merged, and subsequently splits a cluster by computing its degree of separation. Splitting and merging are repeated iteratively until the cluster validity index (i.e., the DB index) degrades. The experimental results show that the proposed technique gives results comparable with existing methods on a gene cancer dataset. This study may help raise relevant issues in the extraction of meaningful biological information from microarray expression data.
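The stopping rule above relies on the Davies-Bouldin (DB) index, for which lower values indicate more compact, better separated clusters. The sketch below computes the standard DB index with Euclidean distance in plain Python. The abstract does not give the merge and split steps, so they are not reproduced here; the function names and the toy points are assumptions for illustration.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def davies_bouldin(clusters):
    """Standard Davies-Bouldin index for a list of clusters (lists of points).

    Requires at least two clusters with distinct centroids.
    """
    centers = [centroid(c) for c in clusters]
    # Scatter: mean distance of each cluster's points to its centroid.
    scatter = [
        sum(dist(p, m) for p in c) / len(c)
        for c, m in zip(clusters, centers)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity of cluster i to any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / dist(centers[i], centers[j])
            for j in range(k) if j != i
        )
    return total / k


# Four toy points partitioned two different ways.
good = [[[0.0, 0.0], [0.0, 2.0]], [[10.0, 0.0], [10.0, 2.0]]]
bad = [[[0.0, 0.0], [10.0, 0.0]], [[0.0, 2.0], [10.0, 2.0]]]
print(davies_bouldin(good), davies_bouldin(bad))
```

An iterative merge-and-split procedure of the kind described could recompute this value after each step and stop once it starts to increase.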
IJRCAR, 2014
Microarrays enable biologists to study genome-wide patterns of gene expression in any given cell type, at any given time, and under any given set of conditions. Identifying groups of genes that manifest similar expression patterns is important in the analysis of gene expression time series data. Existing work investigates the choice of proximity measures for the clustering of microarray data by evaluating the performance of 16 proximity measures in experiments on time-course and cancer datasets.
International Journal of Data Mining, Modelling and Management, 2011
Identifying groups of genes with similar expression time courses is crucial in the analysis of gene expression time series data. This paper proposes a regulation-based clustering approach, PatternClus, for clustering gene expression data. The method also identifies sub-clusters based on an order-preserving ranking approach. The clustering method was evaluated on real-life datasets and was found to perform satisfactorily. PatternClus was compared to some well-known clustering algorithms (k-means and hierarchical clustering) and was found to give better results in terms of the z-score measure of cluster validation. An incremental version of PatternClus is also presented, which helps in identifying clusters incrementally when the database is continuously growing.
BMC Bioinformatics, 2008
Background: In DNA microarray experiments, discovering groups of genes that share similar transcriptional characteristics is instrumental in functional annotation, tissue classification and motif identification. However, in many situations a subset of genes exhibits a consistent pattern only over a subset of conditions. Conventional clustering algorithms that deal with the entire row or column in an expression matrix would therefore fail to detect these useful patterns in the data. Recently, biclustering has been proposed to detect a subset of genes exhibiting a consistent pattern over a subset of conditions. However, most existing biclustering algorithms are based on searching for sub-matrices within a data matrix by optimizing certain heuristically defined merit functions. Moreover, most of these algorithms can only detect a restricted set of bicluster patterns. Results: In this paper, we present a novel geometric perspective for the biclustering problem. The biclustering process is int...
Journal of Computational Biology, 1999
Recent advances in biotechnology allow researchers to measure expression levels for thousands of genes simultaneously, across different conditions and over time. Analysis of data produced by such experiments offers potential insight into gene function and regulatory mechanisms. A key step in the analysis of gene expression data is the detection of groups of genes that manifest similar expression patterns. The corresponding algorithmic problem is to cluster multicondition gene expression patterns. In this paper we describe a novel clustering algorithm that was developed for analysis of gene expression data. We define an appropriate stochastic error model on the input, and prove that under the conditions of the model, the algorithm recovers the cluster structure with high probability. The running time of the algorithm on an n-gene dataset is O(n^2 [log(n)]^c). We also present a practical heuristic based on the same algorithmic ideas. The heuristic was implemented and its performance is demonstrated on simulated data and on real gene expression data, with very promising results.
Mathematics in Computer Science, 2008
Microarrays offer unprecedented possibilities for so-called omic research, e.g., genomic and proteomic research. However, they also produce data that are quite challenging to analyze. The aim of this paper is to provide a short tutorial on the most common approaches to pattern discovery and cluster analysis as they are currently used for microarrays, in the hope of drawing the attention of the algorithmic community to novel aspects of classification and data analysis that deserve attention and have potential for high reward.
BMC Bioinformatics, 2006
Background: DNA Microarray technology is an innovative methodology in experimental molecular biology, which has produced huge amounts of valuable data in the profile of gene expression. Many clustering algorithms have been proposed to analyze gene expression data, but little guidance is available to help choose among them. The evaluation of feasible and applicable clustering algorithms is becoming an important issue in today's bioinformatics research.