1999, Journal of Computational Biology
Recent advances in biotechnology allow researchers to measure expression levels for thousands of genes simultaneously, across different conditions and over time. Analysis of data produced by such experiments offers potential insight into gene function and regulatory mechanisms. A key step in the analysis of gene expression data is the detection of groups of genes that manifest similar expression patterns. The corresponding algorithmic problem is to cluster multicondition gene expression patterns. In this paper we describe a novel clustering algorithm that was developed for analysis of gene expression data. We define an appropriate stochastic error model on the input, and prove that under the conditions of the model, the algorithm recovers the cluster structure with high probability. The running time of the algorithm on an n-gene dataset is O(n^2 (log n)^c). We also present a practical heuristic based on the same algorithmic ideas. The heuristic was implemented and its performance is demonstrated on simulated data and on real gene expression data, with very promising results.
Southeast Europe journal of soft computing, 2013
Gene expression analysis is becoming very important for understanding complex living organisms. Rather than analyzing genes individually, a more powerful approach, microarray technology, analyzes gene expression in high throughput. This new approach brings new analysis problems that make interpretation difficult. To make correlated gene expression easier to understand, clustering methods are applied to gene expression analysis. In this paper, a different approach is presented that begins clustering using some computational strategies.
Bioinformatics, 2004
Motivation: A measurement of cluster quality is needed to choose potential clusters of genes that contain biologically relevant patterns of gene expression. This is strongly desirable when a large number of gene expression profiles have to be analyzed and proper clusters of genes need to be identified for further analysis, such as the search for meaningful patterns, identification of gene functions or gene response analysis. Results: We propose a new cluster quality method, called stability, by which unsupervised learning of gene expression data can be performed efficiently. The method takes into account a cluster's stability on partition. We evaluate this method and demonstrate its performance using four independent, real gene expression and three simulated datasets. We demonstrate that our method outperforms other techniques listed in the literature. The method has applications in evaluating clustering validity as well as identifying stable clusters.
Bioinformatics, 2007
Motivation: Recent advancements in microarray technology allow simultaneous monitoring of the expression levels of a large number of genes over different time points. Clustering is an important tool for analyzing such microarray data, typical properties of which are its inherent uncertainty, noise and imprecision. In this article, a two-stage clustering algorithm, which employs a recently proposed variable string length genetic scheme and a multiobjective genetic clustering algorithm, is proposed. It is based on the novel concept of points having significant membership to multiple classes. An iterated version of the well-known Fuzzy C-Means is also utilized for clustering. Results: The significant superiority of the proposed two-stage clustering algorithm as compared to the average linkage method, Self Organizing Map (SOM) and a recently developed weighted Chinese restaurant-based clustering method (CRC), widely used methods for clustering gene expression data, is established on a variety of artificial and publicly available real life data sets. The biological relevance of the clustering solutions is also analyzed.
Computers & Operations Research, 2012
High throughput biological data need to be processed, analyzed, and interpreted to address problems in life sciences. Bioinformatics, computational biology, and systems biology deal with biological problems using computational methods. Clustering is one of the methods used to gain insight into biological processes, particularly at the genomics level. Clearly, clustering can be used in many areas of biological data analysis. However, this paper presents a review of the current clustering algorithms designed especially for analyzing gene expression data. It is also intended to introduce one of the main problems in bioinformatics, clustering gene expression data, to the operations research community.
Gene expression data hide vital information required to understand the biological processes that take place in a particular organism in relation to its environment. Deciphering the hidden patterns in gene expression data offers a considerable opportunity to strengthen the understanding of functional genomics. The complexity of biological networks and the number of genes involved increase the challenge of comprehending and interpreting the resulting mass of data, which consists of millions of measurements; these data also exhibit vagueness, imprecision, and noise. Therefore, the use of clustering techniques is a first step toward addressing these challenges, and it is essential in the data mining process for revealing natural structures and identifying interesting patterns in the underlying data. The clustering of gene expression data has proven useful in revealing the natural structure inherent in gene expression data, understanding gene functions, cellular processes, and subtypes of cells, mining useful information from noisy data, and understanding gene regulation. Another benefit of clustering gene expression data is the identification of homology, which is very important in vaccine design. This review examines the various clustering algorithms applicable to gene expression data in order to discover and provide useful knowledge of the appropriate clustering technique that will guarantee stability and a high degree of accuracy in its analysis procedure.
BMC Bioinformatics, 2006
Background: DNA Microarray technology is an innovative methodology in experimental molecular biology, which has produced huge amounts of valuable data in the profile of gene expression. Many clustering algorithms have been proposed to analyze gene expression data, but little guidance is available to help choose among them. The evaluation of feasible and applicable clustering algorithms is becoming an important issue in today's bioinformatics research.
2011 2nd National Conference on Emerging Trends and Applications in Computer Science, 2011
The advent of DNA microarray technology has enabled biologists to monitor the expression levels (mRNA) of thousands of genes simultaneously. In this survey, we address various approaches to gene expression data analysis using clustering techniques. We discuss the performance of various existing clustering algorithms under each of these approaches. The proximity measure plays an important role in making a clustering technique effective. Therefore, we briefly discuss various proximity measures. Finally, since evaluation of the effectiveness of the clustering techniques over gene data requires validity measures and data sources for numeric data, we discuss them as well.
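To make the role of the proximity measure concrete, here is a minimal sketch of two measures commonly applied to expression profiles: Euclidean distance and Pearson correlation. The survey abstract does not say which measures it covers, so this choice, the function names, and the toy profiles are all assumptions made for illustration.

```python
import math


def euclidean_distance(x, y):
    """Straight-line distance between two equal-length expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson_correlation(x, y):
    """Pearson correlation: compares the shape of two profiles, ignoring scale."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical profiles: gene_b follows the same pattern as gene_a at 10x the level.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [10.0, 20.0, 30.0, 40.0]

distance = euclidean_distance(gene_a, gene_b)
correlation = pearson_correlation(gene_a, gene_b)
```

On these toy profiles the two measures disagree: the Euclidean distance is large because the absolute levels differ, while the correlation is essentially 1 because the pattern across conditions is the same. That difference is one reason the choice of measure can change which genes end up clustered together.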
Genetics and molecular research : GMR, 2005
Several advanced techniques have been proposed for data clustering and many of them have been applied to gene expression data, with partial success. The high dimensionality and the multitude of admissible perspectives for data analysis of gene expression require additional computational resources, such as hierarchical structures and dynamic allocation of resources. We present an immune-inspired hierarchical clustering device, called hierarchical artificial immune network (HaiNet), especially devoted to the analysis of gene expression data. This technique was applied to a newly generated data set, involving maize plants exposed to different aluminum concentrations. The performance of the algorithm was compared with that of a self-organizing map, which is commonly adopted to deal with gene expression data sets. More consistent and informative results were obtained with HaiNet.
2012 5th International Symposium on Communications, Control and Signal Processing, 2012
Clustering is one of the most useful tools for microarray gene expression data analysis. Although there have been many reviews and surveys in the literature, many good and effective clustering ideas have not yet been collected in a systematic way. In this paper, we review five clustering families representing five clustering concepts rather than five algorithms. We also review some clustering validations and collect a list of benchmark gene expression datasets.
Data mining refers to the nontrivial process of "identifying valid, novel, potentially useful and ultimately understandable patterns in data". Based on the type of knowledge that is mined, data mining can be classified into different models such as clustering, decision trees, association rules, and sequential patterns and time series. In this paper, an attempt has been made to study the theoretical background and applications of clustering techniques in data mining, with a special emphasis on the analysis of gene expression in bioinformatics. Bioinformatics is the study of genetic and other biological information using computer and statistical techniques. DNA microarray technology has now made it possible to simultaneously monitor the expression levels of thousands of genes during important biological processes and across collections of related samples. 1) A flood of data means that many of the challenges in biology are now challenges in computing. 2) A step toward addressing this challenge is the use of clustering techniques, which are essential in the data mining process for revealing natural structures and identifying interesting patterns in the underlying data. In this paper, an effort has been made to compare a few clustering algorithms, namely K-means, hierarchical clustering, Self-Organizing Map (SOM), and Cluster Affinity Search Technique (CAST), with a proposed algorithm called CAST+. The strengths and weaknesses of these clustering algorithms are identified, and drawbacks such as needing to know the number of clusters before clustering and taking the affinity threshold as input from the user are rectified by the proposed algorithm. Results show that the proposed algorithm is efficient in comparison with the other clustering algorithms mentioned above.
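To illustrate why the affinity threshold matters as a user-supplied input, below is a simplified sketch in the spirit of an affinity-based procedure such as CAST. It is not the CAST+ algorithm proposed in the abstract, whose details are not given here, and it should not be read as a faithful implementation of the published CAST either. The function name `cast_sketch`, the `max_steps` safeguard, and the toy similarity matrix are assumptions made for illustration.

```python
def cast_sketch(similarity, threshold, max_steps=10_000):
    """Grow one cluster at a time by adding high-affinity and dropping low-affinity items.

    similarity: symmetric matrix (list of lists) with values in [0, 1].
    threshold:  user-chosen affinity threshold (the input the abstract calls a drawback).
    max_steps:  hypothetical safeguard against add/remove cycling.
    """
    unassigned = set(range(len(similarity)))
    clusters = []

    while unassigned:
        # Open a new cluster from an arbitrary seed (lowest index, for determinism).
        seed = min(unassigned)
        unassigned.remove(seed)
        cluster = {seed}

        def affinity(i):
            # Total similarity of item i to the current members of the open cluster.
            return sum(similarity[i][j] for j in cluster)

        for _ in range(max_steps):
            cutoff = threshold * len(cluster)

            # ADD step: bring in the unassigned item with the highest qualifying affinity.
            candidates = [i for i in unassigned if affinity(i) >= cutoff]
            if candidates:
                best = max(candidates, key=affinity)
                unassigned.remove(best)
                cluster.add(best)
                continue

            # REMOVE step: drop the member with the lowest affinity below the cutoff.
            weak = [i for i in cluster if affinity(i) < cutoff]
            if weak and len(cluster) > 1:
                worst = min(weak, key=affinity)
                cluster.remove(worst)
                unassigned.add(worst)
                continue

            break  # Neither step applies: the cluster is stable, so close it.

        clusters.append(sorted(cluster))

    return clusters


# Hypothetical pairwise similarities for four genes forming two obvious groups.
toy_similarity = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
toy_clusters = cast_sketch(toy_similarity, threshold=0.5)
```

Note that the number of clusters is not passed in; it falls out of the threshold. With a threshold of 0.5 the toy matrix splits into two groups, while a very low threshold would merge all four genes into one cluster, which shows how sensitive the result is to this single parameter.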