2002, Ernst Schering Research Foundation workshop
Cluster analysis is a vital methodology for interpreting high-dimensional gene expression data generated by modern genomic technologies. This paper elucidates the clustering process, emphasizing techniques for grouping genes based on expression patterns, enabling exploration of gene functions and networks. Applications span developmental biology, disease diagnosis, and treatment effects, reflecting the tool's potential in advancing biomedical research.
IJRCAR, 2014
Clustering is an effective way to identify classes of similar objects within a set of objects. The definition of similarity varies from one clustering model to another. It is often based on metrics such as Manhattan distance, Euclidean distance, or the Pearson correlation coefficient, or on other measures, depending on the clustering model used. Similar objects must have values that are close in at least some subset of the dimensions. Clustering is used as an unsupervised data analysis approach in machine learning and data mining. Many existing clustering methods require pairwise distances to be computed in advance, which makes them computationally expensive and difficult to apply to the huge data sets common in bioinformatics. In the pattern similarity cluster model, two objects are considered similar if they show a robust pattern on a subset of the existing dimensions. This new similarity concept models a wide variety of applications, for example in bioinformatics. In DNA microarray analysis, the expression levels of two or more genes may rise and fall synchronously in response to environmental stimuli. The magnitudes of their expression levels may not be close, but the patterns they exhibit can be very much alike. Discovering such clusters of genes is important for revealing significant information about gene regulatory networks. Other applications can also benefit from the new model, because it captures not only the closeness of values but also the closeness of the patterns exhibited by the objects. Clustering methods have been applied to gene expression data sets to group genes that share common or similar expression profiles. In such analyses, designing an appropriate (dis)similarity measure is critical.
It is expected to be especially efficient when the shape of the expression profile is vital in determining the gene relationship, yet the expression magnitude should also be taken into account to some extent.
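The contrast drawn above between closeness of values and closeness of patterns can be illustrated with a minimal sketch. The three gene profiles are hypothetical values chosen for illustration, not data from the paper, and the functions are plain textbook definitions of the three named measures rather than the paper's own similarity measure.

```python
import math

def manhattan(a, b):
    """Sum of absolute differences across conditions."""
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    """Straight-line distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pearson(a, b):
    """Pearson correlation coefficient; assumes neither profile is constant."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical expression levels across five conditions (illustration only).
gene_a = [1.0, 3.0, 2.0, 5.0, 4.0]
gene_b = [11.0, 13.0, 12.0, 15.0, 14.0]  # same pattern as gene_a, shifted up by 10
gene_c = [5.0, 3.0, 4.0, 1.0, 2.0]       # opposite pattern, similar magnitude

for name, other in (("gene_b", gene_b), ("gene_c", gene_c)):
    print(name,
          "manhattan:", manhattan(gene_a, other),
          "euclidean:", round(euclidean(gene_a, other), 2),
          "pearson:", round(pearson(gene_a, other), 2))
```

In this toy data the Euclidean distance ranks gene_c as closer to gene_a than gene_b, even though gene_c moves in the opposite direction, while the Pearson coefficient identifies gene_b as sharing gene_a's pattern. A measure that blends shape with some sensitivity to magnitude, as the abstract suggests, would sit between these two behaviours.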
Physics of Atomic Nuclei, 2008
Systems biology and bioinformatics are now major fields for productive research. DNA microarrays and other array technologies, together with genome sequencing, have advanced to the point that it is now possible to monitor gene expression on a genomic scale. Gene expression analysis is discussed and some important clustering techniques are considered. The patterns identified in the data suggest similarities in gene behavior, which provide useful information about gene function. We discuss measures for investigating the homogeneity of gene expression data in order to optimize the clustering process. We contribute to the knowledge of functional roles and regulation of E. coli genes by proposing a classification of these genes based on consistently correlated genes in expression data and similarities of gene expression patterns. A new visualization tool for targeted projection pursuit and dimensionality reduction of gene expression data is demonstrated.
Proceedings of The National Academy of Sciences, 1998
A system of cluster analysis for genome-wide expression data from DNA microarray hybridization is described that uses standard statistical algorithms to arrange genes according to similarity in pattern of gene expression. The output is displayed graphically, conveying the clustering and the underlying expression data simultaneously in a form intuitive for biologists. We have found in the budding yeast Saccharomyces cerevisiae that clustering gene expression data groups together efficiently genes of known similar function, and we find a similar tendency in human data. Thus patterns seen in genome-wide expression experiments can be interpreted as indications of the status of cellular processes. Also, coexpression of genes of known function with poorly characterized or novel genes may provide a simple means of gaining leads to the functions of many genes for which information is not available currently.
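As a rough illustration of arranging genes by similarity in expression pattern, the sketch below runs a small agglomerative clustering. The abstract only says "standard statistical algorithms", so the choice of average linkage with a distance of 1 minus the Pearson correlation is an assumption made here, and the gene names and profiles are hypothetical.

```python
import math
from itertools import combinations

def pearson(a, b):
    """Pearson correlation coefficient; assumes neither profile is constant."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def average_linkage(profiles, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    names = list(profiles)
    # Pairwise distance between genes: 1 - Pearson correlation (assumed metric).
    dist = {frozenset((g, h)): 1.0 - pearson(profiles[g], profiles[h])
            for g, h in combinations(names, 2)}

    def linkage(c1, c2):
        # Average linkage: mean distance over all cross-cluster gene pairs.
        total = sum(dist[frozenset((g, h))] for g in c1 for h in c2)
        return total / (len(c1) * len(c2))

    clusters = [[g] for g in names]
    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters

# Hypothetical expression profiles over four conditions (illustration only).
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.5],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [8.0, 6.0, 4.5, 2.0],
}
clusters = average_linkage(profiles, n_clusters=2)
print(sorted(sorted(c) for c in clusters))
```

With these toy profiles the rising genes (g1, g2) and the falling genes (g3, g4) end up in separate clusters regardless of their magnitudes. For real data sets, a library implementation of hierarchical clustering would be the more practical choice.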
Journal of biotechnology, 2002
Expression arrays facilitate the monitoring of changes in the expression patterns of large collections of genes. The analysis of expression array data has become a computationally-intensive task that requires the development of bioinformatics technology for a number of key stages in the process, such as image analysis, database storage, gene clustering and information extraction. Here, we review the current trends in each of these areas, with particular emphasis on the development of the related technology being carried out within our groups.
Southeast Europe journal of soft computing, 2013
Gene expression analysis is becoming very important for understanding complex living organisms. Rather than analyzing genes individually, a more powerful approach, microarray technology, analyzes gene expression in high throughput. This new approach brings new analysis problems that make interpretation difficult. To make correlated gene expression easier to understand, clustering methods are applied to the gene expression data. In this paper, a different approach to clustering is presented that uses some computational strategies.
Genome Informatics …, 2000
Expression arrays facilitate the monitoring of changes in expression patterns of large collections of genes. It is generally expected that genes with similar expression patterns would correspond to proteins of common biological function. We assess this common assumption by comparing levels of similarity of expression patterns and statistical significance of biological terms that describe the corresponding protein functions. Terms are automatically obtained by mining large collections of Medline abstracts. We propose that the combined use of the tools for expression profiles clustering and automatic function retrieval, can be useful tools for the detection of biologically relevant associations between genes in complex gene expression experiments. The results obtained using publicly available experimental data show how, in general, an increase in the similarity of the expression patterns is accompanied by an enhancement of the amount of specific functional information or, in other words, how the selected terms became more specific following an increase in the specificity of the expression patterns. Particularly interesting are the discrepancies from this general trend, i.e. groups of genes with similar expression patterns but very little in common at the functional level. In these cases the similarity of their expression profiles becomes the first link between previously unrelated genes.
The availability of global gene expression data has provided abundant evidence that sets of functionally related genes are co-ordinately induced or repressed in response to developmental or environmental changes, presumably via the action of sequence-specific DNA-binding transcription factors (TFs). This provides a mechanism to control specific aspects of physiology; it also enables the use of gene co-regulation to predict gene function, and it underlies the fact that expression profiles can be used to classify samples. Gene expression is largely controlled at the transcriptional level, with transcriptional regulatory elements located primarily in the upstream promoter region of each gene. With the wide availability of genome-wide expression data, it is possible to identify upstream regulatory motifs commonly shared by co-regulated genes. The common strategy for identifying co-regulated genes is cluster analysis. The approach used in this paper to mine yeast expression data started from clustering of the expression profiles, followed by function categorisation and promoter analysis of the upstream region of the genes. By combining over-represented oligonucleotide analysis and multiple-sequence alignment programs, it is possible to identify upstream regulatory motifs commonly shared by co-regulated genes. It is believed that good clustering is better than sophisticated motif-search algorithms. It would be highly desirable if one could combine motif and cluster analyses, as good clustering can facilitate motif identification, and, conversely, conserved motifs (or any other functional information related to the sequences) can help to improve clustering. However, the lack of quality upstream experimental data has made systematic global investigations very difficult.
Predictions for which DNA-binding protein might be interacting with the motif can be obtained by computational methods, such as finding which predicted DNA-binding proteins have the motif in their upstream region, and searching for a member of a known DNA-binding protein family. The biological significance of some of the motifs presented here should be verified experimentally, including determination of factors binding to these motifs.
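The idea of looking for oligonucleotides that are over-represented upstream of co-regulated genes can be sketched as follows. This is a simplified illustration, not the paper's procedure: the enrichment ratio, the pseudocount, the function names, and the toy sequences are all assumptions made here, and a real analysis would use a proper significance test against genome-wide upstream regions.

```python
from collections import Counter

def kmer_fraction(sequences, k):
    """Fraction of sequences that contain each k-mer at least once."""
    counts = Counter()
    for seq in sequences:
        # A set ensures each k-mer is counted once per sequence.
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return {kmer: c / len(sequences) for kmer, c in counts.items()}

def enriched_kmers(cluster_seqs, background_seqs, k, min_ratio=2.0, pseudo=0.01):
    """Rank k-mers by (cluster fraction) / (background fraction + pseudocount)."""
    fg = kmer_fraction(cluster_seqs, k)
    bg = kmer_fraction(background_seqs, k)
    scores = {kmer: f / (bg.get(kmer, 0.0) + pseudo) for kmer, f in fg.items()}
    hits = [(kmer, s) for kmer, s in scores.items() if s >= min_ratio]
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(hits, key=lambda item: (-item[1], item[0]))

# Toy upstream sequences (hypothetical, illustration only).
cluster_upstream = ["TTACGCGTAA", "GGACGCGTCC", "ACGCGTTTTT"]
background_upstream = ["TTTTTTTTTT", "GGGGCCCCAA", "AATTAATTAA", "CCGGTTAACC"]

top_kmer, top_score = enriched_kmers(cluster_upstream, background_upstream, k=6)[0]
print(top_kmer, round(top_score, 1))
```

Here the hexamer shared by all three cluster sequences ranks first because it is absent from the background set. Candidate motifs found this way would still need the multiple-sequence alignment and experimental verification steps that the abstract describes.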
Journal of Computational Biology, 1999
Recent advances in biotechnology allow researchers to measure expression levels for thousands of genes simultaneously, across different conditions and over time. Analysis of data produced by such experiments offers potential insight into gene function and regulatory mechanisms. A key step in the analysis of gene expression data is the detection of groups of genes that manifest similar expression patterns. The corresponding algorithmic problem is to cluster multicondition gene expression patterns. In this paper we describe a novel clustering algorithm that was developed for analysis of gene expression data. We define an appropriate stochastic error model on the input, and prove that under the conditions of the model, the algorithm recovers the cluster structure with high probability. The running time of the algorithm on an n-gene dataset is $O(n^2[\log(n)]^c)$. We also present a practical heuristic based on the same algorithmic ideas. The heuristic was implemented and its performance is demonstrated on simulated data and on real gene expression data, with very promising results.
Pigment Cell Research, 2000
The response of cells to extracellular signals usually requires altered expression of many genes, possibly including several distinct metabolic pathways. In some cases, only a subset of genes involved in such responses are known, which requires techniques to analyze changes in the expression of multiple genes, both known and unknown. Three techniques, two-dimensional gel electrophoresis, differential display, and gene discovery arrays, provide opportunities for measuring changes in gene expression levels, as well as for identifying novel gene products.
important role in melanogenesis (i.e. Bcl2 and Emd) creating a need for techniques to help in new gene discovery, especially for those genes that do not have an obvious phenotype to help in their identification and isolation. The complete sequence of the human genome is projected to be available by the year 2003. Along with this, the location of each expressed gene will be mapped within this sequence, providing complete genomic and mRNA (coding) sequence information for every gene in the genome. It has been estimated that there are from 70,000 to over 100,000 different genes present in the human haploid genome, with each cell type expressing a different subset of these genes. The identification of every gene in the genome has begun by the creation of libraries containing expressed sequence tags (ESTs). ESTs are short nucleotide sequences produced by wholesale sequencing cDNA copies of isolated mRNA molecules, usually isolated from a single cell type. This produces a library of mRNA sequences that can specifically identify each gene expressed in that cell type. As of August, 1999, 1,537,470 human ESTs have been deposited in the National Center for Biotechnology Information (NCBI) database (URL: ) (2). The EST sequence for a particular gene can be derived from different portions of the mRNA molecule, and therefore several ESTs can identify the same mRNA transcript.
When this is taken into account, over 52,907 unique human genes are thought to be represented in the EST database. Of these 52,907 transcripts, less than 20% have a known or predicted