2009, 2009 9th International Conference on Information Technology and Applications in Biomedicine
4 pages
Gene expression patterns that can distinguish disease subclasses to a clinically significant degree not only play a prominent role in diagnosis but also lead to therapeutic strategies that tailor treatment to the particular biology of each disease. Nevertheless, gene expression signatures derived through statistical feature identification procedures on population datasets have received rightful criticism, since they share only a few genes in common for a particular pathology, even when they are derived from the same dataset using different methodologies. An optimistic view of this problem, emerging from the wealth of biological interactions, is that a statistical solution may not be unique. The derived signatures may be complementary parts of a global one, with each individual signature intersecting only a small part of the biological evidence. In this work we focus on the biological knowledge hidden behind different gene signatures and propose a methodology for integrating such knowledge towards retrieving a unified signature.
2010
Gene expression signatures of toxicity and clinical response benefit both safety assessment and clinical practice; however, difficulties in connecting signature genes with the predicted end points have limited their application. The Microarray Quality Control Consortium II (MAQCII) project generated 262 signatures for ten clinical and three toxicological end points from six gene expression data sets, an unprecedented collection of diverse signatures that has permitted a wide-ranging analysis of the nature of such predictive models. A comprehensive analysis of the genes of these signatures and their nonredundant unions using ontology enrichment, biological network building and interactome connectivity analyses demonstrated the link between gene signatures and the biological basis of their predictive power. Different signatures for a given end point were more similar at the level of biological properties and transcriptional control than at the gene level. Signatures tended to be enriched in function and pathway in an end point- and model-specific manner, and showed a topological bias for incoming interactions. Importantly, the level of biological similarity between different signatures for a given end point correlated positively with the accuracy of the signature predictions. These findings will aid the understanding and application of predictive genomic signatures, and support their broader application in predictive medicine.
Large collections of gene signatures play a pivotal role in interpreting results of omics data analysis but suffer from compositional (large overlap) and functional (redundant read-outs) redundancy, and many gene signatures rarely pop up in statistical tests. Based on pan-cancer data analysis, here we define a restricted set of 962 so-called informative signatures and demonstrate that they are more likely to appear highly enriched in cancer biology studies. We show that the majority of informative signatures conserve their weights for the composing genes (eigengenes) from one cancer type to another. We construct InfoSigMap, an interactive online map showing the structure of compositional and functional redundancies between informative signatures and charting the territories of biological functions accessible through transcriptomic studies. InfoSigMap can be used to visualize in one insightful picture the results of comparative omics data analyses and suggests reconsidering existin...
2019
Mining gene expression profiles has proven valuable for identifying signatures serving as surrogates of cancer phenotypes. However, the similarities of such signatures across different cancer types have not been strong enough to conclude that they represent a universal biological mechanism shared among multiple cancer types. Here we describe a network-based approach that explores gene-to-gene connections in multiple cancer datasets while maximizing the overall association of the subnetwork with clinical outcomes. Using data from The Cancer Genome Atlas (TCGA), we studied the characteristics of common gene expression in three types of cancer: rectum adenocarcinoma (READ), breast invasive carcinoma (BRCA) and colon adenocarcinoma (COAD). By analyzing several pairs of highly correlated genes after filtering and clustering, we found that the co-expressed genes across multiple types of cancer point to particular biological mechanisms related to cancer cell progression, sugges...
PLOS One, 2011
Informative genes from microarray data can be used to construct prediction models and investigate biological mechanisms. Differentially expressed genes, the main targets of most gene selection methods, can be classified as single- and multiple-class specific signature genes. Here, we present a novel gene selection algorithm based on a Group Marker Index (GMI), which is intuitive, of low computational complexity, and efficient in identifying both types of genes. Most gene selection methods identify only single-class specific signature genes and cannot easily identify multiple-class specific signature genes. Our algorithm can detect de novo certain conditions of multiple-class specificity of a gene and makes use of a novel non-parametric indicator to assess the discrimination ability between classes. Our method is effective even when the sample size is small and when the class sizes are significantly different. To compare effectiveness and robustness, we formulate an intuitive template-based method and use four well-known datasets. We demonstrate that our algorithm outperforms the template-based method in difficult cases with unbalanced distribution. Moreover, the multiple-class specific genes are good biomarkers and play important roles in biological pathways. Our literature survey supports that the proposed method identifies unique multiple-class specific marker genes (not reported earlier to be related to cancer) in the Central Nervous System data. It also discovers unique biomarkers indicating the intrinsic difference between subtypes of lung cancer. We also associate the pathway information with the multiple-class specific signature genes and cross-reference it to published studies. We find that the identified genes participate in the pathways directly involved in cancer development in leukemia data.
Our method offers a promising way to find genes that may be involved in pathways of multiple diseases, and hence opens up the possibility of using an existing drug on other diseases as well as designing a single drug for multiple diseases.
International Journal of Approximate Reasoning, 2008
We propose a combination of machine learning techniques to integrate predictive profiling from gene expression with clinical and epidemiological data. Starting from BioDCV, a complete software setup for predictive classification and feature ranking without selection bias, we apply semisupervised profiling for detecting outliers and deriving informative subtypes of patients. During the profiling process, sampletracking curves are extracted and then clustered according to a distance derived from dynamic time warping. Sampletracking also allows the identification of outlier cases, whose removal is shown to improve the predictive accuracy and stability of the derived gene profiles. Here we propose to employ clinical features to validate the semisupervised procedure. The procedure is demonstrated in the analysis of a liver cancer dataset of 213 samples described by 1993 genes and by pathological features.
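The abstract above clusters sampletracking curves with a distance derived from dynamic time warping (DTW). As a rough illustration only, the following sketch computes the classic DTW distance between two numeric curves. It assumes an absolute-difference local cost and no warping window; the function name `dtw_distance` is hypothetical, and the exact distance used by the authors is not specified in the abstract.

```python
def dtw_distance(curve_a, curve_b):
    """Classic dynamic time warping distance between two numeric sequences.

    Assumption: local cost is the absolute difference between points and
    no warping-window constraint is applied.
    """
    inf = float("inf")
    n, m = len(curve_a), len(curve_b)
    # cost[i][j] = minimal accumulated cost of aligning the first i points
    # of curve_a with the first j points of curve_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(curve_a[i - 1] - curve_b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # curve_b point j is reused
                cost[i][j - 1],      # curve_a point i is reused
                cost[i - 1][j - 1],  # both curves advance
            )
    return cost[n][m]
```

A pairwise matrix of such distances could then be passed to any standard clustering routine; which routine the authors used is not stated here.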
Bioinformatics, 2006
Motivation: Several statistical methods that combine analysis of differential gene expression with biological knowledge databases have been proposed for a more rapid interpretation of expression data. However, most such methods are based on a series of univariate statistical tests and do not properly account for the complex structure of gene interactions. Results: We present a simple yet effective multivariate statistical procedure for assessing the correlation between a subspace defined by a group of genes and a binary phenotype. A subspace is deemed significant if the samples corresponding to different phenotypes are well separated in that subspace. The separation is measured using Hotelling's T² statistic, which captures the covariance structure of the subspace. When the dimension of the subspace is larger than that of the sample space, we project the original data to a smaller orthonormal subspace. We use this method to search through functional pathway subspaces defined by ...
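To make the separation measure concrete, here is a minimal sketch of the standard two-sample Hotelling's T² statistic with a pooled covariance matrix, written in plain Python. It is not the authors' implementation: the function names are hypothetical, rows are assumed to be samples and columns the genes of one group, and the projection step the abstract describes for high-dimensional subspaces is omitted, so the sketch simply raises an error when the pooled covariance is singular.

```python
def _mean_vector(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def _scatter(rows, mean):
    # Sum of outer products of the centered samples.
    p = len(mean)
    s = [[0.0] * p for _ in range(p)]
    for row in rows:
        d = [x - m for x, m in zip(row, mean)]
        for i in range(p):
            for j in range(p):
                s[i][j] += d[i] * d[j]
    return s


def _solve(a, b):
    # Solve a x = b by Gaussian elimination with partial pivoting.
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("pooled covariance is singular; reduce the dimension first")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def hotelling_t2(group_a, group_b):
    """Two-sample Hotelling's T^2 with pooled covariance."""
    n1, n2 = len(group_a), len(group_b)
    m1, m2 = _mean_vector(group_a), _mean_vector(group_b)
    s1, s2 = _scatter(group_a, m1), _scatter(group_b, m2)
    p = len(m1)
    pooled = [[(s1[i][j] + s2[i][j]) / (n1 + n2 - 2) for j in range(p)]
              for i in range(p)]
    diff = [a - b for a, b in zip(m1, m2)]
    weights = _solve(pooled, diff)  # pooled^{-1} (m1 - m2)
    return (n1 * n2 / (n1 + n2)) * sum(d * w for d, w in zip(diff, weights))
```

With a single gene, the statistic reduces to the square of the pooled two-sample t statistic, which is a convenient sanity check.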
Sigkdd Explorations, 2007
We survey the progress in the analysis of gene expression data for the purposes of disease subtype diagnosis, new subtype discovery, and understanding of diseases and treatment responses. We find existing works fall short on several issues: these works provide little information on the interplay between selected genes; the collection of pathways that can be used, evaluated, and ranked against the observed expression data is limited; and a comprehensive set of rules for reasoning about relevant molecular events has not been compiled and formalized. We thus envision an advanced integrated framework, and are developing a system based on it, to provide biologically inspired solutions. It comprises: (i) automated analysis and extraction of information from biomedical texts; (ii) targeted construction of known pathways; and (iii) direct hypothesis generation based on logical reasoning on, and tests for, consistencies and inconsistencies of observed data against known pathways.
Journal of Experimental & Clinical Cancer Research, 2010
Background: The advent of global gene expression profiling has generated unprecedented insight into our molecular understanding of cancer, including breast cancer. For example, human breast cancer patients display significant diversity in terms of their survival, recurrence and metastasis, as well as their response to treatment. These patient outcomes can be predicted by the transcriptional programs of their individual breast tumors. Predictive gene signatures allow us to correctly classify human breast tumors into various risk groups as well as to more accurately target therapy to ensure more durable cancer treatment. Results: Here we present a novel algorithm to generate gene signatures with predictive potential. The method first classifies the expression intensity of each gene, as determined by global gene expression profiling, as low, average or high. The matrix containing the classified data for each gene is then used to score the expression of each gene based on its individual ability to predict the patient characteristic of interest. Finally, all examined genes are ranked based on their predictive ability, and the most highly ranked genes are included in the master gene signature, which is then ready for use as a predictor. This method was used to accurately predict the survival outcomes in a cohort of human breast cancer patients. Conclusions: We confirmed the capacity of our algorithm to generate gene signatures with bona fide predictive ability. The simplicity of our algorithm will enable biological researchers to quickly generate valuable gene signatures without specialized software or extensive bioinformatics training.
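The three steps described above (classify, score, rank) can be sketched as follows. This is only an illustration under stated assumptions, not the published algorithm: the abstract does not give the thresholds or the scoring rule, so the cut-offs (mean ± 0.5 standard deviations), the score (absolute difference in mean level between the two outcome groups), and all function names are hypothetical.

```python
from statistics import mean, pstdev


def discretize(values, width=0.5):
    """Map each value to -1 (low), 0 (average) or 1 (high).

    Assumption: cut-offs are mean +/- width * standard deviation.
    """
    mu, sd = mean(values), pstdev(values)
    levels = []
    for v in values:
        if v > mu + width * sd:
            levels.append(1)
        elif v < mu - width * sd:
            levels.append(-1)
        else:
            levels.append(0)
    return levels


def score_gene(levels, labels):
    """Assumed score: gap between mean levels of the two outcome groups."""
    pos = [lv for lv, y in zip(levels, labels) if y == 1]
    neg = [lv for lv, y in zip(levels, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome groups must be present")
    return abs(mean(pos) - mean(neg))


def build_signature(expression, labels, top_k):
    """Rank genes by score and keep the top_k as the signature.

    expression: dict mapping gene name -> list of values, one per patient.
    labels: list of 0/1 outcomes in the same patient order.
    """
    scores = {gene: score_gene(discretize(vals), labels)
              for gene, vals in expression.items()}
    # Sort by descending score; break ties by gene name for determinism.
    ranked = sorted(scores, key=lambda g: (-scores[g], g))
    return ranked[:top_k]
```

For example, with toy data in which `GENE_A` is low in all patients with outcome 0 and high in all patients with outcome 1, `GENE_A` ranks first.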
PLoS Computational Biology, 2011
Nucleic Acids Research, 2010
NPJ systems biology and applications, 2017
ceremade.dauphine.fr
Molecules, 2020
BMC Genomics, 2011
Nucleic Acids Research, 2015
Nucleic Acids Research, 2012
The American Journal of Human Genetics, 2013
BMC Medical Genomics, 2016
Briefings in Bioinformatics, 2004
BMC Bioinformatics, 2007
BMC Bioinformatics, 2009