2014, Asian Journal of Engineering and Technology
Operons are the basic unit of transcription and can be used to understand transcriptional regulation in a given prokaryotic genome. Currently, the sequence and gene coordinates of an organism can be identified rapidly, but its operons remain unknown. Moreover, experimental methods for detecting operons are extremely difficult and time-consuming to execute. Operon prediction as a pretreatment step can greatly reduce the cost of performing an experimental assay. Many algorithms and biological properties have been proposed, but the resulting predictions still require improvement in terms of sensitivity, specificity, and accuracy. This study uses a teaching-learning-based optimization (TLBO) algorithm with three biological properties for operon prediction: the intergenic distance, the metabolic pathway, and the cluster of orthologous groups (COG). These properties for the Escherichia coli genome are used to train the evaluation standards of the fitness function for gene pairs. The experimental res...
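Intergenic distance is the property shared by nearly all of the methods listed here, so a minimal sketch of how it might be computed is given below. The `Gene` record, the gene names, the coordinates, and the convention `downstream.start - upstream.end - 1` are all assumptions made for illustration; they are not taken from the paper, and other distance conventions are in use.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str      # hypothetical gene identifier
    start: int     # 1-based inclusive start coordinate
    end: int       # 1-based inclusive end coordinate
    strand: str    # "+" or "-"


def intergenic_distances(genes):
    """Return (upstream, downstream, distance) for adjacent same-strand pairs.

    Distance is the number of bases between the two genes; a negative
    value means that the two genes overlap.
    """
    ordered = sorted(genes, key=lambda g: g.start)
    pairs = []
    for up, down in zip(ordered, ordered[1:]):
        # Genes on opposite strands cannot share a transcript, so skip them.
        if up.strand == down.strand:
            pairs.append((up.name, down.name, down.start - up.end - 1))
    return pairs


# Toy coordinates, invented for this example.
toy_genes = [
    Gene("geneA", 100, 400, "+"),
    Gene("geneB", 410, 900, "+"),
    Gene("geneC", 1200, 1500, "-"),
    Gene("geneD", 1490, 2000, "-"),
]
toy_pairs = intergenic_distances(toy_genes)
```

Short or negative distances are generally taken as evidence for a shared operon, which is why this value feeds the fitness functions described in these abstracts.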
2013
An operon is the basic unit of transcription. The structural genes in an operon are co-transcribed into a single mRNA molecule, so operons contribute to the understanding of transcriptional regulation. However, experimental methods for detecting operons are extremely difficult and time-consuming to execute; using operon prediction as a pretreatment step can therefore greatly reduce the cost of performing an experimental assay. Previous studies have combined different algorithms with biological properties to predict the distribution of operons in a genome. This study uses a differential evolution (DE) algorithm with biological properties to predict the operons of bacterial genomes. The biological properties include the intergenic distance, the metabolic pathway, the cluster of orthologous groups (COG), the gene length ratio, and the operon length. The Escherichia coli genome is used to train the evaluation standards of each property. The present study proposes DE for operon prediction, and also compares the...
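The abstract does not say how gene pairs are encoded or how the fitness is defined, so the following is only a generic sketch of the classic DE/rand/1/bin loop on a toy objective. The function name, the parameter values (`f`, `cr`, population size), and the target vector are assumptions chosen for illustration, not the authors' settings.

```python
import random


def differential_evolution(fitness, dim, bounds, pop_size=20, f=0.5, cr=0.9,
                           generations=200, seed=0):
    """Maximise `fitness` over real vectors with DE/rand/1/bin."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    scores = [fitness(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals, all different from i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for j in range(dim):
                if rng.random() < cr or j == j_rand:
                    value = pop[a][j] + f * (pop[b][j] - pop[c][j])
                    value = min(max(value, lo), hi)  # clamp to the bounds
                else:
                    value = pop[i][j]
                trial.append(value)
            trial_score = fitness(trial)
            # Greedy selection: keep the trial only if it is at least as good.
            if trial_score >= scores[i]:
                pop[i], scores[i] = trial, trial_score
    best = max(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


# Toy objective: negative squared distance to an invented target vector.
toy_target = [0.3, 0.6, 0.8]


def toy_fitness(vec):
    return -sum((v - t) ** 2 for v, t in zip(vec, toy_target))


de_best, de_score = differential_evolution(toy_fitness, dim=3, bounds=(0.0, 1.0))
```

In an operon-prediction setting the vector would presumably encode something about gene pairs or property weights, but that mapping is not recoverable from the abstract.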
Briefings in functional genomics, 2012
Accurate prediction of operons can improve the functional annotation and application of genes within operons in prokaryotes. Here, we review several features: (i) intergenic distance, (ii) metabolic pathways, (iii) homologous genes, (iv) promoters and terminators, (v) gene order conservation, (vi) microarray, (vii) clusters of orthologous groups, (viii) gene length ratio, (ix) phylogenetic profiles, (x) operon length/size and (xi) STRING database scores, as well as some other features, which have been applied in recent operon prediction methods in prokaryotes in the literature. Based on a comparison of the prediction performances of these features, we conclude that other, as yet undiscovered features, or feature selection with a receiver operating characteristic analysis before algorithm processing can improve operon prediction in prokaryotes.
Proceedings of the International Conference on Intelligent Systems for Molecular Biology (ISMB), 2000
We present a computational approach to predicting operons in the genomes of prokaryotic organisms. Our approach uses machine learning methods to induce predictive models for this task from a rich variety of data types including sequence data, gene expression data, and functional annotations associated with genes. We use multiple learned models that individually predict promoters, terminators and operons themselves. A key part of our approach is a dynamic programming method that uses our predictions to map every known and putative gene in a given genome into its most probable operon. We evaluate our approach using data from the E. coli K-12 genome.
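The dynamic programming step can be sketched as an optimal partition of the gene order into consecutive blocks. The sketch below assumes a hypothetical `score(start, end)` function that returns a log-score for genes `start..end-1` forming one operon; in the paper such scores would come from the learned models, whereas here they are invented toy values.

```python
def best_operon_partition(n_genes, score):
    """Split genes 0..n_genes-1 into consecutive operons with maximal total score.

    `score(start, end)` is a hypothetical log-score for the half-open
    block [start, end) being a single operon.
    """
    best = [0.0] + [float("-inf")] * n_genes  # best[k]: best score for first k genes
    back = [0] * (n_genes + 1)                # back[k]: start of the last block
    for end in range(1, n_genes + 1):
        for start in range(end):
            candidate = best[start] + score(start, end)
            if candidate > best[end]:
                best[end], back[end] = candidate, start
    # Walk the back-pointers to recover the blocks.
    operons = []
    end = n_genes
    while end > 0:
        start = back[end]
        operons.append((start, end))
        end = start
    operons.reverse()
    return operons, best[n_genes]


# Invented log-scores; any block not listed gets a low default score.
toy_scores = {
    (0, 1): -1.0, (1, 2): -1.0, (0, 2): -0.2,
    (2, 3): -0.1, (3, 4): -0.3, (2, 4): -1.5,
}


def toy_score(start, end):
    return toy_scores.get((start, end), -5.0)


toy_operons, toy_total = best_operon_partition(4, toy_score)
```

The double loop is quadratic in the number of genes; a practical implementation would probably cap the maximum operon length to keep whole-genome runs cheap.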
Lecture Notes in Computer Science, 2008
The Brazilian Symposium on Bioinformatics (BSB) 2008 was held at Santo André (São Paulo), Brazil, August 28-30, 2008. BSB 2008 was the third symposium in the BSB series, although BSB was preceded by the Brazilian Workshop on Bioinformatics (WOB). This previous event had three consecutive editions in 2002 (Gramado, Rio Grande do Sul), 2003 (Macaé, Rio de Janeiro), and 2004 (Brasília, Distrito Federal). The change from workshop to symposium reflects the increasing quality and interest behind this meeting.
Nucleic Acids Research, 2010
An operon is a fundamental unit of transcription and contains specific functional genes for the construction and regulation of networks at the entire genome level. The correct prediction of operons is vital for understanding gene regulation and function in newly sequenced genomes. As experimental methods for operon detection tend to be nontrivial and time-consuming, various methods for operon prediction have been proposed in the literature. In this study, a binary particle swarm optimization is used for operon prediction in bacterial genomes. The intergenic distance, participation in the same metabolic pathway, the cluster of orthologous groups, the gene length ratio and the operon length are used to design a fitness function. We trained the proper values on the Escherichia coli genome, and used the above five properties to implement feature selection. Finally, our study used the intergenic distance, the metabolic pathway and the gene length ratio to predict operons. Experimental results show that the prediction accuracy of this method reached 92.1%, 93.3% and 95.9% on the Bacillus subtilis genome, the Pseudomonas aeruginosa PAO1 genome and the Staphylococcus aureus genome, respectively. This method has enabled us to predict operons with high accuracy for these three genomes, for which only limited data on the properties of the operon structure exist.
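Accuracy figures such as those above are usually reported alongside sensitivity and specificity, computed over adjacent gene pairs. The sketch below shows one common way to derive all three from pairwise predictions; the function name and the toy labels are assumptions for illustration and are not taken from the paper's evaluation data.

```python
def pair_metrics(predicted, actual):
    """Sensitivity, specificity and accuracy for same-operon pair predictions.

    Both arguments are equal-length sequences of booleans, where True
    means "this adjacent gene pair lies in the same operon".
    """
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)

    def ratio(numerator, denominator):
        # Return 0.0 instead of raising when a class is absent.
        return numerator / denominator if denominator else 0.0

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
    }


# Invented labels for five adjacent gene pairs.
toy_predicted = [True, True, False, False, True]
toy_actual = [True, False, False, False, True]
toy_metrics = pair_metrics(toy_predicted, toy_actual)
```

Reporting all three values matters because a predictor that labels every pair as "same operon" can still reach a high accuracy on genomes where such pairs dominate.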
Proceedings of the International …, 2010
An operon is a fundamental unit of transcription and contains specific functional genes for the construction and regulation of networks at the whole genome level. The prediction of operons is critical for understanding gene regulation and function in newly sequenced genomes. As experimental methods for operon detection tend to be non-trivial and time-consuming, various methods for operon prediction have been proposed in the literature. In this study, a complementary binary particle swarm optimization (CBPSO) is used for operon prediction in bacterial genomes. We used a complementary operation to improve the initialization procedure, and then used the intergenic distance, the metabolic pathway and the cluster of orthologous groups (COG) to design a fitness function. The proper values were trained on the Escherichia coli genome. Experimental results show that the prediction accuracy of this method reached 92.6%, 93.6%, 95.8% and 96.3% on the Bacillus subtilis, Pseudomonas aeruginosa PAO1, Staphylococcus aureus and Mycobacterium tuberculosis genomes, respectively. The proposed method predicted operons with high accuracy for the four test genomes.
Applied and Environmental Microbiology, 2007
Various computational approaches have been proposed for operon prediction, but most algorithms rely on experimental or functional data that are only available for a small subset of sequenced genomes. In this study, we explored the possibility of using phylogenetic information to aid in operon prediction, and we constructed a Bayesian hidden Markov model that incorporates comparative genomic data with traditional predictors, such as intergenic distances. The prediction algorithm performs as well as the best previously reported method, with several significant advantages. It uses fewer data sources, so it is easier to implement, and it is more broadly applicable than previous methods: it can be applied to essentially every gene in any sequenced bacterial genome. Furthermore, we show that near-optimal performance is easily reached with a generic set of comparative genomes and does not depend on a specific relationship between the subject genome and the comparative set. We applied the algorithm to the Bacillus anthracis genome and found that it successfully predicted all previously verified B. anthracis operons. To further test its performance, we chose a predicted operon (BA1489-92) containing several genes with little apparent functional relatedness and tested their cotranscriptional nature. Experimental evidence shows that these genes are cotranscribed, and the data have interesting implications for B. anthracis biology. Overall, our findings show that this algorithm is capable of highly sensitive and accurate operon prediction in a wide range of bacterial genomes and that these predictions can lead to the rapid discovery of new functional relationships among genes.
2002
The prediction of operons, the smallest unit of transcription in prokaryotes, is the first step towards reconstruction of a regulatory network at the whole genome level. Sequence information, in particular the distance between open reading frames, has been used to predict whether adjacent Escherichia coli genes are in an operon. While appreciably successful, these predictions need to be validated and refined experimentally. As a growing number of gene expression array experiments on E. coli became available, we investigated to what extent they could be used to improve and validate these predictions. To this end, we examined a large collection of published microarray data. The correlation between expression ratios of adjacent genes was used in a Bayesian classification scheme to predict whether the genes are in an operon or not. We found that for the genes whose expression levels change significantly across the experiments in the data set, the currently available gene expression data allowed a significant refinement of the sequence-based predictions. We report these co-expression correlations in an E. coli genomic map. For a significant portion of gene pairs, however, the set of array experiments considered did not contain sufficient information to determine whether they are in the same transcriptional unit. This is not due to unreliability of the array data per se, but to the design of the experiments analyzed. In general, experiments that perturb a large number of genes offer more information for operon prediction than confined perturbations. These results provide a rationale for conducting expression studies comparing conditions that cause global changes in gene expression.
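The co-expression feature described here can be illustrated with a plain Pearson correlation between the expression-ratio profiles of adjacent genes. This sketch covers only that feature, not the Bayesian classification step; the gene names and profile values are invented, and returning 0.0 for a flat profile is an assumption made to keep the example simple.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        # A gene whose expression never changes carries no co-expression signal.
        return 0.0
    return cov / (norm_x * norm_y)


def adjacent_correlations(profiles, gene_order):
    """Correlation of expression profiles for each pair of neighbouring genes."""
    return {
        (g1, g2): pearson(profiles[g1], profiles[g2])
        for g1, g2 in zip(gene_order, gene_order[1:])
    }


# Invented log expression ratios across four hypothetical experiments.
toy_profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [0.5, 0.5, 0.5, 0.5],
}
toy_corr = adjacent_correlations(toy_profiles, ["geneA", "geneB", "geneC", "geneD"])
```

The flat `geneD` profile mirrors the limitation noted in the abstract: when expression does not change across the experiments, the data cannot say whether two genes share a transcript.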
Nucleic Acids Research, 2005
An important step in understanding the regulation of a prokaryotic genome is the generation of its transcription unit map. The current strongest operon predictor depends on the distributions of intergenic distances (IGD) separating adjacent genes within and between operons. Unfortunately, experimental data on these distance distributions are limited to Escherichia coli and Bacillus subtilis. We suggest a new graph algorithmic approach based on comparative genomics to identify clusters of conserved genes independent of IGD and conservation of gene order. As a consequence, distance distributions of operon pairs for any arbitrary prokaryotic genome can be inferred. For E. coli, the algorithm predicts 854 conserved adjacent pairs with a precision of 85%. The IGD distribution for these pairs is virtually identical to the E. coli operon pair distribution. Statistical analysis of the predicted pair IGD distribution allows estimation of a genome-specific operon IGD cut-off, obviating the requirement for a training set in IGD-based operon prediction. We apply the method to a representative set of eight genomes, and show that these genome-specific IGD distributions differ considerably from each other and from the distribution in E. coli.
Nucleic Acids Research, 2010
We present a simple and highly accurate computational method for operon prediction, based on intergenic distances and functional relationships between the protein products of contiguous genes, as defined by the STRING database (Jensen,L.J., Kuhn,M., Stark,M., Chaffron,S., Creevey,C., Muller,J., Doerks,T., Julien,P., Roth,A., Simonovic,M. et al. (2009) STRING 8: a global view on proteins and their functional interactions in 630 organisms. Nucleic Acids Res., 37, D412-D416). These two parameters were used to train a neural network on a subset of experimentally characterized Escherichia coli and Bacillus subtilis operons. Our predictive model was successfully tested on the set of experimentally defined operons in E. coli and B. subtilis, with accuracies of 94.6 and 93.3%, respectively. As far as we know, these are the highest accuracies ever obtained for predicting bacterial operons. Furthermore, in order to evaluate the predictive accuracy of our model when one organism's data set is used for training and a different organism's data set for testing, we repeated the E. coli operon prediction analysis using a neural network trained with B. subtilis data, and the B. subtilis analysis using a neural network trained with E. coli data. Even in these cases, the accuracies reached with our method were high, 91.5 and 93%, respectively. These results show the potential use of our method for accurately predicting the operons of any other organism. Our operon predictions for fully-sequenced genomes are available at
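To make the two-feature idea concrete, the sketch below trains a single logistic unit on intergenic distance and a functional-association score. This is a deliberately simpler stand-in for the neural network described above, not its architecture; the feature scaling (distance in units of 100 bp, score in [0, 1]), the learning rate, and every data point are assumptions invented for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, features):
    """Probability that a gene pair lies in the same operon."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def train_logistic(samples, labels, lr=0.5, epochs=2000):
    """Fit a two-input logistic unit with batch gradient descent."""
    weights, bias = [0.0, 0.0], 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for features, label in zip(samples, labels):
            error = predict(weights, bias, features) - label
            grad_w[0] += error * features[0]
            grad_w[1] += error * features[1]
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Invented training pairs: (distance / 100 bp, association score).
toy_samples = [
    (0.1, 0.9), (0.2, 0.8), (0.05, 0.95),  # labelled same operon
    (2.0, 0.1), (1.5, 0.2), (3.0, 0.05),   # labelled different operons
]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_weights, toy_bias = train_logistic(toy_samples, toy_labels)
```

The cross-organism experiment in the abstract corresponds to fitting on one organism's pairs and calling `predict` on another's, which only works if the two features are scaled the same way for both genomes.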