Features for computational operon prediction in prokaryotes

2012, Briefings in Functional Genomics

Abstract

Accurate prediction of operons can improve the functional annotation and application of genes within operons in prokaryotes. Here, we review several features that have been applied in recent operon prediction methods for prokaryotes: (i) intergenic distance, (ii) metabolic pathways, (iii) homologous genes, (iv) promoters and terminators, (v) gene order conservation, (vi) microarray data, (vii) clusters of orthologous groups, (viii) gene length ratio, (ix) phylogenetic profiles, (x) operon length/size and (xi) STRING database scores, as well as some other features. Based on a comparison of the prediction performance of these features, we conclude that operon prediction in prokaryotes can be improved either by other, as yet undiscovered features or by feature selection with a receiver operating characteristic analysis before algorithm processing.
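
As a rough illustration of feature selection by receiver operating characteristic (ROC) analysis, the sketch below ranks candidate features by their area under the ROC curve (AUC) before any classifier is trained. It is not the procedure of any reviewed method: the function names, feature names and toy values are all assumptions made for the example, and each feature is assumed to be oriented so that a higher value means "more likely the same operon".

```python
from typing import Dict, List, Sequence, Tuple


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC computed as the probability that a randomly chosen positive
    example scores higher than a randomly chosen negative one (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rank_features(
    features: Dict[str, Sequence[float]], labels: Sequence[int]
) -> List[Tuple[str, float]]:
    """Return (feature name, AUC) pairs sorted from most to least discriminative."""
    ranked = [(name, roc_auc(values, labels)) for name, values in features.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Toy, invented data: 1 = adjacent gene pair in the same operon, 0 = not.
labels = [1, 1, 1, 0, 0, 0]
features = {
    # Distance is negated so that shorter gaps give higher scores (assumption).
    "neg_intergenic_distance": [-10, -5, -60, -300, -150, -40],
    "gene_length_ratio": [0.9, 0.5, 0.7, 0.8, 0.4, 0.6],
}

ranked = rank_features(features, labels)
for name, auc in ranked:
    print(f"{name}: AUC = {auc:.3f}")
```

Features whose AUC stays close to 0.5 carry little information on their own and would be candidates for removal before the prediction algorithm is run. An AUC well below 0.5 usually means the feature is informative but oriented the other way round.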

Key takeaways

  • Recently, various genomic features have been found to be associated with operon structures; these features have been used to predict operons.
  • The intergenic distance between two adjacent genes within the same operon tends to be rather short, while the distance between two adjacent genes of different operons is relatively long [34].
  • One method based on clusters of orthologous groups (COG) showed high accuracies of 94.6% and 93.3% for the experimentally defined operons in E. coli and B. subtilis, respectively, suggesting that COG is a useful feature for operon prediction.
  • As genes within one operon are co-transcribed in the same orientation, the length/size characteristic can be used to predict if genes in a genome are clustered into operons.
  • The NN method currently provides the highest accuracy for operon prediction in the E. coli and B. subtilis genomes (Table 2), suggesting that some newly identified features could be used for operon prediction.
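
The intergenic distance and orientation points above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not any published predictor: the `Gene` record, the gene names and coordinates, and the 50 bp cutoff are all hypothetical, and real methods combine distance with the other features listed in the abstract.

```python
from typing import List, NamedTuple, Tuple


class Gene(NamedTuple):
    name: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def intergenic_distance(upstream: Gene, downstream: Gene) -> int:
    """Bases between two adjacent genes; negative when the genes overlap."""
    return downstream.start - upstream.end - 1


def predict_operon_pairs(
    genes: List[Gene], max_distance: int = 50  # cutoff is an assumed value
) -> List[Tuple[str, str, int, bool]]:
    """Call an adjacent pair 'same operon' when both genes lie on the same
    strand and the gap between them is at most max_distance bases."""
    ordered = sorted(genes, key=lambda g: g.start)
    results = []
    for a, b in zip(ordered, ordered[1:]):
        dist = intergenic_distance(a, b)
        same_operon = a.strand == b.strand and dist <= max_distance
        results.append((a.name, b.name, dist, same_operon))
    return results


# Hypothetical genes on one replicon.
genes = [
    Gene("geneA", 100, 400, "+"),
    Gene("geneB", 412, 900, "+"),
    Gene("geneC", 1300, 1800, "+"),
    Gene("geneD", 1810, 2200, "-"),
]

pairs = predict_operon_pairs(genes)
for first, second, dist, same_operon in pairs:
    print(f"{first}-{second}: distance={dist}, same operon={same_operon}")
```

In this toy input only geneA and geneB are paired: geneB and geneC are too far apart, and geneC and geneD are close but on opposite strands, so they cannot be co-transcribed.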