Appl Environ Microbiol. 2007 Feb;73(3):846-54.
doi: 10.1128/AEM.01686-06. Epub 2006 Nov 22.

Operon prediction for sequenced bacterial genomes without experimental information

Nicholas H Bergman et al. Appl Environ Microbiol. 2007 Feb.

Abstract

Various computational approaches have been proposed for operon prediction, but most algorithms rely on experimental or functional data that are only available for a small subset of sequenced genomes. In this study, we explored the possibility of using phylogenetic information to aid in operon prediction, and we constructed a Bayesian hidden Markov model that incorporates comparative genomic data with traditional predictors, such as intergenic distances. The prediction algorithm performs as well as the best previously reported method, with several significant advantages. It uses fewer data sources and so it is easier to implement, and the method is more broadly applicable than previous methods; it can be applied to essentially every gene in any sequenced bacterial genome. Furthermore, we show that near-optimal performance is easily reached with a generic set of comparative genomes and does not depend on a specific relationship between the subject genome and the comparative set. We applied the algorithm to the Bacillus anthracis genome and found that it successfully predicted all previously verified B. anthracis operons. To further test its performance, we chose a predicted operon (BA1489-92) containing several genes with little apparent functional relatedness and tested their cotranscriptional nature. Experimental evidence shows that these genes are cotranscribed, and the data have interesting implications for B. anthracis biology. Overall, our findings show that this algorithm is capable of highly sensitive and accurate operon prediction in a wide range of bacterial genomes and that these predictions can lead to the rapid discovery of new functional relationships among genes.

Figures

FIG. 1.
Phylogenetic barcode differences in intra- and interoperonic pairs. (A) Illustration of the method for calculating the change in phylogenetic distribution (difference in phylogenetic barcode) for a given intergenic region. In the case shown, the phylogenetic barcode difference for intergenic region A-B is 6. (B) Phylogenetic barcode differences in experimentally verified transcriptional units. Data for both intraoperonic intergenic regions (squares connected by solid lines) and interoperonic intergenic regions (triangles connected by dashed lines) are shown.
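The barcode difference described in the caption for FIG. 1 can be illustrated with a minimal sketch. It assumes, as an interpretation not spelled out in this record, that each gene's phylogenetic barcode is a presence/absence vector over a fixed set of comparative genomes, and that the difference for an intergenic region is the number of genomes in which the two flanking genes disagree. The function name and the example vectors are hypothetical and do not reproduce the case shown in the figure.

```python
from typing import Sequence


def barcode_difference(upstream: Sequence[int], downstream: Sequence[int]) -> int:
    """Count comparative genomes where the two flanking genes differ.

    Assumption: each barcode holds 1 if a homolog of the gene is present
    in a comparative genome and 0 otherwise, in the same genome order.
    """
    if len(upstream) != len(downstream):
        raise ValueError("barcodes must cover the same set of comparative genomes")
    return sum(u != d for u, d in zip(upstream, downstream))


# Hypothetical barcodes for two adjacent genes across eight comparative genomes.
gene_a = [1, 1, 0, 1, 0, 0, 1, 1]
gene_b = [1, 0, 0, 1, 1, 0, 1, 0]

print(barcode_difference(gene_a, gene_b))
```

Under this reading, a small difference means the two genes tend to be conserved or lost together across genomes, which is the pattern the caption contrasts between intraoperonic and interoperonic pairs.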
FIG. 2.
ROC curves representing algorithm performance using different data sources. Dashed line, predictions generated using the phylogenetic barcode information; dotted line, predictions generated using the intergenic data alone; solid line, predictions generated using both sources.
FIG. 3.
RT-PCR analysis of the BA1489-92 region. Gels show the PCR products amplified by the designated primer pairs, which are located in the BA1489-92 region of the B. anthracis chromosome as shown at the bottom. Control RT-PCRs amplifying a section of the sod15 (BA1489) locus or a portion of the elongation factor G mRNA sequence are also shown. Note that control reactions in which reverse transcriptase (but not DNA polymerase) was omitted were done for each set and showed in each case that product amplification was not due to DNA contamination.
FIG. 4.
ROC curves representing the performance of the optimized algorithm (Table 2) and the algorithm described recently by Price et al. (23) when tested using the same set of experimentally verified E. coli operons. The area under the curve is 0.916 for the algorithm reported here and 0.917 for that described by Price et al.
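For readers unfamiliar with the area-under-the-curve figures quoted for FIG. 4, the following is a minimal sketch of how an ROC AUC can be computed from per-gene-pair scores and verified labels. It uses the rank-based (Mann-Whitney) formulation, which equals the area under the ROC curve; it is not the authors' evaluation code, and the scores and labels are invented for illustration.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Returns the probability that a randomly chosen positive (label 1,
    e.g. a same-operon gene pair) scores higher than a randomly chosen
    negative (label 0); ties count as half.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical operon-pair probabilities and verified labels.
example_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
example_labels = [1, 1, 0, 1, 0, 0]

print(round(roc_auc(example_scores, example_labels), 3))
```

An AUC of 1.0 would mean every verified same-operon pair outranks every boundary pair, and 0.5 corresponds to random ranking, so values of 0.916 and 0.917 indicate nearly identical ranking quality for the two algorithms on that test set.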

References

    1. Allen, J., M. Pertea, and S. L. Salzberg. 2004. Computational gene prediction using multiple sources of evidence. Genome Res. 14:142-148.
    2. Altschul, S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389-3402.
    3. Bockhorst, J., M. Craven, D. Page, J. Shavlik, and J. Glasner. 2003. A Bayesian network approach to operon prediction. Bioinformatics 19:1227-1235.
    4. Bockhorst, J., Y. Qiu, J. Glasner, M. Liu, F. Blattner, and M. Craven. 2003. Predicting bacterial transcription units using sequence and expression data. Bioinformatics 19(Suppl. 1):i34-i43.
    5. Chen, X., Z. Su, P. Dam, B. Palenik, Y. Xu, and T. Jiang. 2004. Operon prediction by comparative genomics: an application to the Synechococcus sp. WH8102 genome. Nucleic Acids Res. 32:2147-2157.
