Comparative Study
BMC Genomics. 2008;9 Suppl 1(Suppl 1):S2.
doi: 10.1186/1471-2164-9-S1-S2.

Prediction-based approaches to characterize bidirectional promoters in the mammalian genome

Mary Qu Yang et al. BMC Genomics. 2008.

Abstract

Background: Machine learning approaches are emerging as a way to discriminate various classes of functional elements. Previous attempts to create Regulatory Potential (RP) scores to discriminate functional DNA from nonfunctional DNA used Markov models trained to distinguish promoter and enhancer sequences from ancestral repeats. We proposed that knowledge gleaned from those methods could be further refined using a multiple class predictor to separate classes of promoter elements from enhancers or nonfunctional DNA.

Results: We extended our previous work, which identified over 5,000 candidate bidirectional promoters in the human genome, to map the orthologous promoter regions in the mouse genome. Our algorithm measured the robustness of evidence provided by the spliced EST annotations and incorporated evidence from annotations of UCSC Known Genes and GenBank mRNA. In preparation for de novo prediction of this promoter type, we examined characteristic features of the dataset as a whole. For instance, bidirectional promoters score very highly among all functional elements for Regulatory Potential Scores. This result was unexpected due to the limited sequence conservation found in these noncoding regions. We demonstrate that bidirectional promoters can be classified apart from other genomic features including non-bidirectional promoters, i.e., those promoters having no nearby upstream genes. Furthermore, bidirectional promoters consistently score at the level of very highly conserved functional elements in the genome: developmental enhancers. The high scores are due to sequence-based characteristics within the promoters, not the surrounding exons. These results indicate that high-scoring RP regions can be deconvoluted into various functional classes of genomic elements. Using a multiple class predictor, we are able to discriminate bidirectional promoters from enhancers, non-bidirectional promoters, and non-promoter regions on the basis of RP scores and CpG islands.

Conclusions: We examine orthology at bidirectional promoters, use discriminatory machine learning approaches to differentiate multiple types of promoters from other functional and nonfunctional features in the genome and begin the process of deconvoluting classes of functional regions that score well with RP scores. These types of approaches precede supervised learning techniques to discover unannotated promoter regions.


Figures

Figure 1
Validation of bidirectional promoters using the RIKEN CAGE dataset. Pie charts depict the number of bidirectional promoters with CAGE transcripts that correspond to detectable transcripts on both sides (black), only one side (gray), or no evidence (white). Note that these do not have to be transcribed in the same tissues to be included in our study. The upper panel is based on human transcripts from the human sequence assembly, hg17, while the lower panel uses CAGE data and transcripts from the mouse sequence assembly, mm5. Bidirectional promoters were mapped in Known Genes (left column), GenBank mRNA (middle column), and spliced ESTs (right column).
Figure 2
Orthologous mapping of human bidirectional promoters to mouse. Promoter orthology was determined by identifying orthologous genes in mouse and checking for evidence of bidirectional promoters. Genes that had a 5′ neighbor transcribed in the opposite direction are shown for promoters of Known Genes (maroon), GenBank mRNA (pink), and ESTs (red). Genes with no neighbor in mouse lack evidence for bidirectional promoters (green). Genes that could not be mapped to mouse are shown in blue.
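The criterion in this caption, a gene whose 5′ neighbor is transcribed in the opposite direction, can be sketched as a simple head-to-head search. This is only an illustration, not the authors' algorithm: the `Gene` record, the function name, and the 1,000 bp `max_gap` threshold are assumptions introduced here, and the sketch does not check for intervening genes or weigh EST evidence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record; coordinates are 0-based, end-exclusive."""
    name: str
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"

    @property
    def tss(self) -> int:
        # The transcription start site is the 5' end of the gene.
        return self.start if self.strand == "+" else self.end


def head_to_head_pairs(genes, max_gap=1000):
    """Return (minus_gene, plus_gene) name pairs arranged head to head.

    A pair qualifies when the minus-strand gene lies upstream of the
    plus-strand gene on the same chromosome and their start sites are
    at most max_gap bases apart (max_gap is an assumed threshold).
    """
    by_chrom = {}
    for gene in genes:
        by_chrom.setdefault(gene.chrom, []).append(gene)

    pairs = []
    for chrom_genes in by_chrom.values():
        minus = [g for g in chrom_genes if g.strand == "-"]
        plus = [g for g in chrom_genes if g.strand == "+"]
        for m in minus:
            for p in plus:
                gap = p.tss - m.tss
                if 0 <= gap <= max_gap:
                    pairs.append((m.name, p.name))
    return pairs
```

The nested loop is quadratic per chromosome, which is acceptable for a sketch; a genome-scale version would sort by coordinate and compare adjacent genes only.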
Figure 3
RP score cumulative distribution functions for bidirectional promoters in human and mouse. Bidirectional promoters identified from Known Genes (KG), mRNA, and ESTs all yield similar scores in both human and mouse genomes. RP scores were calculated based on genome assemblies hg17 (human) and mm8 (mouse).
Figure 4
Cumulative distribution functions of RP scores for different functional classes. These include bidirectional promoters (red, green, blue), non-bidirectional promoters (purple) and unbounded promoters (light blue, pink, light green). Other functional elements are coding regions (aqua), tail-to-tail regions (yellow) and enhancers (maroon). The nonfunctional elements are represented by ancestral repeats (black).
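For readers unfamiliar with the plot type, a cumulative distribution function of scores for one class can be computed empirically as the fraction of regions scoring at or below a given value. The sketch below is a generic illustration; the function name and the toy scores in the usage note are assumptions, not data from the study.

```python
import bisect


def empirical_cdf(scores):
    """Build a function F(x) = fraction of scores that are <= x."""
    ordered = sorted(scores)
    n = len(ordered)
    if n == 0:
        raise ValueError("at least one score is required")

    def cdf(x):
        # bisect_right counts how many sorted values are <= x.
        return bisect.bisect_right(ordered, x) / n

    return cdf
```

Calling `empirical_cdf` once per class (for example, one call for bidirectional promoters and one for ancestral repeats) and evaluating each result on a shared grid of score values yields curves that can be compared directly, as in the figure.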
Figure 5
(a) Class-conditional probability density functions p(x|BP) (bidirectional promoters) and p(x|NP) (non-promoters). (b) Class-conditional probability density functions p(x|BP) (bidirectional promoters) and p(x|UBP1000) (unbounded promoters).
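Class-conditional densities such as p(x|BP) and p(x|NP) feed a two-class decision through Bayes' rule. The sketch below assumes each density is Gaussian and that priors are supplied by the caller; the abstract does not say how the densities were estimated, so the Gaussian fit, the function names, and the default prior are all assumptions made for illustration.

```python
import math
from statistics import mean, stdev


def fit_gaussian(xs):
    """Estimate (mu, sigma) from samples; a Gaussian form is assumed."""
    return mean(xs), stdev(xs)


def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def posterior_bp(x, bp_params, np_params, prior_bp=0.5):
    """P(BP | x) from Bayes' rule with two classes, BP and NP."""
    weighted_bp = gaussian_pdf(x, *bp_params) * prior_bp
    weighted_np = gaussian_pdf(x, *np_params) * (1.0 - prior_bp)
    return weighted_bp / (weighted_bp + weighted_np)
```

A region would be assigned to the bidirectional-promoter class when `posterior_bp` exceeds 0.5, or another threshold chosen to trade sensitivity against specificity.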
Figure 6
(a) Receiver operating characteristic (ROC) for classifier that discriminates bidirectional promoters from non-promoters. (b) Receiver operating characteristic (ROC) for classifier that discriminates bidirectional promoters from unbounded promoters.
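An ROC curve of this kind is traced by sweeping a decision threshold over the classifier scores and recording the false-positive and true-positive rates at each step. The following is a generic sketch with assumed function names and label conventions (1 for the positive class, such as bidirectional promoters; 0 for the negative class); it is not the evaluation code used in the study.

```python
def roc_points(scores, labels):
    """Return ROC points (FPR, TPR), sweeping the threshold high to low.

    labels are 1 for positives and 0 for negatives (assumed convention).
    Tied scores are processed together so they add a single point.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")

    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )
```

An area of 1.0 means the scores separate the two classes perfectly, while 0.5 corresponds to chance-level ranking.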
Figure 7
Algorithm for classifying regions into one of four classes: bidirectional promoter, unbounded promoter, non-promoter, or enhancer.
