Genome Res. 2005 Aug;15(8):1051-60.
doi: 10.1101/gr.3642605. Epub 2005 Jul 15.

Evaluation of regulatory potential and conservation scores for detecting cis-regulatory modules in aligned mammalian genome sequences

David C King et al. Genome Res. 2005 Aug.

Abstract

Techniques of comparative genomics are being used to identify candidate functional DNA sequences, and objective evaluations are needed to assess their effectiveness. Different analytical methods score distinctive features of whole-genome alignments among human, mouse, and rat to predict functional regions. We evaluated three of these methods for their ability to identify the positions of known regulatory regions in the well-studied HBB gene complex. Two methods, multispecies conserved sequences (MCS) and phastCons, quantify levels of conservation to estimate the likelihood that aligned DNA sequences are under purifying selection. A third method, regulatory potential (RP), measures the similarity of patterns in the alignments to those in known regulatory regions. The methods correctly classify 50%-60% of noncoding positions in the HBB gene complex as regulatory or nonregulatory, with RP performing better than the other methods. When evaluated by the ability to discriminate genomic intervals, RP reaches a sensitivity of 0.78 and a true discovery rate of approximately 0.6. Performance is better on other reference sets: both phastCons and RP scores can capture almost all regulatory elements in those sets along with approximately 7% of the human genome.
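The interval-level evaluation reports sensitivity (the fraction of known regulatory elements that are detected) and true discovery rate (the fraction of predicted intervals that are truly regulatory). A minimal Python sketch of those two ratios follows. The overlap rule used to match predicted and known intervals is an assumption for illustration, not necessarily the criterion used in the paper, and the function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) genomic coordinates


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def interval_sn_tdr(known: List[Interval],
                    predicted: List[Interval]) -> Tuple[float, float]:
    """Interval-level sensitivity and true discovery rate.

    Assumed (hypothetical) matching rule: a known element counts as detected
    if any predicted interval overlaps it, and a predicted interval counts as
    a true discovery if it overlaps any known element.
    """
    detected = sum(any(overlaps(k, p) for p in predicted) for k in known)
    true_preds = sum(any(overlaps(p, k) for k in known) for p in predicted)
    sn = detected / len(known) if known else 0.0
    tdr = true_preds / len(predicted) if predicted else 0.0
    return sn, tdr


# Example: 2 of 4 known elements are hit, and 2 of 3 predictions are correct.
known = [(100, 200), (500, 600), (900, 1000), (1500, 1600)]
predicted = [(150, 250), (550, 580), (1200, 1300)]
print(interval_sn_tdr(known, predicted))
```

Here the sensitivity is 0.5 and the true discovery rate is about 0.67. Note that the true discovery rate is the complement of the false discovery rate, so raising the threshold usually trades sensitivity for a higher true discovery rate.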

Figures

Figure 1.
Conservation and RP scores for human–mouse–rat alignments in the HBB complex. The scores for MCS, phastCons, and regulatory potential (RP) are plotted in sliding windows along the HBB gene complex. Gray peaks show the scores for exons, which give the most pronounced signals in this region for all scoring methods. Scores overlapping known regulatory regions are shown in green, those overlapping repeats found by RepeatMasker (A.F.A. Smit and P. Green, unpub., http://ftp.genome.washington.edu/RM/RepeatMasker.html) in red, and scores in uncharacterized regions in blue. A horizontal solid line represents the threshold with optimal performance (interval evaluation) for each score. The dashed line in the MCS graph is the threshold calculated by WebMCS, according to the 95th percentile of conserved sites. Below the graphs is a panel from the UCSC Genome Browser with the known CRMs in the HBB complex as a custom track (green), the RefSeq genes in blue, and repeats in black. The interval chr11:5227344–5229500 contains hemoglobin β pseudogene 1 (HBBP1, RefSeq: NR_001589), which is also masked from further analyses.
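Figure 1 plots scores in sliding windows and marks a threshold for each method; windows whose score clears the threshold can then be merged into candidate intervals. The sketch below assumes a generic per-base score averaged over 100-bp windows advanced in 5-bp steps (the window geometry described for RP in Figure 3). It is a smoothing and interval-calling illustration only and does not reproduce how RP, phastCons, or MCS scores are actually computed; the function names are hypothetical.

```python
from typing import List, Tuple


def sliding_window_scores(per_base: List[float], window: int = 100,
                          step: int = 5) -> List[Tuple[int, float]]:
    """Mean per-base score in each window; returns (center, mean) pairs."""
    results = []
    for start in range(0, len(per_base) - window + 1, step):
        chunk = per_base[start:start + window]
        results.append((start + window // 2, sum(chunk) / window))
    return results


def call_intervals(windows: List[Tuple[int, float]], threshold: float,
                   window: int = 100) -> List[Tuple[int, int]]:
    """Merge windows scoring >= threshold into half-open [start, end) intervals.

    Assumes `windows` is sorted by center, as produced above.
    """
    intervals: List[Tuple[int, int]] = []
    for center, score in windows:
        if score < threshold:
            continue
        start = center - window // 2
        end = start + window
        if intervals and start <= intervals[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            intervals[-1] = (intervals[-1][0], max(intervals[-1][1], end))
        else:
            intervals.append((start, end))
    return intervals


# Example: a 100-bp block of high scores flanked by low scores.
per_base = [0.0] * 300 + [1.0] * 100 + [0.0] * 300
print(call_intervals(sliding_window_scores(per_base), threshold=0.5))
# [(250, 450)]
```

Because each window is 100 bp wide, the called interval extends up to half a window beyond the high-scoring block on each side; narrower windows give tighter boundaries at the cost of noisier scores.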
Figure 2.
Ability of three scoring methods based on human–mouse–rat alignments to identify positions in CRMs in the HBB gene complex. The graphs in the left column display the lowess-smoothed distributions of scores at positions in noncoding alignments for the regulatory regions (positives, black line) and the nonregulatory regions (negatives, gray area). The graphs in the center column display the sensitivity (Sn, dashed line) and specificity (Sp, solid line) of each method, determined by the fraction of the positive distribution above, and of the negative distribution below, each scoring threshold. The receiver operating characteristic (ROC) graphs plot Sn versus 1 – Sp for each scoring threshold (thick line). The values at the optimal threshold are plotted as the circle with crosshairs. The expectation for a random signal follows the thin diagonal line.
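The Sn, Sp, and ROC panels of Figure 2 can be derived from two lists of position scores. In this sketch, Sn at threshold t is the fraction of positive (regulatory) scores at or above t, and Sp is the fraction of negative scores below t. The caption does not say how the optimal threshold was chosen; maximizing Sn + Sp (Youden's J statistic) is used here purely as an assumption, and the function names are hypothetical.

```python
from typing import List, Optional, Tuple


def sn_sp(pos: List[float], neg: List[float], t: float) -> Tuple[float, float]:
    """Sensitivity and specificity at threshold t (score >= t is called positive)."""
    sn = sum(s >= t for s in pos) / len(pos)
    sp = sum(s < t for s in neg) / len(neg)
    return sn, sp


def roc_and_best(pos: List[float], neg: List[float]):
    """ROC points (1 - Sp, Sn) over every observed score used as a threshold,
    plus the (threshold, Sn, Sp) that maximizes Sn + Sp (assumed criterion).
    On ties, the lowest such threshold is kept."""
    roc: List[Tuple[float, float]] = []
    best: Optional[Tuple[float, float, float]] = None
    for t in sorted(set(pos) | set(neg)):
        sn, sp = sn_sp(pos, neg, t)
        roc.append((1.0 - sp, sn))
        if best is None or sn + sp > best[1] + best[2]:
            best = (t, sn, sp)
    return roc, best


# Example with toy scores for regulatory (pos) and nonregulatory (neg) positions.
pos = [0.6, 0.7, 0.8, 0.9]
neg = [0.1, 0.2, 0.3, 0.65]
roc, best = roc_and_best(pos, neg)
print(best)
```

A scorer with no discriminating power would give ROC points scattered around the diagonal, matching the thin reference line in the ROC panels.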
Figure 3.
Cumulative distributions of RP and phastCons scores in functional regions compared to the total aligned genomic DNA. The cumulative fraction with a maximal score below a scoring threshold for RP (A) and phastCons (B) is shown for each of six sets of functional sequences (colored lines). The purple line is for the CRMs in the HBB gene complex, gold is for the RefSeq coding exons (Pruitt and Maglott 2001), green is for the regulatory element training set (Elnitski et al. 2003), red is for a set of developmental enhancers (Plessy et al. 2005), brown is for miRNAs (Griffiths-Jones 2004), and blue is for functional promoters (Trinklein et al. 2003). The evaluation is based on the highest score within each interval for the functional elements. The cumulative distributions of scores for all the human–mouse–rat aligned positions are the black lines in each graph. For RP, every fifth base pair in alignments was scored (as the center of a 100-bp window), and for phastCons, all base pairs in alignments were scored. A vertical line is drawn at the optimal threshold for discriminating intervals (Table 2).
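Figure 3 summarizes each functional element by its highest score and plots the cumulative fraction of elements whose maximum falls below each threshold, alongside the same curve for all aligned positions. A minimal sketch under those definitions follows. Scores are assumed to be stored as a position-to-score dictionary, intervals with no scored position are skipped (an assumption), and the function names are hypothetical. One minus each cumulative fraction gives the share of elements captured at a threshold and the share of aligned positions selected by it.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple


def max_scores(intervals: List[Tuple[int, int]],
               scores: Dict[int, float]) -> List[float]:
    """Highest score at any scored position inside each [start, end) interval.

    Linear scan per interval; adequate for a sketch, not for whole genomes.
    Intervals containing no scored position are skipped.
    """
    out = []
    for start, end in intervals:
        inside = [s for p, s in scores.items() if start <= p < end]
        if inside:
            out.append(max(inside))
    return out


def cumulative_fraction_below(values: List[float], t: float) -> float:
    """Fraction of values strictly below threshold t."""
    ordered = sorted(values)
    return bisect_left(ordered, t) / len(ordered)


# Example: RP-like scores at every fifth base pair and three functional elements.
scores = {0: 0.1, 5: 0.9, 10: 0.2, 15: 0.3, 20: 0.8, 25: 0.05}
elements = [(0, 10), (10, 20), (20, 30)]
maxes = max_scores(elements, scores)
captured = 1 - cumulative_fraction_below(maxes, 0.5)
selected = 1 - cumulative_fraction_below(list(scores.values()), 0.5)
print(maxes, captured, selected)
```

A good scorer places the curve for functional elements far to the right of the black all-positions curve, so that a single threshold captures most elements while selecting a small fraction of aligned DNA.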

References

    1. Allan, M., Lanyon, G., and Paul, J. 1983. Multiple origins of transcription in the 4.5 kb upstream of the ε-globin gene. Cell 35: 187–197.
    1. Antoniou, M., deBoer, E., Habets, G., and Grosveld, F. 1988. The human β-globin gene contains multiple regulatory regions: Identification of one promoter and two downstream enhancers. EMBO J. 7: 377–384.
    1. Behringer, R.R., Hammer, R.E., Brinster, R.L., Palmiter, R.D., and Townes, T.M. 1987. Two 3′ sequences direct adult erythroid-specific expression of human β-globin genes in transgenic mice. Proc. Natl. Acad. Sci. 84: 7056–7060.
    1. Bender, M., Reik, A., Close, J., Telling, A., Epner, E., Fiering, S., Hardison, R., and Groudine, M. 1998. Description and targeted deletion of 5′ HS5 and 6 of the mouse β-globin locus control region. Blood 92: 4394–4403.
    1. Berman, B.P., Pfeiffer, B.D., Laverty, T.R., Salzberg, S.L., Rubin, G.M., Eisen, M.B., and Celniker, S.E. 2004. Computational identification of developmental enhancers: Conservation and function of transcription factor binding-site clusters in Drosophila melanogaster and Drosophila pseudoobscura. Genome Biol. 5: R61.

Web site references

    1. http://www.bx.psu.edu/; GALA and dbERGEII databases
    1. http://www.bx.psu.edu/~ross/dataset/DatasetHome.html; reference set of CRMs in HBB gene complex
    1. http://genome.ucsc.edu/; Genome Browser at UCSC
    1. http://research.nhgri.nih.gov/MCS/; WebMCS for computing multispecies conserved sequences
    1. http://www.sanger.ac.uk/Software/Rfam/mirna/index.shtml; miRNA Registry