Genome Biol. 2007;8(6):R124.
doi: 10.1186/gb-2007-8-6-r124.

Measuring the accuracy of genome-size multiple alignments

Amol Prakash et al. Genome Biol. 2007.

Abstract

Whole-genome alignments are invaluable for comparative genomics. Before doing any comparative analysis on a region of interest, one must have confidence in that region's alignment. We provide a methodology to measure the accuracy of arbitrary regions of these alignments, and apply it to the UCSC Genome Browser's 17-vertebrate alignment. We identify 9.7% (21 Mbp) of the human chromosome 1 alignment as suspiciously aligned. We present independent evidence that many of these suspicious regions represent misalignments.

Figures

Figure 1
Distribution of alignment segment p-values. Fraction of residues in segments of length at least 50 bp plotted against the p-value of that segment's score, for the branches of the phylogeny incident on zebrafish, chicken, mouse, and chimp. The zebrafish and chicken graphs are so close as to be nearly indistinguishable.
Figure 2
Sample suspicious zebrafish alignment. The MULTIZ alignment block at human coordinates chr1: 87,801,892-87,801,950, covering 59 bp of human sequence. Lower case letters indicate disagreement with the human sequence.
Figure 3
Pie charts showing the genomic distribution of suspicious regions for zebrafish and mouse. UTR, untranslated region.
Figure 4
Distribution of BLASTX E-values in coding regions. (a) Distribution of BLASTX E-values for ℜ_zebrafish and low-discordance (≤10^-10) zebrafish alignment regions intersecting human coding exons. The distribution is also plotted for random zebrafish nucleotide sequences. (b) The analogous distributions for mouse.
Figure 5
Percentage of frameshifted codons. Distribution of ℜ_zebrafish and low-discordance (≤10^-10) zebrafish alignment regions intersecting some annotated human coding exons, plotted against the percentage of frameshifted zebrafish codons (in the region aligned to the human exon).
Figure 6
Specificity and sensitivity for simulated data. Results are averaged over 25 simulated data sets, each of approximately 100 kb, that are aligned by TBA. (a) Specificity with respect to the same species: for each suspicious region reported by StatSigMA-w, the histogram shows the percentage of its columns for which the species reported as suspicious is actually misaligned by TBA. (b) Specificity with respect to any species: for each suspicious region reported by StatSigMA-w, the histogram shows the percentage of its columns for which any species is actually misaligned by TBA. (c) Sensitivity with respect to the same species: for each misalignment region, the histogram shows the percentage of its columns for which StatSigMA-w reports a p-value of at least 0.1 for a branch attributable to the misaligned species. (d) Sensitivity with respect to any species: for each misalignment region, the histogram shows the percentage of its columns for which StatSigMA-w reports a p-value of at least 0.1 for any branch.
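
The per-region percentages described in the Figure 6 caption can be illustrated with a minimal Python sketch. This is not the authors' code or the StatSigMA-w interface: the function name `percent_columns_flagged`, the half-open region encoding, and all of the toy data are assumptions invented for illustration. Only the 0.1 p-value threshold and the definition "percentage of a region's columns that satisfy a condition" come from the caption.

```python
from typing import List, Tuple

Region = Tuple[int, int]  # half-open column range [start, end); an assumed encoding


def percent_columns_flagged(region: Region, flagged: List[bool]) -> float:
    """Return the percentage of columns in `region` whose flag is True."""
    start, end = region
    if not 0 <= start < end <= len(flagged):
        raise ValueError("region must be a non-empty range inside the alignment")
    return 100.0 * sum(flagged[start:end]) / (end - start)


# Toy alignment of 10 columns (all values invented for illustration).
# truly_misaligned[i]: column i is misaligned in the simulated ground truth.
truly_misaligned = [False, False, True, True, True, False, False, False, True, False]
# p_values[i]: hypothetical per-column p-value reported for the species of interest.
p_values = [0.0, 0.01, 0.5, 0.2, 0.05, 0.0, 0.0, 0.3, 0.4, 0.0]

# Specificity, as in panels (a)/(b): for each reported suspicious region,
# the share of its columns that are truly misaligned.
suspicious_regions = [(2, 6), (7, 10)]
specificity = [percent_columns_flagged(r, truly_misaligned) for r in suspicious_regions]

# Sensitivity, as in panels (c)/(d): for each true misalignment region,
# the share of its columns with a reported p-value of at least 0.1.
reported_suspicious = [p >= 0.1 for p in p_values]
misalignment_regions = [(2, 5), (8, 9)]
sensitivity = [percent_columns_flagged(r, reported_suspicious) for r in misalignment_regions]

print(specificity)
print(sensitivity)
```

Each list holds one percentage per region; a histogram of such values over many regions corresponds to one panel of the figure.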
