BMC Bioinformatics. 2007 Oct 26;8:417.
doi: 10.1186/1471-2105-8-417.

How accurately is ncRNA aligned within whole-genome multiple alignments?

Adrienne X Wang et al. BMC Bioinformatics. 2007.

Abstract

Background: Multiple alignment of homologous DNA sequences is of great interest to biologists since it provides a window into evolutionary processes. At present, the accuracy of whole-genome multiple alignments, particularly in noncoding regions, has not been thoroughly evaluated.

Results: We evaluate the alignment accuracy of certain noncoding regions using noncoding RNA alignments from Rfam as a reference. We inspect the MULTIZ 17-vertebrate alignment from the UCSC Genome Browser for all the human sequences in the Rfam seed alignments. In particular, we find 638 instances of chimeric and partial alignments to human noncoding RNA elements, of which at least 225 can be improved by straightforward means. As a byproduct of our procedure, we predict many novel instances of known ncRNA families that are suggested by the alignment.

Conclusion: MULTIZ does a fairly accurate job of aligning these genomes in these difficult regions. However, our experiments indicate that better alignments exist in some regions.

Figures

Figure 1
Seed alignment of the IRE (Iron Response Element family, RF00037). The first column provides the accession numbers of the sequences. The species can be found in the second column. The third column is the alignment. The last row in the seed alignment shows the predicted secondary structure of this family. The color blocks represent predicted base paired regions.
Figure 2
Overview of the first phase of the evaluation process.
Figure 3
A perfect alignment. This figure illustrates a perfect alignment of a human tRNA (RF00005) on chromosome 12. Red rectangles denote sequences with high covariance model scores in all figures. See Methods for further explanation of these figures.
Figure 4
A shifted element. This figure illustrates a large shift of an element aligned to human SNORD113 (C/D box small nucleolar RNA SNORD113/SNORD114, RF00181) on chromosome 14. A single RNA in elephant (loxAfr1.scaffold 44287), shown here as the very long red rectangle, is split in half by the alignment: the left half is aligned to the beginning of the second of the four human RF00181 elements shown, and the right half to the right end of the third human instance. In the MULTIZ alignment, this looks like a 1556 bp deletion in elephant, spanning from the middle of the second human instance to the middle of the third.
Figure 5
Chimeric alignments. This figure illustrates two examples of chimeric alignments to a human SNORA25 (small nucleolar RNA SNORA25, RF00402) on chromosome 7. a. Two sequence fragments from two different armadillo scaffolds (dasNov1.scaffold 192792 and dasNov1.scaffold 7495, the two rows directly below the red rectangles) are concatenated and aligned to the human ncRNA. However, when we extract a longer sequence from the genome at the position of the first fragment, dasNov1.scaffold 192792, the aligned fragment can be extended to a member of the small nucleolar RNA SNORA25 family. b. Two sequence fragments from two different mouse chromosomes (mm8.chr13 and mm8.chr6, the next two rows below armadillo) are aligned to the same human ncRNA. The first fragment, if extended, is also a member of the family. Note that neither armadillo nor mouse shows a red rectangle, since these chimeric alignments score below the covariance model threshold. The thin horizontal lines show which regions of each species are included in the alignment.
Figure 6
Partial alignments. This figure illustrates partial alignments of a human SNORA42 (small nucleolar RNA SNORA42, RF00406) on chromosome 14. The segments in rabbit (oryCun1.scaffold 201547), tenrec (echTel1.scaffold 205400) and elephant (loxAfr1.scaffold 38492) appear to be partially aligned to the human ncRNA. These segments all receive scores above the threshold for this family if extended. Note that the partially aligned sequences score below the threshold, as indicated by the absence of red rectangles. Note also that the cow sequence in this alignment (bosTau2.chr2) is categorized as a shifted element in Table 2. The thin horizontal lines show which regions of that species are included in the alignment.
Figure 7
Categorization of aligned segments by species. For each species, the bars show the percent of aligned segments for that species in each of the categories of Table 2. The number next to the species name is the number of alignments in which that species is included.
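
The chimeric and partial cases in Figures 5 and 6, and the per-species percentages in Figure 7, can be illustrated with a minimal Python sketch. It is not the authors' procedure: the `Fragment` record, the `classify` and `percent_by_category` helpers, the 0.9 coverage cutoff, and the toy scaffold names are all assumptions made for illustration. The sketch looks only at where fragments come from and how much of the human element they cover; it ignores covariance model scores and does not detect shifted elements.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """One aligned piece of a non-human sequence (hypothetical record type)."""
    source: str     # scaffold or chromosome the piece comes from
    ref_start: int  # first human ncRNA position covered (0-based, inclusive)
    ref_end: int    # end of the covered human ncRNA range (exclusive)


def classify(fragments, ref_length, min_coverage=0.9):
    """Label one species' row as 'unaligned', 'chimeric', 'partial' or 'full'.

    min_coverage is an assumed cutoff, not a value taken from the paper.
    """
    if not fragments:
        return "unaligned"
    # Pieces drawn from more than one scaffold/chromosome: chimeric.
    if len({f.source for f in fragments}) > 1:
        return "chimeric"
    # Otherwise measure how much of the human element is covered.
    covered = set()
    for f in fragments:
        covered.update(range(max(0, f.ref_start), min(ref_length, f.ref_end)))
    return "full" if len(covered) >= min_coverage * ref_length else "partial"


def percent_by_category(labels_by_species):
    """Percent of aligned segments per category, computed per species."""
    result = {}
    for species, labels in labels_by_species.items():
        counts = Counter(labels)
        total = sum(counts.values())
        if total == 0:
            continue  # no segments for this species, nothing to report
        result[species] = {cat: 100.0 * n / total for cat, n in counts.items()}
    return result


# Toy data: a 130-position human element and invented fragment coordinates.
labels = {
    "armadillo": [
        classify([Fragment("scaffold_A", 0, 60), Fragment("scaffold_B", 60, 130)], 130),
        classify([Fragment("scaffold_C", 0, 130)], 130),
    ],
    "rabbit": [classify([Fragment("scaffold_D", 0, 50)], 130)],
}
print(percent_by_category(labels))
```

Checking for multiple sources before checking coverage means a chimeric row is never reported as a full alignment, even when its concatenated pieces span the whole human element.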
