RNA. 2009 Sep;15(9):1623-31.
doi: 10.1261/rna.1601409. Epub 2009 Jul 21.

The RNA structure alignment ontology

James W Brown et al. RNA. 2009 Sep.

Abstract

Multiple sequence alignments are powerful tools for understanding the structures, functions, and evolutionary histories of linear biological macromolecules (DNA, RNA, and proteins), and for finding homologs in sequence databases. We address several ontological issues related to RNA sequence alignments that are informed by structure. Multiple sequence alignments are usually shown as two-dimensional (2D) matrices, with rows representing individual sequences, and columns identifying nucleotides from different sequences that correspond structurally, functionally, and/or evolutionarily. However, the requirement that sequences and structures correspond nucleotide-by-nucleotide is unrealistic and hinders representation of important biological relationships. High-throughput sequencing efforts are also rapidly making 2D alignments unmanageable because of vertical and horizontal expansion as more sequences are added. Solving the shortcomings of traditional RNA sequence alignments requires explicit annotation of the meaning of each relationship within the alignment. We introduce the notion of "correspondence," which is an equivalence relation between RNA elements in sets of sequences as the basis of an RNA alignment ontology. The purpose of this ontology is twofold: first, to enable the development of new representations of RNA data and of software tools that resolve the expansion problems with current RNA sequence alignments, and second, to facilitate the integration of sequence data with secondary and three-dimensional structural information, as well as other experimental information, to create simultaneously more accurate and more exploitable RNA alignments.
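One way to picture "correspondence" as an equivalence relation is a union-find structure that groups RNA elements into classes. The following is only a minimal sketch under stated assumptions: the class name `CorrespondenceSets`, its methods, and the tuple-based element identifiers are hypothetical and are not taken from the ontology itself.

```python
from collections import defaultdict


class CorrespondenceSets:
    """Union-find over RNA elements (hypothetical helper, not from the paper).

    Each equivalence class is one set of mutually corresponding elements.
    """

    def __init__(self):
        self._parent = {}

    def _find(self, item):
        # Register unseen elements as their own class, then walk to the root.
        self._parent.setdefault(item, item)
        root = item
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited element directly at the root.
        while self._parent[item] != root:
            self._parent[item], item = root, self._parent[item]
        return root

    def relate(self, a, b):
        """Declare that elements a and b correspond."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def corresponds(self, a, b):
        """Reflexive, symmetric, and transitive by construction."""
        return self._find(a) == self._find(b)

    def classes(self):
        """Return the equivalence classes as a list of sets."""
        groups = defaultdict(set)
        for item in list(self._parent):
            groups[self._find(item)].add(item)
        return list(groups.values())


# Assumed element identifiers: (sequence name, element kind, position or span).
loop_a = ("seqA", "region", (12, 15))
loop_b = ("seqB", "region", (14, 20))
loop_c = ("seqC", "region", (11, 13))

sets = CorrespondenceSets()
sets.relate(loop_a, loop_b)
sets.relate(loop_b, loop_c)
print(sets.corresponds(loop_a, loop_c))
```

In this sketch, regions of different lengths can correspond as whole units without any nucleotide-by-nucleotide pairing between them, which is the kind of relationship the abstract says a 2D matrix handles poorly.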

Figures

FIGURE 1.
Abstract example of an RNA sequence alignment showing typical features. This simplified diagram shows many features common in sequence alignments, including representation of paired and unpaired regions, gaps, kinds of loops, etc. Some features can be conveniently represented using existing software. Others, such as noncanonical bases, cannot.
FIGURE 2.
Example RNA sequence alignment. This example is helix P3 and the adjacent joining regions in RNase P RNA from representative Archaea. The first seven rows are annotations. Rows 1–4 are standard numbering, relative to the Methanothermobacter thermoautotrophicus RNA. Row 5 contains human-readable secondary structure labels. Columns are indicated in the second and third rows. Row 6 is the machine-readable base-pairing mask. Row 7 is a human-readable guide to the pairings specified in the previous row; column “A” pairs with “A,” “B” pairs with “B,” etc. The remaining rows are individual sequences; data taken from the RNase P Database (Brown 1999).
FIGURE 3.
Example bacterial RNase P RNA secondary structures and correspondences. (A) The correspondence relationship between two conceptual RNA sequences; corresponding nucleotides (all that is possible in a traditional sequence alignment), corresponding regions, corresponding base pairs, and corresponding helices. (B) These types of relationships in the context of the secondary structure of RNase P RNA. Type B RNase P RNA is represented by that of Bacillus subtilis strain 168, and type A RNase P RNA is represented by that of Escherichia coli strain K12 W3110. Helices are numbered P1–P19 according to Haas et al. (1994). Taken from the RNase P Database (Brown 1999).
FIGURE 4.
Example RNA sequence/structure alignment. This is the same alignment as shown in Figure 2 with explicit correspondence between nucleotides shown in blue and explicit correspondence between regions shown with red boxes. Correspondence relations between base pairs and helices are not displayed here. Note that indels (gaps) are not required.
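The human-readable pairing guide described for Figure 2 (column "A" pairs with "A," "B" pairs with "B") can be read mechanically. The sketch below is illustrative only: the function names, the guide string, the example sequence row, and the use of "." for unpaired columns are all assumptions, not the format used by the RNase P Database.

```python
def pairs_from_guide(guide, unpaired=".- "):
    """Turn a letter-labelled guide row into (column, column) base pairs.

    Assumes each label marks exactly two columns, which pair with each other.
    """
    positions = {}
    for column, label in enumerate(guide):
        if label in unpaired:
            continue
        positions.setdefault(label, []).append(column)

    pairs = []
    for label, columns in positions.items():
        if len(columns) != 2:
            raise ValueError(f"label {label!r} marks {len(columns)} columns, expected 2")
        pairs.append((columns[0], columns[1]))
    return sorted(pairs)


def bases_at_pairs(row, pairs):
    """Look up the two aligned characters of one sequence row for every pair."""
    return [(row[i], row[j]) for i, j in pairs]


# Hypothetical guide row and aligned sequence row of equal length.
guide = "..ABC....CBA."
row = "GGCAUGAAAAUGC"

pairs = pairs_from_guide(guide)
print(pairs)                       # [(2, 11), (3, 10), (4, 9)]
print(bases_at_pairs(row, pairs))  # [('C', 'G'), ('A', 'U'), ('U', 'A')]
```

Because the pairs are stored as column indices, the same list can be applied to every sequence row of the alignment.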

References

    1. Andersen ES, Rosenblad MA, Larsen N, Westergaard JC, Burks J, Wower IK, Wower J, Gorodkin J, Samuelsson T, Zwieb C. The tmRDB and SRPDB resources. Nucleic Acids Res. 2006;34:D163–D168.
    2. Bendana YR, Holmes IH. Colorstock, SScolor, Raton: RNA alignment visualization tools. Bioinformatics. 2008;24:579–580.
    3. Brown JW. The Ribonuclease P Database. Nucleic Acids Res. 1999;27:314. doi: 10.1093/nar/27.1.314.
    4. Burke JM, Belfort M, Cech TR, Davies RW, Schweyen RJ, Shub DA, Szostak JW, Tabak HF. Structural conventions for group I introns. Nucleic Acids Res. 1987;15:7217–7221.
    5. Eilbeck K, Lewis SE, Mungall CJ, Yandell M, Stein L, Durbin R, Ashburner M. The Sequence Ontology: A tool for the unification of genome annotations. Genome Biol. 2005;6:R44. doi: 10.1186/gb-2005-6-5-r44.
