J Math Biol. 2008 Jan;56(1-2):215-52.
doi: 10.1007/s00285-007-0110-x. Epub 2007 Aug 11.

FR3D: finding local and composite recurrent structural motifs in RNA 3D structures

Michael Sarver et al. J Math Biol. 2008 Jan.

Abstract

New methods are described for finding recurrent three-dimensional (3D) motifs in RNA atomic-resolution structures. Recurrent RNA 3D motifs are sets of RNA nucleotides with similar spatial arrangements. They can be local or composite. Local motifs comprise nucleotides that occur in the same hairpin or internal loop. Composite motifs comprise nucleotides belonging to three or more different RNA strand segments or molecules. We use a base-centered approach to construct efficient, yet exhaustive search procedures using geometric, symbolic, or mixed representations of RNA structure that we implement in a suite of MATLAB programs, "Find RNA 3D" (FR3D). The first modules of FR3D preprocess structure files to classify base-pair and -stacking interactions. Each base is represented geometrically by the position of its glycosidic nitrogen in 3D space and by the rotation matrix that describes its orientation with respect to a common frame. Base-pairing and base-stacking interactions are calculated from the base geometries and are represented symbolically according to the Leontis/Westhof basepairing classification, extended to include base-stacking. These data are stored and used to organize motif searches. For geometric searches, the user supplies the 3D structure of a query motif which FR3D uses to find and score geometrically similar candidate motifs, without regard to the sequential position of their nucleotides in the RNA chain or the identity of their bases. To score and rank candidate motifs, FR3D calculates a geometric discrepancy by rigidly rotating candidates to align optimally with the query motif and then comparing the relative orientations of the corresponding bases in the query and candidate motifs. Given the growing size of the RNA structure database, it is impossible to explicitly compute the discrepancy for all conceivable candidate motifs, even for motifs with fewer than ten nucleotides.
The screening algorithm that we describe finds all candidate motifs whose geometric discrepancy with respect to the query motif falls below a user-specified cutoff discrepancy. This technique can be applied to RMSD searches. Candidate motifs identified geometrically may be further screened symbolically to identify those that contain particular basepair types or base-stacking arrangements or that conform to sequence continuity or nucleotide identity constraints. Purely symbolic searches for motifs containing user-defined sequence, continuity and interaction constraints have also been implemented. We demonstrate that FR3D finds all occurrences, both local and composite and with nucleotide substitutions, of sarcin/ricin and kink-turn motifs in the 23S and 5S ribosomal RNA 3D structures of the H. marismortui 50S ribosomal subunit and assigns the lowest discrepancy scores to bona fide examples of these motifs. The search algorithms have been optimized for speed to allow users to search the non-redundant RNA 3D structure database on a personal computer in a matter of minutes.
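
The orientation comparison mentioned in the abstract can be illustrated with a small sketch. It assumes each base orientation is stored as a 3x3 rotation matrix, as the abstract states, and measures the angle of the rotation that takes one orientation to the other. The function names (`matmul_t`, `rotation_angle`, `rot_z`) are hypothetical, and the abstract does not give the exact discrepancy formula, so this shows only the angle between two orientations, not FR3D's full score.

```python
import math

def matmul_t(a, b):
    """Return a^T b for two 3x3 matrices given as nested lists."""
    return [[sum(a[k][i] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rotation_angle(r1, r2):
    """Angle (radians) of the rotation that takes orientation r1 to r2."""
    rel = matmul_t(r1, r2)                      # relative rotation r1^T r2
    trace = rel[0][0] + rel[1][1] + rel[2][2]
    # trace(R) = 1 + 2 cos(theta); clamp to guard against rounding error
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.acos(cos_theta)

def rot_z(deg):
    """Rotation about the z axis, used here only to build test orientations."""
    t = math.radians(deg)
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

# Two illustrative base orientations that differ by 40 degrees about z.
query_base = rot_z(10)
candidate_base = rot_z(50)
angle_deg = math.degrees(rotation_angle(query_base, candidate_base))
print(round(angle_deg, 6))
```

Because the relative rotation `r1^T r2` is unchanged when both orientations are rotated by the same rigid motion, an angle of this kind can be compared between a query and a candidate after alignment.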


Figures

Fig. 1
Reference bases. The geometric center of each base, marked by a black dot, is used as the origin of its local coordinate system. For each base, the 3′ face is shown. Hydrogen atoms are marked with gray lines. The axes are marked in Ångstroms
Fig. 2
Relative location in 65 AA basepairs extracted from PDB files 1s72 and 1j5e by the classification module. Each pair is rigidly translated and rotated so that the first A coincides with the A at the origin; the glycosidic nitrogen of the second A is shown as a colored dot. Each of the basepairing categories is colored with a different color. Boxes indicate cutoffs for each category. The axes are marked in Ångstroms
Fig. 3
Relative orientation of 65 AA basepairs from PDB files 1s72 and 1j5e. The normal vector indicates whether the two A’s share the same or opposite orientation in the plane of the pair. The angle of rotation (in degrees) is measured after the bases have been given the correct orientation. Boxes indicate cutoffs for each category
Fig. 4
a Query motif, part of the kink-turn in Helix 7 of H. marismortui 23S rRNA (Kt-7, PDB file 1s72). The geometric center of each base is marked by a black dot. b Query motif (blue) with candidate motif (red) superimposed. The candidate motif is from Helix 15 of the same molecule
Fig. 5
Base centers and joining line segments for four bases belonging to the query motif from Fig. 4 (in blue) superposed on those of a candidate motif (in red). The geometric centers of the bases are indicated by dots
Fig. 6
Adding a fourth nucleotide, G264, to a three-nucleotide partial candidate, A247-G249-C260, in the screening algorithm. The black lines indicate the new pair distances checked by the pairwise screening criterion
Fig. 7
Annotated secondary structures of query motif from PDB file 1s72 for sarcin/ricin geometric search (a) with one bona fide candidate motif (b) and the highest scoring related motif (c). Yellow letters indicate the bases of the query motif used for the geometric search reported in Table 1 and the corresponding bases of the candidate motifs
Fig. 8
Annotated secondary structures of query motif (a) and two candidate motifs for the seven-base sarcin/ricin motif search reported in Table 2. Yellow letters indicate the bases of the query motif used for the geometric search and the corresponding bases of the candidate motifs. The bold red box shows the constrained basepair. Candidate motifs obtained by the search include a bona fide composite sarcin/ricin motif (b) and a redundant motif (c) in which base A2703 in the query motif is mismatched to base C478 in the candidate motif. The higher-scoring version of this candidate, in which A477 is matched with A2703, had discrepancy 0.2161
Fig. 9
Annotated secondary structure of nine-nucleotide sarcin/ricin query motif (a) reported in Table 3 with one bona fide composite candidate motif (b) and the highest-scoring related motif, which differs from the query motif only at nucleotides U2690 and C2704 (c). The bold red box shows the constrained basepair
Fig. 10
Query motif for kink-turn search reported in Table 4 (a) and new composite kink-turn motif identified by this search (b). Query motif for kink-turn search reported in Table 5 (c), and a composite kink-turn obtained by this search (d). The bold red box shows the constrained basepair
Fig. 11
Query motif used for GNRA searches (a), GNRA motif with a bulged nucleotide which was correctly identified (b), related hairpin which is intermediate between GNRA and a T-loop (c). The bold red box shows the constrained basepair, and the green boxes indicate stacking constraints
Fig. 12
Total search time for five geometric searches as a function of discrepancy cutoff. The search labeled GEOM is purely geometric. Other searches are identical to the previous search with the exception of one added constraint as indicated: SEQ sequential constraint; 1 BP one basepair constraint; 2 BP two basepair constraints; 3 BP three basepair constraints. The PDB file 1s72 was searched. Note the logarithmic scale on the vertical axis
Fig. 13
Number of distinct candidates satisfying the discrepancy limit for the five searches in Fig. 12. The PDB file 1s72 was searched
Fig. 14
Search times for sarcin/ricin query motifs with 5, 6, 7, 8, and 9 nucleotides, as a function of discrepancy cutoff value. The PDB file 1s72 was searched. Each search was a mixed geometric and symbolic search with two basepair interaction constraints
Fig. 15
Number of candidates remaining after screening (S), after the discrepancy calculation (D), and after redundant candidates were removed (R), as a function of discrepancy cutoff value. The query motif was the nine-nucleotide sarcin/ricin motif and PDB file 1s72 was searched
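
The incremental step shown in Fig. 6, where a nucleotide is added to a partial candidate and only the new pair distances are checked, can be sketched as a backtracking search. This is a minimal illustration under stated assumptions, not FR3D's implementation: bases are reduced to 3D center points, and `tol` is a hypothetical distance tolerance standing in for whatever bound the discrepancy cutoff implies. The function name `screen` and the sample coordinates are invented for the example.

```python
import math

def screen(query, centers, tol):
    """Return index tuples into `centers` whose pairwise distances match
    the corresponding pairwise distances in `query` to within `tol`."""
    m = len(query)
    results = []

    def extend(partial):
        k = len(partial)
        if k == m:
            results.append(tuple(partial))
            return
        for idx in range(len(centers)):
            if idx in partial:
                continue
            # Check only the distances from the new point to points already
            # chosen; older pairs were checked when they were added.
            ok = all(
                abs(math.dist(centers[idx], centers[partial[j]])
                    - math.dist(query[k], query[j])) <= tol
                for j in range(k)
            )
            if ok:
                partial.append(idx)
                extend(partial)
                partial.pop()

    extend([])
    return results

# Illustrative data: the first three centers are a translated copy of the query.
query = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
centers = [(10.0, 10.0, 10.0), (13.0, 10.0, 10.0), (13.0, 14.0, 10.0),
           (50.0, 50.0, 50.0)]
hits = screen(query, centers, tol=0.5)
print(hits)
```

Rigid motions preserve distances, so a candidate that superimposes well on the query must have similar pairwise distances. Rejecting a partial candidate as soon as one new distance fails prunes every larger candidate that would contain it, which is why the full discrepancy need only be computed for the survivors.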
