Review
J Biol Chem. 2021 Jan-Jun;296:100555.
doi: 10.1016/j.jbc.2021.100555. Epub 2021 Mar 18.

An RNA-centric historical narrative around the Protein Data Bank

Eric Westhof et al. J Biol Chem. 2021 Jan-Jun.

Abstract

Some of the amazing contributions brought to the scientific community by the Protein Data Bank (PDB) are described. The focus is on nucleic acid structures with a bias toward RNA. The evolution and key roles in science of the PDB and other structural databases for nucleic acids illustrate how small initial ideas can become huge and indispensable resources with the unflinching willingness of scientists to cooperate globally. The progress in the understanding of the molecular interactions driving RNA architectures followed the rapid increase in RNA structures in the PDB. That increase resulted from improvements in chemical synthesis and purification of RNA molecules, as well as in biophysical methods for structure determination and computer technology. The RNA modeling efforts from their beginnings are also described together with their links to the state of structural knowledge and technological development. Structures of RNA and of its assemblies are physical objects, which, together with genomic data, allow us to integrate present-day biological functions and the historical evolution in all living species on Earth.

Keywords: Protein Data Bank; RNA; computational biology; databases; modeling; structural biology.

Conflict of interest statement

The authors declare that they have no conflicts of interest with the contents of this article.

Figures

Figure 1
The evolution of the number of RNA structures in the PDB. The figure was downloaded using the option “Analyze PDB statistics.” All RNA structures are included (from X-ray, NMR, and cryo-EM). Some key X-ray structures are indicated. Up to 1991, only tRNA structures were present. In chronological order, the following structures are highlighted: (1) a synthetic 14-mer duplex, 1RNA (43); (2) the core hammerhead ribozyme, 1MME (189); (3) the P4-P6 domain of the Tetrahymena ribozyme, 1GID (65); (4) the eukaryotic loop E structure, 354D (191); (5) the core of the Tetrahymena ribozyme, 1GRZ (200); (6) the hepatitis delta ribozyme, 1DRZ (192); (7) aptamers binding to malachite green, 1F1T (201), and vitamin B12, 1DDY (202); (8) an RNA quadruplex, 1J8G (203), following an earlier NMR structure, 1RAU (204); (9) the purine riboswitch, 1Y27 (119), after which the structures of a great variety of riboswitches have appeared (117, 120, 205); (10) the group I intron from Azoarcus, 1U6B (206); (11) the core of an RNase P ribozyme, 2A2E (207); (12) the full hammerhead ribozyme with long-range loop–loop contacts stabilizing the core, 3ZD5 (208); (13) a complete group II intron, 3EOH (101); (14) the structure of an RNA nanosquare, 3P59 (209); (15) the complex between the T-box riboswitch and its tRNA target, 4MGN (210); (16) the TYMV tRNA-like, 4P5J (211); (17) the Spinach fluorescent aptamer, 4TS0 (113); (18) a group II intron with a lariat primed for transposition, 5J01 (102); (19) the full structure of the T-box between GlyQS and its tRNA, 6POM (212).
Figure 2
The evolution of the number of structures of RNA–protein complexes (RNPs) in the PDB. The figure was downloaded using the option “Analyze PDB statistics.” All RNP structures are included (from X-ray, NMR, and cryo-EM). The structures related to ribosomes and their cofactors are far too numerous to show on such a figure. We preferred to emphasize the complexes formed in the spliceosome (for detailed reviews, see (99, 213)). Most of the large RNP structures after 2015 are based on cryo-EM data. In chronological order, the following structures are highlighted: (1) RNA viruses, 1BMV (39); (2) class I tRNA synthetase complex, 1GSG (214); (3) tRNASer, a class II tRNA with a long variable loop, complexed with its specific synthetase, 1SER (215); (4) class II tRNA synthetase complex, 1ASY (216); (5) MS2 RNA coat protein, 1AQ3 (217); (6) spliceosomal U2 complex, 1A9N (218); (7) the kink-turn was first observed in the complex of a U4 snRNA fragment, 1E7K (219), before being recurrently observed in the ribosome structure (193); (8) the Signal Recognition Particle complex, 2V3C (220); (9) complex between a tyrosyl tRNA synthetase and a group I intron, 2RKJ (221); (10) U1 snRNP, 3CW1 (222); (11) an RNase P holoenzyme, 3Q1Q (223); (12) in the U4 snRNP, 4WZJ (224); (13) Lsm/U6 snRNP complex, 4M7A (225); (14) the tri-snRNP structure, 3JCM (226); (15) Intron-lariat complex, 3JB9 (227); (16) Bact complex, 5GM6 (228), C-complex, 5GMK (229), 5LJ3 (230); (17) C∗-complex, 5WSG (231), 5MPS (232); P-complex, 5YLZ (233), 6EXN (234), 6BK8 (235).
Figure 3
Three representations of the interactions present between nucleotides in transfer RNA with increasing levels of structural complexity. A, standard cloverleaf structure of yeast tRNAAsp (236). B, a two-dimensional view of the tertiary structure of yeast tRNAAsp. It follows the representation proposed by Kim (237), which stresses the two main arms made of helical stems: the acceptor stem with the Thymine (T)-stem, and the Dihydrouridine (D)-stem with the anticodon stem. A stem capped by a loop is called a hairpin. The numbering follows that of yeast tRNAPhe; because the numbers of nucleotides are not the same in the D- and variable loops, residues 17 and 47 are skipped and the residue following D20 is C20a. The representation clearly shows the contacts linking the T- and D-loops and the tertiary base pairs and triples between the single-stranded segments and the D-hairpin. The contacts represented correspond to those observed in the yeast tRNAAsp structure (238, 239). For characterizing the tertiary pairs, the following nomenclature is used (240). Nucleic acid bases can interact through three possible edges: the Watson–Crick edge, the Hoogsteen edge (the edge with N7 in purines or C5 in pyrimidines), and the sugar edge (O2 in pyrimidines or N3 and N2 in purines, often with the hydroxyl O2’ of the ribose). The nucleotides can interact with the sugars on the same side of the H-bonds (as in normal Watson–Crick pairs), in which case the pair is called cis, or on opposite sides, in which case the pair is called trans. The three symbols, circle, square, and triangle, represent respectively the Watson–Crick, the Hoogsteen, and the sugar edges. When the pair is cis, the symbols are dark and, when it is trans, they are white. This nomenclature applies to the large number of specific base–base interactions. Pairs formed through a single H-bond (see Fig. 4F) or bifurcated H-bonds (see Fig. 4B) are not easily annotated.
C, the tertiary structure of yeast tRNAAsp with the four domains colored (green: acceptor stem; yellow: T-hairpin; blue: D-hairpin; red: anticodon hairpin; the two nucleotides U8 and R9 (generally a purine) linking the 5’-end acceptor strand to the D-strand are magenta; the variable loop linking the 3’-end of the anticodon hairpin to the 5’-end T-strand is orange). The double arrows (green and red) indicate the sets of helices that stack upon each other in a coaxial manner in the three-dimensional fold. Capital letters indicate the position of the contacts shown in Figure 4.
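The edge and orientation nomenclature described in the Figure 3 caption can be sketched as a small enumeration. This is only an illustrative sketch, not code from the article: the names `EDGE_SHAPES`, `ORIENTATION_FILL`, `pair_families`, and `describe` are hypothetical, and the sketch assumes that a family is defined by the unordered pair of interacting edges plus the cis or trans orientation.

```python
from itertools import combinations_with_replacement

# Edge names and symbol shapes as given in the caption
# (identifier names are hypothetical, chosen for this sketch).
EDGE_SHAPES = {
    "Watson-Crick": "circle",
    "Hoogsteen": "square",
    "Sugar": "triangle",
}

# Per the caption: dark symbols for cis pairs, white symbols for trans pairs.
ORIENTATION_FILL = {"cis": "dark", "trans": "white"}


def pair_families():
    """List every (orientation, edge_a, edge_b) combination.

    Assumption: the two edges are treated as an unordered pair, so
    Watson-Crick/Hoogsteen and Hoogsteen/Watson-Crick count once.
    """
    families = []
    for orientation in ORIENTATION_FILL:
        for edge_a, edge_b in combinations_with_replacement(EDGE_SHAPES, 2):
            families.append((orientation, edge_a, edge_b))
    return families


def describe(orientation, edge_a, edge_b):
    """Return a text description of the symbols for one family."""
    fill = ORIENTATION_FILL[orientation]
    shape_a = EDGE_SHAPES[edge_a]
    shape_b = EDGE_SHAPES[edge_b]
    if edge_a == edge_b:
        symbol = f"{fill} {shape_a}"
    else:
        symbol = f"{fill} {shape_a} + {fill} {shape_b}"
    return f"{orientation} {edge_a}/{edge_b}: {symbol}"


if __name__ == "__main__":
    for family in pair_families():
        print(describe(*family))
```

Under these assumptions, three edges taken two at a time (with repetition) give six edge combinations, and the two orientations double that to twelve families. For instance, the trans Watson–Crick/Hoogsteen pair between U8 and A14 mentioned in Figure 4D would be drawn with a white circle and a white square.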
Figure 4
Illustrations of contacts discussed in the text. A, the U-turn after U33 in the anticodon loop: the torsion angle about P-O5’ of the 5’-phosphate of residue 34 is trans (180°) instead of the usual gauche-minus (−60°); the 5’-phosphate of residue 35 stacks below U33; there is an H-bond between N3-H of U33 and an anionic phosphate oxygen of the 5’-phosphate of residue 36. Thus, all three residues of the anticodon triplet have some interaction with the highly conserved U33 (in mammalian initiator tRNAMet, C33 occurs). B, the equivalent U-turn in the T-loop where the U is a pseudouridine (noted Ψ or Psi). The bifurcated pair Ψ55oG18, in which the O4(Ψ) interacts with both N1-H and N2-H of the guanine, is also shown. The residue A58 stacks above G18. C, interdigitated nucleotides between the D- and T-loops. D, the highly conserved trans Watson–Crick/Hoogsteen pair between U8 and A14 forms three H-bonds with A21, also highly conserved. E, the famous trans Watson–Crick/Watson–Crick pair between R15 and Y48. Levitt (141) noted that residue 15 is always a purine and residue 48 always a pyrimidine and modeled that pair as a regular cis Watson–Crick/Watson–Crick pair. Notice that, in standard nucleotide conformations, trans base pairs lead to parallel strands and not antiparallel strands as in usual helices (241). F, nucleotides 32 and 38, immediately adjacent to the last base pair of the anticodon stem, generally present a single H-bond (for details, see (242)).
Figure 5
A native state of the hammerhead ribozyme (left drawing) that promotes cleavage at low magnesium concentrations (243) is reached through intricate contacts (which vary depending on the type of ribozyme) between an apical loop and an internal loop (drawn in red). Without these tertiary contacts, a different core of the three-way junction (drawn in magenta in both structures) is observed (compare the regions in magenta). The nomenclature described in Figure 3B is used for the non-Watson–Crick pairs (240).
Figure 6
From left to right, a two-dimensional view of the P4–P6 domain with the nomenclature described in Figure 3B (240) and, next to it, the whole molecule shown in space-filling mode, highlighting the coaxial stacking of helices and their parallel packing. In the left two-dimensional view, the region boxed in green shows how the RNA is able to bend 180°, and the regions boxed in red are shown in space-filling and atomic views on the right of the figure; these views show the precise and tight contact between the tetraloop GAAA and its receptor, called the 11-nt motif; notice that there are twice as many H-bonds between the hydroxyl groups (red dotted lines) as between the bases (black dotted lines). That type of RNA–RNA contact was discovered through sequence analysis and SELEX experiments (244).
Figure 7
Top, a timeline of RNA models. Before 1987, no sets of coordinates for the suggested models were made available. With the development of modeling tools based on computer graphics, one could derive coordinates (without manually building physical models) and refine them (see bottom). In chronological order, the following RNA models are highlighted: (1) the anticodon loop (138); (2) the Levitt tRNA (141); (3) the pseudoknot fold (143); (4) the Kim & Cech core of group I intron (144); (5) the tRNA-like in TYMV (245); (6) the GNRA loop in 5S rRNA (246); (7) the Michel & Westhof core of group I intron (135); (7) the 4-way junction of U1 snRNA (188); (8) the tRNA selenocysteine (247); (9) the hammerhead (248) and the hepatitis delta (249) ribozymes; (10) full group I introns (250); (11) A and B families of RNase P (251); (12) the Azoarcus group I intron (252). Many RNA structures were also modeled afterward, especially within the RNA-Puzzles Consortium (155, 156, 157, 158). Bottom, a timeline of some RNA assembly and computing tools. The most recent ones are regularly used and actively improved. (1) FRODO, developed by Alwyn Jones, was a pioneering tool in computer molecular graphics (151); (2) NUCLIN-NUCLSQ (159), an inclusive refinement program dedicated to nucleic acids and based on Hendrickson–Konnert PROLSQ (253, 254); (3) MIDAS (255, 256); (4) MC-Sym (257); (5) MANIP (258); (6) Chimera (259); (7) S2S (260, 261); (8) FARFAR (262); (9) MC-Fold (263), iFoldRNA (264); (10) ModeRNA (265); (11) RNAComposer (266); 3dRNA (267); (12) VFold (268); (13) SimRNA (269).
Neocles B. Leontis (1955–2020)

References

    1. Sundaralingam M., Jensen L.H. Stereochemistry of nucleic acid constituents: I. Refinement of the structure of cytidylic acid b. J. Mol. Biol. 1965;13:914–929.
    2. Kennard O., Speakman J.C., Donnay J.D.H. Primary crystallographic data. Acta Cryst. 1967;22:445–449.
    3. Rubin J., Brennan T., Sundaralingam M. Crystal structure of a naturally occurring dinucleoside monophosphate: Uridylyl (3',5') adenosine hemihydrate. Science. 1971;174:1020–1022.
    4. Seeman N.C., Sussman J.L., Berman H.M., Kim S.H. Nucleic acid conformation: Crystal structure of a naturally occurring dinucleoside phosphate (UpA). Nat. New Biol. 1971;233:90–92.
    5. Kennard O., Allen F.H., Brice M.D., Hummelink T.W.A., Motherwell W.D.S., Rodgers J.R., Watson D.G. Computer based systems for the retrieval of data: Crystallography. Pure Appl. Chem. 1977;49:1807–1816.