Review

J Biol Chem. 2021 Jan-Jun:296:100555.

doi: 10.1016/j.jbc.2021.100555. Epub 2021 Mar 18.

An RNA-centric historical narrative around the Protein Data Bank

Eric Westhof¹, Neocles B Leontis²

Affiliations

¹ Institut de Biologie Moléculaire et Cellulaire du CNRS, Architecture et Réactivité de l'ARN, Université de Strasbourg, Strasbourg, France. Electronic address: E.Westhof@ibmc-cnrs.unistra.fr.
² Department of Chemistry, Bowling Green State University, Bowling Green, Ohio, USA.

PMID: 33744291
PMCID: PMC8080527
DOI: 10.1016/j.jbc.2021.100555

Eric Westhof et al. J Biol Chem. 2021 Jan-Jun.

Abstract

Some of the amazing contributions brought to the scientific community by the Protein Data Bank (PDB) are described. The focus is on nucleic acid structures with a bias toward RNA. The evolution and key roles in science of the PDB and other structural databases for nucleic acids illustrate how small initial ideas can become huge and indispensable resources with the unflinching willingness of scientists to cooperate globally. The progress in the understanding of the molecular interactions driving RNA architectures followed the rapid increase in RNA structures in the PDB. That increase resulted from improvements in chemical synthesis and purification of RNA molecules, as well as in biophysical methods for structure determination and computer technology. The RNA modeling efforts from the early beginnings are also described together with their links to the state of structural knowledge and technological development. Structures of RNA and of its assemblies are physical objects, which, together with genomic data, allow us to integrate present-day biological functions and the historical evolution in all living species on earth.

Keywords: Protein Data Bank; RNA; computational biology; databases; modeling; structural biology.

Conflict of interest statement

The authors declare that they have no conflicts of interest with the contents of this article.

Figures

**Figure 1**
**The evolution of the number of RNA structures in the PDB.** The figure is downloaded from the option “Analyze PDB statistics.” All RNA structures are included (from X-ray, NMR, and cryo-EM). Some key X-ray structures are indicated. Up to 1991, only tRNA structures were present. In chronological order, the following structures are highlighted: (1) a synthetic 14-mer duplex, 1RNA (43); (2) the core hammerhead ribozyme, 1MME (189); (3) the P4-P6 domain of the *Tetrahymena* ribozyme, 1GID (65); (4) the eukaryotic loop E structure, 354D (191); (5) the core of the *Tetrahymena* ribozyme, 1GRZ (200); (6) the hepatitis delta ribozyme, 1DRZ (192); (7) aptamers binding to malachite green, 1F1T (201), and vitamin B12, 1DDY (202); (8) an RNA quadruplex, 1J8G (203), although an NMR structure, 1RAU (204), had been solved earlier; (9) the purine riboswitch, 1Y27 (119), after which the structures of a great variety of riboswitches have appeared (117, 120, 205); (10) the group I intron from *Azoarcus*, 1U6B (206); (11) the core of an RNase P ribozyme, 2A2E (207); (12) the full hammerhead ribozyme with long-range loop–loop contacts stabilizing the core, 3ZD5 (208); (13) a complete group II intron, 3EOH (101); (14) the structure of an RNA nanosquare, 3P59 (209); (15) the complex between the T-box riboswitch and its tRNA target, 4MGN (210); (16) the TYMV tRNA-like, 4P5J (211); (17) the Spinach fluorescent aptamer, 4TS0 (113); (18) a group II intron with a lariat primed for transposition, 5J01 (102); (19) the full structure of the T-box between GlyQS and its tRNA, 6POM (212).

**Figure 2**
**The evolution of the number of structures of RNA–protein complexes (RNPs) in the PDB.** The figure is downloaded from the option “Analyze PDB statistics.” All RNP structures are included (from X-ray, NMR, and cryo-EM). The structures related to ribosomes and their cofactors are far too numerous to show on such a figure. We preferred to emphasize the complexes formed in the spliceosome (for detailed reviews, see (99, 213)). Most of the large RNP structures after 2015 are based on cryo-EM data. In chronological order, the following structures are highlighted: (1) RNA viruses, 1BMV (39); (2) class I tRNA synthetase complex, 1GSG (214); (3) tRNA^Ser, a class II tRNA with a long variable loop, complexed with its specific synthetase, 1SER (215); (4) class II tRNA synthetase complex, 1ASY (216); (5) MS2 RNA coat protein, 1AQ3 (217); (6) spliceosomal U2 complex, 1A9N (218); (7) the kink-turn was first observed in the complex of a U4 snRNA fragment, 1E7K (219), before being recurrently observed in the ribosome structure (193); (8) the Signal Recognition Particle complex, 2V3C (220); (9) complex between a tyrosyl tRNA synthetase and a group I intron, 2RKJ (221); (10) U1 snRNP, 3CW1 (222); (11) an RNase P holoenzyme, 3Q1Q (223); (12) in the U4 snRNP, 4WZJ (224); (13) Lsm/U6 snRNP complex, 4M7A (225); (14) the tri-snRNP structure, 3JCM (226); (15) intron-lariat complex, 3JB9 (227); (16) B^act complex, 5GM6 (228), C-complex, 5GMK (229), 5LJ3 (230); (17) C∗-complex, 5WSG (231), 5MPS (232); P-complex, 5YLZ (233), 6EXN (234), 6BK8 (235).

**Figure 3**
**Three representations of the interactions present between nucleotides in transfer RNA with increasing levels of structural complexity.** A, standard cloverleaf structure of yeast tRNA^Asp (236). B, a two-dimensional view of the tertiary structure of yeast tRNA^Asp; it follows the representation proposed by Kim (237), which stresses the two main arms made of helical stems: the acceptor stem with the Thymine (T)-stem, and the Dihydrouridine (D)-stem with the anticodon stem. A stem capped by a loop is called a hairpin. The numbering follows that of yeast tRNA^Phe; because the numbers of nucleotides are not the same in the D- and variable loops, residues 17 and 47 are skipped and the residue following D20 is C20a. The representation clearly shows the contacts linking the T- and D-loops and the tertiary base pairs and triples between the single-stranded segments and the D-hairpin. The contacts represented correspond to those observed in the yeast tRNA^Asp structure (238, 239). For characterizing the tertiary pairs, the following nomenclature is used (240). Nucleic acid bases can interact through three possible edges: the Watson–Crick edge, the Hoogsteen edge (the edge with N7 in purines or C5 in pyrimidines), and the sugar edge (O2 in pyrimidines or N3 and N2 in purines, often with the hydroxyl O2’ of the ribose). The nucleotides can interact with the sugars on the same side of the H-bonds (as in normal Watson–Crick pairs), in which case the pair is called *cis*, or on opposite sides, in which case the pair is called *trans*. The three symbols, *circle*, *square*, and *triangle*, represent the Watson–Crick, the Hoogsteen, and the sugar edges, respectively. When the pair is *cis*, the symbols are *dark* and, when it is *trans*, they are *white*. This nomenclature applies to the large number of specific base–base interactions. Pairs formed through a single H-bond (see Fig. 4F) or through bifurcated H-bonds (see Fig. 4B) are not easily annotated.
C, the tertiary structure of yeast tRNA^Asp with the four domains colored (*green*: acceptor stem; *yellow*: T-hairpin; *blue*: D-hairpin; *red*: anticodon hairpin; the two nucleotides U8 and R9 (generally a purine) linking the 5’-end acceptor strand to the D-strand are *magenta*; the variable loop linking the 3’-end of the anticodon hairpin to the 5’-end T-strand is *orange*). The *double arrows* (*green* and *red*) indicate the sets of helices that stack upon each other in a coaxial manner in the three-dimensional fold. Capital letters indicate the position of the contacts shown in Figure 4.
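
The edge-and-orientation nomenclature described in the Figure 3 caption can be sketched as a small enumeration. This is only an illustrative sketch: the function names `pair_families` and `describe` are hypothetical, and treating the order of the two edges as irrelevant (which gives 12 combinations) is an assumption made here for counting, not something stated in the caption.

```python
from itertools import combinations_with_replacement

# Edge names and symbol shapes as listed in the Figure 3 caption.
EDGE_SYMBOLS = {
    "Watson-Crick": "circle",
    "Hoogsteen": "square",
    "Sugar": "triangle",
}
# Caption legend: dark symbols for cis pairs, white symbols for trans pairs.
ORIENTATION_FILL = {"cis": "dark", "trans": "white"}


def pair_families():
    """List every (orientation, edge_a, edge_b) combination.

    Assumption: edge order is ignored, so Hoogsteen/Sugar and
    Sugar/Hoogsteen count as the same combination.
    """
    families = []
    for edge_a, edge_b in combinations_with_replacement(EDGE_SYMBOLS, 2):
        for orientation in ORIENTATION_FILL:
            families.append((orientation, edge_a, edge_b))
    return families


def describe(orientation, edge_a, edge_b):
    """Return a text label plus the symbols implied by the caption's legend."""
    fill = ORIENTATION_FILL[orientation]
    symbols = ", ".join(f"{fill} {EDGE_SYMBOLS[e]}" for e in (edge_a, edge_b))
    return f"{orientation} {edge_a}/{edge_b} ({symbols})"


if __name__ == "__main__":
    for family in pair_families():
        print(describe(*family))
```

For example, the U8–A14 pair described in Figure 4D would be labeled `describe("trans", "Watson-Crick", "Hoogsteen")`, that is, a white circle and a white square.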

**Figure 4**
**Illustrations of contacts discussed in the text.** A, the U-turn after U33 in the anticodon loop: the torsion angle about P-O5’ of the 5’-phosphate of residue 34 is *trans* (180°) instead of the usual *gauche-minus* (−60°); the 5’-phosphate of residue 35 stacks below U33; there is an H-bond between N3-H of U33 and an anionic phosphate oxygen of the 5’-phosphate of residue 36. Thus, all three residues of the anticodon triplet have some interaction with the highly conserved U33 (in mammalian initiator tRNA^Met, C33 occurs). B, the equivalent U-turn in the T-loop where the U is a pseudouridine (noted Ψ or Psi). The bifurcated pair Ψ55oG18 in which the O4(Ψ) interacts with both N1-H and N2-H of the guanine is also shown. The residue A58 stacks above G18. C, interdigitated nucleotides between the D- and T-loops. D, the highly conserved *trans* Watson–Crick/Hoogsteen pair between U8 and A14 forms three H-bonds with A21, also highly conserved. E, the famous *trans* Watson–Crick/Watson–Crick pair between R15 and Y48. Levitt (141) noted that residue 15 is always a purine and residue 48 always a pyrimidine and modeled that pair as a regular *cis* Watson–Crick/Watson–Crick. Notice that, in standard nucleotide conformations, *trans* base pairs lead to parallel strands and not antiparallel strands as in usual helices (241). F, nucleotides 32 and 38 immediately adjacent to the last base pair of the anticodon stem generally present a single H-bond (for details, see (242)).
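
The *trans* (180°) versus *gauche-minus* (−60°) distinction mentioned for the U-turn in Figure 4A can be illustrated with a minimal classifier. The function name `classify_torsion` is hypothetical, and the use of three equal 120° windows centered on 60°, 180°, and −60° is a simplifying assumption for illustration, not a definition taken from the article.

```python
def classify_torsion(angle_deg):
    """Assign a torsion angle (degrees) to a staggered conformer class.

    Assumption: three equal 120-degree windows centered on +60
    (gauche-plus), 180 (trans), and -60 (gauche-minus).
    """
    wrapped = angle_deg % 360.0  # map any angle into [0, 360)
    if wrapped < 120.0:
        return "gauche-plus"
    if wrapped < 240.0:
        return "trans"
    return "gauche-minus"


if __name__ == "__main__":
    # The two values quoted in the Figure 4A description.
    for value in (180.0, -60.0):
        print(value, classify_torsion(value))
```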

**Figure 5**
**A native state of the hammerhead ribozyme (*left drawing*) that promotes cleavage at low magnesium concentrations** (243) **is reached through intricate contacts (which vary with the type of ribozyme) between an apical loop and an internal loop (drawn in *red*).** Without these tertiary contacts, a different core of the three-way junction (drawn in *magenta* in both structures) is observed (compare the regions in *magenta*). The nomenclature described in Figure 3B is used for the non-Watson–Crick pairs (240).

**Figure 6**
**From *left* to *right*, a two-dimensional view of the P4–P6 domain with the nomenclature described in** Figure 3B (240) **and, next to it, the whole molecule shown in space-filling mode highlighting the coaxial stacking of helices and their parallel packing.** In the *left* two-dimensional view, the region boxed in *green* shows how the RNA is able to bend 180°, and the regions boxed in *red* are shown in space-filling and atomic views on the *right* of the figure; these views show the precise and tight contact between the tetraloop GAAA and its receptor, called the 11-nt motif; notice that there are twice as many H-bonds between the hydroxyl groups (*red dotted lines*) as between the bases (*black dotted lines*). This type of RNA–RNA contact was discovered through sequence analysis and SELEX experiments (244).

**Figure 7**
*Top*, a timeline of RNA models. Before 1987, no sets of coordinates for the suggested models were made available. With the development of modeling tools based on computer graphics, one could derive coordinates (without manually building physical models) and refine them (see *bottom*). In chronological order, the following RNA models are highlighted: (1) the anticodon loop (138); (2) the Levitt tRNA (141); (3) the pseudoknot fold (143); (4) the Kim & Cech core of group I intron (144); (5) the tRNA-like in TYMV (245); (6) the GNRA loop in 5S rRNA (246); (7) the Michel & Westhof core of group I intron (135); (7) the 4-way junction of U1 snRNA (188); (8) the tRNA selenocysteine (247); (9) the hammerhead (248) and the hepatitis delta (249) ribozymes; (10) full group I introns (250); (11) A and B families of RNase P (251); (12) the *Azoarcus* group I intron (252). Many RNA structures were also modeled afterward, especially within the RNA-Puzzles Consortium (155, 156, 157, 158). *Bottom*, a timeline of some RNA assembly and computing tools. The most recent ones are regularly used and actively improved. (1) FRODO, developed by Alwyn Jones, was a pioneering tool in computer molecular graphics (151); (2) NUCLIN-NUCLSQ (159), an inclusive refinement program dedicated to nucleic acids and based on Hendrickson–Konnert PROLSQ (253, 254); (3) MIDAS (255, 256); (4) MC-Sym (257); (5) MANIP (258); (6) Chimera (259); (7) S2S (260, 261); (8) FARFAR (262); (9) MC-Fold (263), iFoldRNA (264); (10) ModeRNA (265); (11) RNAComposer (266); 3dRNA (267); (12) VFold (268); (13) SimRNA (269).
