Nucleic Acids Res. 2020 Jun 19;48(11):6367-6381.
doi: 10.1093/nar/gkaa383.

A unified dinucleotide alphabet describing both RNA and DNA structures

Jiří Černý et al. Nucleic Acids Res. 2020.

Abstract

By analyzing almost 120 000 dinucleotides in over 2000 nonredundant nucleic acid crystal structures, we define 96+1 diNucleotide Conformers (NtCs), which describe the geometry of RNA and DNA dinucleotides. The NtC classes are grouped into 15 codes of the structural alphabet CANA (Conformational Alphabet of Nucleic Acids) to simplify symbolic annotation of the prominent structural features of nucleic acids and their intuitive graphical display. The search for nontrivial patterns of NtCs resulted in the identification of several types of RNA loops, some of them observed for the first time. Over 30% of the nearly six million dinucleotides in the PDB cannot be assigned to any NtC class, but we demonstrate that up to half of them can be re-refined with the help of proper refinement targets. A statistical analysis of the preferences of NtCs and CANA codes for the 16 dinucleotide sequences showed that neither the NtC class AA00, which forms the scaffold of RNA structures, nor BB00, the most populated DNA class, is sequence neutral; the distributions of both are significantly biased. The reported automated assignment of the NtC classes and CANA codes, available at dnatco.org, provides a powerful tool for unbiased analysis of nucleic acid structures by structural and molecular biologists.


Figures

Figure 1.
Examples of two-dimensional scattergrams of three backbone torsion angles in RNA and DNA molecules. Shown are the values from crystal structures with resolution better than 1.8 Å. The scattergram on the left plots the distributions of the torsions at the backbone bonds P–O5′ (axis α2) and C5′–C4′ (axis γ2); the scattergram on the right plots the distributions of the torsions at the bonds P–O5′ (axis α2) and O3′–P (axis ζ1).
Figure 2.
The analyzed fragment is defined by twelve geometric parameters: seven backbone torsion angles, δ1 to δ2 (highlighted in cyan); two torsions around the glycosidic bonds, χ1 and χ2 (highlighted in green); and three parameters highlighted in light blue, namely one pseudo-torsion angle μ and two distances, NN and C′C′. The parameters are defined as follows: δ1 C5′(1)–C4′(1)–C3′(1)–O3′(1), ϵ1 C4′(1)–C3′(1)–O3′(1)–P(2), ζ1 C3′(1)–O3′(1)–P(2)–O5′(2), α2 O3′(1)–P(2)–O5′(2)–C5′(2), β2 P(2)–O5′(2)–C5′(2)–C4′(2), γ2 O5′(2)–C5′(2)–C4′(2)–C3′(2), δ2 C5′(2)–C4′(2)–C3′(2)–O3′(2), χ1 O4′(1)–C1′(1)–N1/9(1)–C2/4(1), χ2 O4′(2)–C1′(2)–N1/9(2)–C2/4(2). The parameters NN and C′C′ are the distances N1/9(1)–N1/9(2) and C1′(1)–C1′(2), respectively. Finally, the pseudo-torsion μ is defined as the torsion between the atoms defining the glycosidic bonds of the first and second nucleotides, N1/9(1)–C1′(1)–C1′(2)–N1/9(2).
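The twelve parameters defined in this caption can be computed directly from atomic coordinates. The following is a minimal sketch, not the procedure used at dnatco.org: the function names, the `coords` dictionary layout keyed by `(atom_name, residue_index)`, and the `base_atoms` argument are assumptions made for illustration. It uses the standard atan2 formula for a signed torsion angle, and it takes N9/C4 for purines and N1/C2 for pyrimidines, which is how the N1/9 and C2/4 shorthand is read here.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def torsion(p1, p2, p3, p4):
    """Signed torsion angle p1-p2-p3-p4 in degrees (0 = cis, +/-180 = trans)."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = _dot(_cross(n1, n2), b2) / math.sqrt(_dot(b2, b2))
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def distance(p, q):
    d = _sub(p, q)
    return math.sqrt(_dot(d, d))

# Backbone torsions as listed in the caption: (atom name, nucleotide index).
BACKBONE = {
    "delta1":   (("C5'", 1), ("C4'", 1), ("C3'", 1), ("O3'", 1)),
    "epsilon1": (("C4'", 1), ("C3'", 1), ("O3'", 1), ("P", 2)),
    "zeta1":    (("C3'", 1), ("O3'", 1), ("P", 2), ("O5'", 2)),
    "alpha2":   (("O3'", 1), ("P", 2), ("O5'", 2), ("C5'", 2)),
    "beta2":    (("P", 2), ("O5'", 2), ("C5'", 2), ("C4'", 2)),
    "gamma2":   (("O5'", 2), ("C5'", 2), ("C4'", 2), ("C3'", 2)),
    "delta2":   (("C5'", 2), ("C4'", 2), ("C3'", 2), ("O3'", 2)),
}

def dinucleotide_parameters(coords, base_atoms):
    """Return the twelve parameters of one dinucleotide step.

    coords:     {(atom_name, residue_index): (x, y, z)}  (hypothetical layout)
    base_atoms: {1: ("N9", "C4"), 2: ("N1", "C2")}, i.e. the glycosidic N and
                the following ring C for each nucleotide (purine or pyrimidine).
    """
    params = {name: torsion(*(coords[a] for a in atoms))
              for name, atoms in BACKBONE.items()}
    for i in (1, 2):
        n_name, c_name = base_atoms[i]
        params[f"chi{i}"] = torsion(coords[("O4'", i)], coords[("C1'", i)],
                                    coords[(n_name, i)], coords[(c_name, i)])
    n1, n2 = coords[(base_atoms[1][0], 1)], coords[(base_atoms[2][0], 2)]
    c1, c2 = coords[("C1'", 1)], coords[("C1'", 2)]
    params["mu"] = torsion(n1, c1, c2, n2)   # pseudo-torsion N-C1'-C1'-N
    params["NN"] = distance(n1, n2)          # N1/9(1)...N1/9(2) distance
    params["CC"] = distance(c1, c2)          # C1'(1)...C1'(2) distance
    return params
```

Given coordinates for a purine followed by a pyrimidine, `dinucleotide_parameters(coords, {1: ("N9", "C4"), 2: ("N1", "C2")})` would return a dictionary with nine torsions, the pseudo-torsion μ, and the two distances.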
Figure 3.
Examples of simple motifs built by open conformers. (A) Example of the OP12 motif (red) assigned to step G110–C111 from chain B of 2pn4 (73). The step is capable of binding three sequentially distant parts of the molecule or three different chains, one in the center, one in light blue, one in light yellow. (B) OP15, with both bases nearly in one plane, often pairs with OP08 in the opposite strand (OP15 in red, OP08 in blue; motif from the sarcin/ricin domain of 28S rRNA, step G10–U11 from chain A of 1q96 (50)). Drawn by ChimeraX (74).
Figure 4.
Contoured scattergrams of the real-space correlation coefficient (RSCC) against two geometric measures of the fit between the dinucleotide geometry and the geometry of the closest dinucleotide in the golden set. Data were calculated for 2.6 million dinucleotides in all nucleic acid structures in the PDB with available electron density maps as of 16 December 2019. Analogous scattergrams for all NtC classes are posted at the website dnatco.org/contours. The rmsd values delimiting the quadrants are somewhat arbitrary but are derived from the values of the assigned dinucleotides.
Figure 5.
Standardized Pearson residuals (SPR) of populated CANA codes for the analyzed DNA and RNA structures, calculated for the sixteen dinucleotide sequences. Red (blue) color highlights overpopulated (underpopulated) instances. SPR values highlighted in yellow point to the sequence/CANA combinations where the χ² values are highly significant (χ² > 30 for 15 degrees of freedom at the significance level of 0.01). SPR and χ² values for all CANA codes are listed in supplemental Table S3E.
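For readers unfamiliar with the statistic, the sketch below computes standardized Pearson residuals and the overall Pearson χ² for a contingency table of counts, for example CANA codes (rows) against dinucleotide sequences (columns). It follows the textbook definition, residual = (O − E) / sqrt(E (1 − row/N)(1 − col/N)); the caption does not spell out how the authors computed or partitioned their values, so this is an illustration under that assumption. The function name and the table layout are hypothetical, and all row and column totals are assumed to be nonzero.

```python
import math

def standardized_pearson_residuals(table):
    """Return (residuals, chi2) for a 2-D list of observed counts.

    Positive residuals mark overpopulated cells, negative residuals
    underpopulated cells, relative to independence of rows and columns.
    """
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    chi2 = 0.0
    residuals = []
    for i, row in enumerate(table):
        out = []
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (observed - expected) ** 2 / expected
            # Variance correction that turns a raw Pearson residual
            # into a standardized (adjusted) one.
            scale = math.sqrt(expected
                              * (1 - row_tot[i] / n)
                              * (1 - col_tot[j] / n))
            out.append((observed - expected) / scale)
        residuals.append(out)
    return residuals, chi2

# Toy 2x2 table with made-up counts, not data from the paper.
spr, chi2 = standardized_pearson_residuals([[10, 20], [20, 10]])
```

In a 2x2 table the square of each standardized residual equals the overall χ², which gives a quick sanity check on the implementation.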
Figure 6.
Examples of tetraloops and tetraloop-involving motifs. (A) Tetraloop from 4lvz (75) contains OP03 (step G59–A60), followed by a series of A-like NtC classes. (B) The open conformation OP05 preceding the actual tetraloop G2738–A2739–G2740–A2741 in the step C2737–G2738 of 1vq8 (76) enables a kissing loop motif to a distant part of the molecule. (C) Two OPN conformers (OP09 and OP20) are adjacent to a ZZ1S step in nucleotides U2144–C2145–C2146–G2147 of (77). (D) The loop from 4qvi (77) built by OP04 (step G2168–A2169) and IC01 (step A2170–A2171) pairs with a distant base (A2119), a part of OP13. Drawn by ChimeraX (74).
Figure 7.
Examples of riboswitch binding sites. (A) Guanidine II riboswitch bound to guanidine (GAI, 5ndh (63)). Step G6–A7, which facilitates GAI (green) binding, was assigned to OP05. (B) S-Adenosyl homocysteine (SAH) riboswitch (3npq (64)) binding SAH. Step G15–C16, in close proximity to an adenosyl group of the ligand SAH (green), is unassigned but very close to OP26 and OP15. Step C28–A29 was assigned to IC02 (blue) and step G31–C32 was assigned to OP12. The structure of these NtCs allows binding of a large ligand, in this example SAH, via intercalation and stacking. Drawn by ChimeraX (74).

References

    1. Ramachandran G.N., Sasisekharan V. Conformation of polypeptides and proteins. Adv. Protein Chem. 1968; 23:283–437.
    2. Unger R., Harel D., Wherland S., Sussman J.L. A 3D building blocks approach to analyzing and predicting structure of proteins. Proteins. 1989; 5:355–373.
    3. Levitt M. Accurate modeling of protein conformation by automatic segment matching. J. Mol. Biol. 1992; 226:507–533.
    4. Kabsch W., Sander C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers. 1983; 22:2577–2637.
    5. Konagurthu A.S., Lesk A.M., Allison L. Minimum message length inference of secondary structure from protein coordinate data. Bioinformatics. 2012; 28:i97–i105.
