Nucleic Acids Res. 2020 Jun 19;48(11):6367-6381.

A unified dinucleotide alphabet describing both RNA and DNA structures

Jiří Černý¹, Paulína Božíková¹, Jakub Svoboda¹, Bohdan Schneider¹

Affiliation

¹ Institute of Biotechnology of the Czech Academy of Sciences, BIOCEV, CZ-252 50 Vestec, Prague-West, Czech Republic.

PMID: 32406923
PMCID: PMC7293047
DOI: 10.1093/nar/gkaa383

Abstract

By analyzing almost 120 000 dinucleotides in over 2000 nonredundant nucleic acid crystal structures, we define 96+1 diNucleotide Conformers, NtCs, which describe the geometry of RNA and DNA dinucleotides. NtC classes are grouped into 15 codes of the structural alphabet CANA (Conformational Alphabet of Nucleic Acids) to simplify symbolic annotation of the prominent structural features of nucleic acids and their intuitive graphical display. The search for nontrivial patterns of NtCs resulted in the identification of several types of RNA loops, some of them observed for the first time. Over 30% of the nearly six million dinucleotides in the PDB cannot be assigned to any NtC class, but we demonstrate that up to half of them can be re-refined with the help of proper refinement targets. A statistical analysis of the preferences of NtCs and CANA codes for the 16 dinucleotide sequences showed that neither the NtC class AA00, which forms the scaffold of RNA structures, nor BB00, the most populated DNA class, is sequence-neutral; instead, their distributions are significantly biased. The automated assignment of NtC classes and CANA codes, available at dnatco.org, provides a powerful tool for unbiased analysis of nucleic acid structures by structural and molecular biologists.
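The abstract describes assigning each dinucleotide to an NtC class or leaving it unassigned. A minimal sketch of that idea follows, assuming that a step is represented only by its torsion angles, that similarity is a Euclidean distance over periodically wrapped angle differences, and that a single hypothetical `cutoff` decides between the nearest class and an "unassigned" label. This is not the dnatco.org procedure; the class names and centroid values in the test are toy data.

```python
import math


def angular_difference(a, b):
    """Signed difference a - b between two angles in degrees, wrapped to [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def torsion_distance(x, y):
    """Euclidean distance between two torsion vectors that respects 360-degree periodicity."""
    return math.sqrt(sum(angular_difference(a, b) ** 2 for a, b in zip(x, y)))


def assign_step(torsions, centroids, cutoff):
    """Assign one dinucleotide step to its closest class (illustrative sketch).

    centroids: hypothetical mapping of class name -> list of reference torsions (degrees).
    cutoff:    hypothetical maximum distance for an assignment.
    Returns (label, ranking), where ranking lists (distance, class name) closest first
    and label is the closest class, or "unassigned" if even that class is too far.
    """
    ranking = sorted(
        (torsion_distance(torsions, centroid), name) for name, centroid in centroids.items()
    )
    best_distance, best_name = ranking[0]
    label = best_name if best_distance <= cutoff else "unassigned"
    return label, ranking
```

Returning the full ranking rather than only the winner is what makes statements such as "unassigned but very close to OP26 and OP15" possible for a step that falls outside every class.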

Figures

**Figure 1.**
Examples of two-dimensional scattergrams of three backbone torsion angles in RNA and DNA molecules. Shown are values from crystal structures with resolution better than 1.8 Å. The scattergram on the left plots the distributions of the torsions about the backbone bonds P–O5′ (axis α2) and C5′–C4′ (axis γ2); the scattergram on the right plots the torsions about the bonds P–O5′ (axis α2) and O3′–P (axis ζ1).

**Figure 2.**
The analyzed fragment is defined by twelve geometric parameters: seven backbone torsion angles from δ1 to δ2 (highlighted in cyan), two torsions around the glycosidic bonds, χ1 and χ2 (highlighted in green), and three parameters highlighted in light blue: one pseudo-torsion angle μ and two distances, NN and C′C′. The parameters are defined as follows: δ1 C5′(1)–C4′(1)–C3′(1)–O3′(1), ϵ1 C4′(1)–C3′(1)–O3′(1)–P(2), ζ1 C3′(1)–O3′(1)–P(2)–O5′(2), α2 O3′(1)–P(2)–O5′(2)–C5′(2), β2 P(2)–O5′(2)–C5′(2)–C4′(2), γ2 O5′(2)–C5′(2)–C4′(2)–C3′(2), δ2 C5′(2)–C4′(2)–C3′(2)–O3′(2), χ1 O4′(1)–C1′(1)–N1/9(1)–C2/4(1), χ2 O4′(2)–C1′(2)–N1/9(2)–C2/4(2); the distance NN is N1/9(1)–N1/9(2) and the distance C′C′ is C1′(1)–C1′(2). Finally, the pseudo-torsion μ is the torsion between the atoms defining the glycosidic bonds of the first and second nucleotide, N1/9(1)–C1′(1)–C1′(2)–N1/9(2).
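The parameter definitions in the Figure 2 caption translate directly into code. The sketch below computes a torsion angle from four atom positions with the standard atan2 formula (IUPAC sign convention) and assembles the twelve parameters. The input format is hypothetical: a dict keyed by `(atom_name, nucleotide)`, with nucleotide 1 or 2 and primed atom names written with an apostrophe (e.g. `"C4'"`). The N9/C4 versus N1/C2 choice follows the caption's N1/9 and C2/4 notation; the function and key names are illustrative and are not an API of dnatco.org.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def torsion(p0, p1, p2, p3):
    """Torsion angle p0-p1-p2-p3 in degrees, IUPAC sign convention, range [-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / length for c in b1)
    # Project the two outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


PURINES = {"A", "G"}  # assumption: one-letter base codes; anything else is treated as a pyrimidine


def glycosidic_atoms(base):
    """(N, C) atoms used for chi and mu: N9/C4 for purines, N1/C2 for pyrimidines."""
    return ("N9", "C4") if base in PURINES else ("N1", "C2")


def step_parameters(atoms, base1, base2):
    """Twelve step parameters from a hypothetical {(atom_name, 1 or 2): (x, y, z)} mapping."""
    def at(name, i):
        return atoms[(name, i)]

    n1, c1 = glycosidic_atoms(base1)
    n2, c2 = glycosidic_atoms(base2)
    return {
        "delta1": torsion(at("C5'", 1), at("C4'", 1), at("C3'", 1), at("O3'", 1)),
        "epsilon1": torsion(at("C4'", 1), at("C3'", 1), at("O3'", 1), at("P", 2)),
        "zeta1": torsion(at("C3'", 1), at("O3'", 1), at("P", 2), at("O5'", 2)),
        "alpha2": torsion(at("O3'", 1), at("P", 2), at("O5'", 2), at("C5'", 2)),
        "beta2": torsion(at("P", 2), at("O5'", 2), at("C5'", 2), at("C4'", 2)),
        "gamma2": torsion(at("O5'", 2), at("C5'", 2), at("C4'", 2), at("C3'", 2)),
        "delta2": torsion(at("C5'", 2), at("C4'", 2), at("C3'", 2), at("O3'", 2)),
        "chi1": torsion(at("O4'", 1), at("C1'", 1), at(n1, 1), at(c1, 1)),
        "chi2": torsion(at("O4'", 2), at("C1'", 2), at(n2, 2), at(c2, 2)),
        "mu": torsion(at(n1, 1), at("C1'", 1), at("C1'", 2), at(n2, 2)),
        "NN": math.dist(at(n1, 1), at(n2, 2)),       # distances in the input units (e.g. Å)
        "CC": math.dist(at("C1'", 1), at("C1'", 2)),
    }
```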

**Figure 3.**
Examples of simple motifs built by open conformers. (A) Example of an OP12 motif (red) assigned to step G110–C111 from chain B of 2pn4 (73). The step can bind three sequentially distant parts of the molecule or three different chains: one in the center, one in light blue and one in light yellow. (B) OP15, with both bases nearly in one plane, often pairs with OP08 in the opposite strand (OP15 in red, OP08 in blue; motif from the sarcin/ricin domain of 28S rRNA, step G10–U11 from chain A of 1q96 (50)). Drawn with ChimeraX (74).

**Figure 4.**
Contoured scattergrams of the real-space correlation coefficient (RSCC) against two geometric measures of the fit between the dinucleotide geometry and the geometry of the closest dinucleotide in the golden set. Data were calculated for 2.6 million dinucleotides in all nucleic acid structures in the PDB with available electron density maps as of 16 December 2019. Analogous scattergrams for all NtC classes are posted at dnatco.org/contours. The rmsd values delimiting the quadrants are somewhat arbitrary but were derived from the values for the assigned dinucleotides.
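The Figure 4 caption relies on the rmsd between a dinucleotide and its closest golden-set member. One common way to compute such an rmsd is to superpose matched atoms optimally with the Kabsch algorithm, sketched below with NumPy. Which atoms are matched, the contents of the golden set, and the RSCC and rmsd cutoffs passed to the hypothetical `quadrant` helper are assumptions for illustration only.

```python
import numpy as np


def superposed_rmsd(P, Q):
    """RMSD between matched atom sets P and Q (N x 3) after optimal superposition (Kabsch)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    Pc = P - P.mean(axis=0)  # remove translation
    Qc = Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    # Flip the last axis if the best orthogonal transform would be a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))


def closest_reference(step, golden_set):
    """Index and rmsd of the golden-set member closest to `step` (same atom order assumed)."""
    rmsds = [superposed_rmsd(step, ref) for ref in golden_set]
    best = int(np.argmin(rmsds))
    return best, rmsds[best]


def quadrant(rscc, rmsd, rscc_cut, rmsd_cut):
    """Label the RSCC-versus-rmsd quadrant of one step; both cutoffs are hypothetical."""
    density = "high RSCC" if rscc >= rscc_cut else "low RSCC"
    geometry = "low rmsd" if rmsd <= rmsd_cut else "high rmsd"
    return f"{density} / {geometry}"
```

A step in the "high RSCC / high rmsd" quadrant fits its density but not any golden-set geometry, which is the kind of case the abstract suggests may benefit from re-refinement with proper targets.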

**Figure 5.**
Standardized Pearson residuals (SPR) of populated CANA codes in the analyzed DNA and RNA structures, calculated for the sixteen dinucleotide sequences. Red (blue) highlights overpopulated (underpopulated) instances. SPR values highlighted in yellow mark sequence/CANA combinations whose χ² values are highly significant (χ² > 30 for 15 degrees of freedom at the 0.01 significance level). SPR and χ² values for all CANA codes are listed in Supplementary Table S3E.
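Standardized Pearson residuals for a contingency table of counts (here, rows as CANA codes and columns as the 16 dinucleotide sequences) are commonly computed as (O − E)/√(E(1 − r/N)(1 − c/N)), with expected count E = rc/N from the row total r, column total c and grand total N. The sketch below implements that textbook formula together with a per-row χ² of each code against the pooled sequence distribution, which has 15 degrees of freedom for 16 columns. Whether the authors computed χ² exactly this way is an assumption; the default threshold of 30 is taken from the Figure 5 caption.

```python
import math


def _margins(table):
    """Row totals, column totals and grand total of a list-of-lists count table."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    return rows, cols, sum(rows)


def standardized_pearson_residuals(table):
    """(O - E) / sqrt(E (1 - r/N) (1 - c/N)) for every cell; needs non-degenerate margins."""
    rows, cols, n = _margins(table)
    result = []
    for i, row in enumerate(table):
        out = []
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n
            variance = expected * (1 - rows[i] / n) * (1 - cols[j] / n)
            out.append((observed - expected) / math.sqrt(variance))
        result.append(out)
    return result


def row_chi_square(table, i):
    """Pearson chi-square of row i against the pooled column proportions (len(row) - 1 df)."""
    rows, cols, n = _margins(table)
    total = 0.0
    for j, observed in enumerate(table[i]):
        expected = rows[i] * cols[j] / n
        total += (observed - expected) ** 2 / expected
    return total


def significant_rows(table, threshold=30.0):
    """Indices of rows (e.g. CANA codes) whose chi-square exceeds the threshold."""
    return [i for i in range(len(table)) if row_chi_square(table, i) > threshold]
```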

**Figure 6.**
Examples of tetraloops and tetraloop-containing motifs. (A) The tetraloop from 4lvz (75) contains OP03 (step G59–A60), followed by a series of A-like NtC classes. (B) The open conformation OP05 in step C2737–G2738 of 1vq8 (76), preceding the actual tetraloop G2738–A2739–G2740–A2741, enables a kissing-loop motif with a distant part of the molecule. (C) Two OPN classes (OP09 and OP20) are adjacent to a ZZ1S step in nucleotides U2144–C2145–C2146–G2147 of the structure from (77). (D) The loop from 4qvi (77), built by OP04 (step G2168–A2169) and IC01 (step A2170–A2171), pairs with a distant base (A2119) that is part of an OP13 step. Drawn with ChimeraX (74).
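Panel (A) of Figure 6 describes a motif as an open conformer followed by a run of A-like classes, the kind of NtC pattern the abstract says was searched for. Scanning a chain's per-step NtC labels for such a pattern can be sketched as below. Treating every label that starts with "OP" as open and every label that starts with "AA" as A-like is a simplifying assumption, as is the minimum run length; the labels in the test are illustrative and are not taken from 4lvz.

```python
def open_then_a_like(ntcs, min_run=2, open_prefix="OP", a_like_prefix="AA"):
    """Indices of steps labelled with an open class that are followed by at least
    `min_run` consecutive A-like steps (prefix matching is a hypothetical rule)."""
    hits = []
    for i, label in enumerate(ntcs):
        if not label.startswith(open_prefix):
            continue
        run = 0
        for following in ntcs[i + 1:]:
            if not following.startswith(a_like_prefix):
                break  # the A-like run ends at the first non-matching step
            run += 1
        if run >= min_run:
            hits.append(i)
    return hits
```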

**Figure 7.**
Examples of riboswitch binding sites. (A) Guanidine II riboswitch bound to guanidine (GAI; 5ndh (63)). Step G6–A7, which facilitates GAI (green) binding, was assigned to OP05. (B) S-adenosylhomocysteine (SAH) riboswitch (3npq (64)) binding SAH. Step G15–C16, in close proximity to the adenosyl group of the ligand SAH (green), is unassigned but very close to OP26 and OP15. Step C28–A29 was assigned to IC02 (blue) and step G31–C32 to OP12. The conformations of these NtCs allow binding of a large ligand, in this example SAH, via intercalation and stacking. Drawn with ChimeraX (74).

Cited by

PTFSpot: deep co-learning on transcription factors and their binding regions attains impeccable universality in plants.
Gupta S, Kesarwani V, Bhati U, Jyoti, Shankar R. Brief Bioinform. 2024 May 23;25(4):bbae324. doi: 10.1093/bib/bbae324. PMID: 39013383.
Has AlphaFold3 achieved success for RNA?
Bernard C, Postic G, Ghannay S, Tahi F. Acta Crystallogr D Struct Biol. 2025 Feb 1;81(Pt 2):49-62. doi: 10.1107/S2059798325000592. Epub 2025 Jan 27. PMID: 39868559.
Revisiting DNA Sequence-Dependent Deformability in High-Resolution Structures: Effects of Flanking Base Pairs on Dinucleotide Morphology and Global Chain Configuration.
Young RT, Czapla L, Wefers ZO, Cohen BM, Olson WK. Life (Basel). 2022 May 20;12(5):759. doi: 10.3390/life12050759. PMID: 35629425.
RBPSpot: Learning on appropriate contextual information for RBP binding sites discovery.
Sharma NK, Gupta S, Kumar A, Kumar P, Pradhan UK, Shankar R. iScience. 2021 Oct 30;24(12):103381. doi: 10.1016/j.isci.2021.103381. eCollection 2021 Dec 17. PMID: 34841226.
Structural variability of CG-rich DNA 18-mers accommodating double T-T mismatches.
Kolenko P, Svoboda J, Černý J, Charnavets T, Schneider B. Acta Crystallogr D Struct Biol. 2020 Dec 1;76(Pt 12):1233-1243. doi: 10.1107/S2059798320014151. Epub 2020 Nov 24. PMID: 33263329.

References

1. Ramachandran G.N., Sasisekharan V. Conformation of polypeptides and proteins. Adv. Protein Chem. 1968; 23:283–437.
2. Unger R., Harel D., Wherland S., Sussman J.L. A 3D building blocks approach to analyzing and predicting structure of proteins. Proteins. 1989; 5:355–373.
3. Levitt M. Accurate modeling of protein conformation by automatic segment matching. J. Mol. Biol. 1992; 226:507–533.
4. Kabsch W., Sander C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers. 1983; 22:2577–2637.
5. Konagurthu A.S., Lesk A.M., Allison L. Minimum message length inference of secondary structure from protein coordinate data. Bioinformatics. 2012; 28:i97–i105.
