Review

Nucleic Acids Res. 2020 Jan 10;48(1):1-15.

doi: 10.1093/nar/gkz1097.

A guide to computational methods for G-quadruplex prediction

Emilia Puig Lombardi¹, Arturo Londoño-Vallejo¹

PMID: 31754698
PMCID: PMC6943126
DOI: 10.1093/nar/gkz1097

Emilia Puig Lombardi et al. Nucleic Acids Res. 2020.

Affiliation

¹ Telomeres and Cancer Laboratory, Institut Curie, PSL Research University, Sorbonne Universités, CNRS UMR3244, 75005 Paris, France.


Erratum in

A guide to computational methods for G-quadruplex prediction.
Lombardi EP, Londoño-Vallejo A. Nucleic Acids Res. 2020 Feb 20;48(3):1603. doi: 10.1093/nar/gkaa033. PMID: 31943112. No abstract available.

Abstract

Guanine-rich nucleic acids can fold into the non-B DNA or RNA structures called G-quadruplexes (G4). Recent methodological developments have allowed the characterization of specific G-quadruplex structures in vitro as well as in vivo, and at a much higher throughput, in silico, which has greatly expanded our understanding of G4-associated functions. Typically, the consensus motif G₃₊N₁₋₇G₃₊N₁₋₇G₃₊N₁₋₇G₃₊ has been used to identify potential G-quadruplexes from primary sequence. Since then, various algorithms have been developed to predict the potential formation of quadruplexes directly from DNA or RNA sequences, and the number of studies reporting genome-wide G4 exploration across species has rapidly increased. More recently, new methodologies have also appeared, proposing other estimates which consider non-canonical sequences and/or structure propensity and stability. The present review aims at providing an updated overview of the current open-source G-quadruplex prediction algorithms and straightforward examples of their implementation.
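As an illustration of the consensus motif described in the abstract (four runs of at least three guanines separated by three loops of 1 to 7 nucleotides), the following is a minimal Python sketch of a regular-expression search. It is not code from the review: the names `G4_MOTIF` and `find_g4_motifs` are hypothetical, and the choice to report only non-overlapping hits on the given strand is an assumption made for brevity.

```python
import re

# Loop of 1-7 arbitrary nucleotides (N = any base, so loops may contain G).
LOOP = "[ACGTUN]{1,7}"

# Four G-runs of length >= 3 separated by three loops.
# Hypothetical helper pattern; matches DNA or RNA, case-insensitively.
G4_MOTIF = re.compile(
    f"G{{3,}}{LOOP}G{{3,}}{LOOP}G{{3,}}{LOOP}G{{3,}}",
    re.IGNORECASE,
)


def find_g4_motifs(sequence):
    """Return (start, end, match) tuples for non-overlapping motif hits.

    Coordinates are 0-based, end-exclusive, on the strand given.
    """
    return [
        (m.start(), m.end(), m.group().upper())
        for m in G4_MOTIF.finditer(sequence)
    ]


# Human telomeric repeat (TTAGGG)n as a simple positive example.
telomeric = "TTAGGGTTAGGGTTAGGGTTAGGGTTA"
hits = find_g4_motifs(telomeric)
print(hits)
```

To cover motifs on the complementary (C-rich) strand, the same search would be run on the reverse complement of the input; overlapping candidates are not enumerated by this sketch.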

Figures

**Figure 1.**
From guanines to G-quadruplexes. (A) From left to right: guanine residue; four guanines form a planar tetrad, or G-quartet, stabilised by a central monovalent metal ion (M+) (R, sugar-phosphate backbone of nucleic acids); the stacking of multiple G-quartets forms a G-quadruplex secondary structure. Cartoon representation of the Oxytricha telomeric DNA G4 crystal structure (PDB accession 1JPQ (112)). Structure visualisation was performed with the PyMOL v2.3.1 software (The PyMOL Molecular Graphics System, Version 2.0 Schrödinger, LLC), using default colours. (B) Diversity of the G-quadruplex structure. From left to right: three conformations of unimolecular G4s with different backbone arrangements (parallel, anti-parallel and mixed); interstrand bimolecular quadruplex; and interstrand tetramolecular quadruplex. Shades of blue represent different strands.

**Figure 2.**
Performance comparison of different G-quadruplex prediction tools on a reference dataset. The reference dataset used for evaluation is composed of 392 *in vitro* experimentally verified G4 sequences, consisting of 298 positive and 94 negative samples. The sequence logo represents the most common motif found within this set. (A) Tool scores for the reference dataset. Points represent the score values for individual sequences belonging to either G4 or noG4 classes. WRS: Wilcoxon rank sum test. (B) ROC curves for different tool scores on the reference dataset. A random performing estimator would follow the dotted diagonal line. AUC values for each tool are shown below.

**Figure 3.**
Dealing with overlapping quadruplex motifs in a GC-rich gene promoter. (A) From the upper to the lower track: BCL2 gene annotation (chr18:63 123 346–63 320 128 in the *hg38* reference genome); distribution of all G4 motif prediction scores obtained with pqsfinder; and distribution of pqsfinder G4 motif prediction densities. Higher density values indicate low-complexity regions. (B) G4 sequence prediction in the 2 kb, high-density, BCL2 promoter region. The table shows the results obtained with three different prediction algorithms: Quadparser, G4Hunter (Python) and pqsfinder (R package).

**Figure 4.**
Genomic distribution of G-quadruplex sequences found using different prediction methods. G4 sequences predicted by three different approaches (G4L1–12: regular expression matching G₃₋₅N₁₋₁₂G₃₋₅N₁₋₁₂G₃₋₅N₁₋₁₂G₃₋₅, G4Hunter: sliding window and scoring, and G4-seq: high-throughput *in vitro* detection) were annotated. Genomic features were obtained from the respective annotation files in the three species shown. The three-way overlaps between the different datasets are represented as weighted Venn diagrams (with area-proportional circles or faces for clarity).

**Figure 5.**
Annotation of G-quadruplex sequences found exclusively by G4-seq. The annotations of G4 sequences found exclusively (i.e. no overlaps between the sets) *in vitro* using the G4-seq method (green) were compared to those of the motifs predicted by both the G4Hunter algorithm and G4-seq (purple). Genomic features were obtained from the respective annotation files in the four species shown and are reported on the x-axes. Log₂(enrichment) for each of the assessed features is reported on the y-axes. Permutation tests (n = 100 permutations) were performed to assess the significance of the associations; **P-value < 0.01 and |local z-score| > 10; *P-value < 0.05 and |local z-score| > 10.
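
The "sliding window and scoring" approach attributed to G4Hunter in the Figure 4 caption can be sketched roughly as follows. This is a simplified illustration and not the G4Hunter implementation: the function names are hypothetical, and the run-length cap of 4 and window of 25 nucleotides are assumed defaults recalled from the published description of the method rather than values stated on this page. Each G in a run of n guanines scores +min(n, 4), each C in a run of n cytosines scores -min(n, 4), other bases score 0, and each window is summarised by its mean score.

```python
import re


def base_scores(sequence, cap=4):
    """Per-base scores: G-runs positive, C-runs negative, other bases 0.

    `cap` (assumed default 4) limits the contribution of long runs.
    """
    seq = sequence.upper()
    scores = [0] * len(seq)
    for run in re.finditer(r"G+|C+", seq):
        value = min(len(run.group()), cap)
        if run.group()[0] == "C":
            value = -value
        for i in range(run.start(), run.end()):
            scores[i] = value
    return scores


def window_scores(sequence, window=25):
    """Mean score of every full window, computed with a running sum.

    `window` (assumed default 25) is the sliding-window length in bases.
    """
    scores = base_scores(sequence)
    if len(scores) < window:
        return []
    total = sum(scores[:window])
    means = [total / window]
    for i in range(window, len(scores)):
        total += scores[i] - scores[i - window]
        means.append(total / window)
    return means


example = window_scores("GGGT", window=2)
print(example)
```

Windows whose absolute mean exceeds a chosen threshold would then be kept as candidates; strongly positive values point to G-rich stretches on the given strand and strongly negative values to G-rich stretches on the complementary strand.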

