Review
Nucleic Acids Res. 2020 Jan 10;48(1):1-15.
doi: 10.1093/nar/gkz1097.

A guide to computational methods for G-quadruplex prediction

Emilia Puig Lombardi et al. Nucleic Acids Res. 2020.

Erratum in

Abstract

Guanine-rich nucleic acids can fold into non-B DNA or RNA structures called G-quadruplexes (G4). Recent methodological developments have allowed specific G-quadruplex structures to be characterized in vitro and in vivo and, at a much higher throughput, in silico, which has greatly expanded our understanding of G4-associated functions. Typically, the consensus motif G3+N1–7G3+N1–7G3+N1–7G3+ (four runs of at least three guanines separated by loops of one to seven nucleotides of any type) has been used to identify potential G-quadruplexes from primary sequence. Since then, various algorithms have been developed to predict the potential formation of quadruplexes directly from DNA or RNA sequences, and the number of studies reporting genome-wide G4 exploration across species has rapidly increased. More recently, new methodologies have appeared that propose alternative estimates accounting for non-canonical sequences and/or structural propensity and stability. The present review aims to provide an updated overview of the current open-source G-quadruplex prediction algorithms and straightforward examples of their implementation.
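
As a minimal sketch (not the code of Quadparser or any other published tool), the consensus motif can be written as a Python regular expression. The helper names `g4_pattern` and `find_pqs` are hypothetical; the loop alphabet `[ACGTN]` and the use of the C-rich pattern to represent G4s on the opposite strand are assumptions of this example.

```python
import re


def g4_pattern(base="G", min_run=3, min_loop=1, max_loop=7):
    """Hypothetical helper: regex for four runs of >= min_run `base` residues
    separated by loops of min_loop..max_loop nucleotides (assumed alphabet ACGTN)."""
    run = f"{base}{{{min_run},}}"                    # e.g. G{3,}
    loop = f"[ACGTN]{{{min_loop},{max_loop}}}"       # e.g. [ACGTN]{1,7}
    return re.compile(f"{run}(?:{loop}{run}){{3}}")  # run, then (loop + run) x 3


def find_pqs(seq, **kwargs):
    """Hypothetical helper: non-overlapping motif matches on both strands.

    G-runs are searched directly; C-runs are taken to indicate a potential G4
    on the complementary strand (an assumption of this sketch).
    Coordinates are 0-based, half-open, on the input strand.
    """
    seq = seq.upper()
    hits = []
    for strand, base in (("+", "G"), ("-", "C")):
        for m in g4_pattern(base=base, **kwargs).finditer(seq):
            hits.append((strand, m.start(), m.end(), m.group()))
    return hits


# Human telomeric repeat: one canonical hit on the + strand.
print(find_pqs("TTAGGGTTAGGGTTAGGGTTAGGG"))
# [('+', 3, 24, 'GGGTTAGGGTTAGGGTTAGGG')]
```

Because `finditer` returns non-overlapping, greedy matches, a long G-rich region may yield fewer candidates than it contains. Passing `max_loop=12` approximates a longer-loop variant, although this sketch does not cap G-runs at a maximum length.
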

Figures

Figure 1.
From guanines to G-quadruplexes. (A) From left to right: a guanine residue; four guanines forming a planar tetrad, or G-quartet, stabilised by a central monovalent metal ion (M+) (R, sugar-phosphate backbone of nucleic acids); and the stacking of multiple G-quartets into a G-quadruplex secondary structure. Cartoon representation of the Oxytricha telomeric DNA G4 crystal structure (PDB accession 1JPQ (112)). Structure visualisation was performed with the PyMOL v2.3.1 software (The PyMOL Molecular Graphics System, Version 2.0, Schrödinger, LLC), using default colours. (B) Diversity of G-quadruplex structures. From left to right: three conformations of unimolecular G4s with different backbone arrangements (parallel, anti-parallel and mixed); an interstrand bimolecular quadruplex; and an interstrand tetramolecular quadruplex. Shades of blue represent different strands.
Figure 2.
Performance comparison of different G-quadruplex prediction tools on a reference dataset. The reference dataset used for evaluation is composed of 392 sequences whose G4 formation was experimentally assessed in vitro, comprising 298 positive and 94 negative samples. The sequence logo represents the most common motif found within this set. (A) Tool scores for the reference dataset. Points represent the score values for individual sequences belonging to either the G4 or the noG4 class. WRS: Wilcoxon rank sum test. (B) ROC curves for the different tool scores on the reference dataset. A randomly performing estimator would follow the dotted diagonal line. AUC values for each tool are shown below.
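
The AUC values in panel B are closely related to the Wilcoxon rank sum test in panel A: the AUC equals the Mann-Whitney U statistic divided by the number of positive-negative pairs. A minimal, dependency-free sketch of both views follows; the function names are hypothetical and the toy scores are illustrative, not values from the reference dataset.

```python
from itertools import product


def roc_auc(pos_scores, neg_scores):
    """AUC as the probability that a random positive outscores a random negative
    (ties count 1/2), i.e. the Mann-Whitney U statistic / (n_pos * n_neg)."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    u = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            u += 1.0
        elif p == n:
            u += 0.5
    return u / (len(pos_scores) * len(neg_scores))


def roc_curve(pos_scores, neg_scores):
    """(FPR, TPR) points obtained by sweeping the threshold over every observed score."""
    points = [(0.0, 0.0)]
    for t in sorted(set(pos_scores) | set(neg_scores), reverse=True):
        tpr = sum(s >= t for s in pos_scores) / len(pos_scores)
        fpr = sum(s >= t for s in neg_scores) / len(neg_scores)
        points.append((fpr, tpr))
    return points


def trapezoid_auc(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


g4_scores = [3, 2, 2]   # toy scores for sequences labelled G4
nog4_scores = [2, 1]    # toy scores for sequences labelled noG4
print(roc_auc(g4_scores, nog4_scores))                   # about 0.833 (= 5/6)
print(trapezoid_auc(roc_curve(g4_scores, nog4_scores)))  # same value, up to rounding
```

The pairwise loop is quadratic, which is cheap for 298 × 94 pairs; for much larger sets a rank-based formula would be preferable.
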
Figure 3.
Dealing with overlapping quadruplex motifs in a GC-rich gene promoter. (A) From the upper to the lower track: BCL2 gene annotation (chr18:63,123,346–63,320,128 in the hg38 reference genome); the distribution of all G4 motif prediction scores obtained with pqsfinder; and the distribution of pqsfinder G4 motif prediction densities. Higher density values indicate low-complexity regions. (B) G4 sequence prediction in the 2-kb high-density BCL2 promoter region. The table shows the results obtained with three different prediction algorithms: Quadparser, G4Hunter (Python) and pqsfinder (R package).
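
One generic way to expose overlapping candidates, sketched below under the assumption of 0-based half-open coordinates, is to enumerate every start position with a regex lookahead and then merge overlapping hits into single loci. This is an illustration only; it is not how Quadparser, G4Hunter or pqsfinder resolve overlaps, and the names `PQS`, `overlapping_matches` and `merge_overlaps` are hypothetical.

```python
import re

# Consensus-style motif (G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+); loop alphabet ACGTN is assumed.
PQS = r"G{3,}(?:[ACGTN]{1,7}G{3,}){3}"


def overlapping_matches(seq, pattern=PQS):
    """One candidate per start position: wrapping the pattern in a zero-width
    lookahead stops finditer from skipping motifs that overlap a previous hit."""
    return [(m.start(), m.start() + len(m.group(1)))
            for m in re.finditer(f"(?=({pattern}))", seq.upper())]


def merge_overlaps(intervals):
    """Merge overlapping or adjacent half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


seq = "GGGGTGGGTGGGTGGG"
print(re.findall(PQS, seq))                       # non-overlapping search: one match
print(overlapping_matches(seq))                   # [(0, 16), (1, 16)]
print(merge_overlaps(overlapping_matches(seq)))   # [(0, 16)]
```

Merging gives one locus per G4-prone region, which makes counts from tools with different overlap policies easier to compare; keeping the best-scoring candidate per merged locus would be an alternative design.
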
Figure 4.
Genomic distribution of G-quadruplex sequences found using different prediction methods. G4 sequences predicted by three different approaches (G4L1–12: regular expression matching G3–5N1–12G3–5N1–12G3–5N1–12G3–5; G4Hunter: sliding window and scoring; and G4-seq: high-throughput in vitro detection) were annotated. Genomic features were obtained from the respective annotation files in the three species shown. The three-way overlaps between the different datasets are represented as weighted Venn diagrams, with area-proportional circles or faces for clarity.
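
The G4Hunter approach rates each guanine by the length of the G-run it belongs to (1 to 4, capped at 4), each cytosine by the same rule with a negative sign, and all other bases as 0, then averages these values over a sliding window. The sketch below implements only this scoring step; the window size of 25 nt and the threshold of 1.2 are assumed defaults, the function names are hypothetical, and the published implementation also merges overlapping windows that pass the threshold into regions.

```python
def g4hunter_scores(seq):
    """Per-base G4Hunter-style scores: G in a run of length k -> min(k, 4),
    C in a run of length k -> -min(k, 4), any other base -> 0."""
    seq = seq.upper()
    scores = []
    i = 0
    while i < len(seq):
        j = i
        while j < len(seq) and seq[j] == seq[i]:
            j += 1
        run = j - i
        if seq[i] == "G":
            value = min(run, 4)
        elif seq[i] == "C":
            value = -min(run, 4)
        else:
            value = 0
        scores.extend([value] * run)
        i = j
    return scores


def g4hunter_windows(seq, window=25, threshold=1.2):
    """Return (start, mean score) for every window whose |mean| >= threshold.
    Positive means indicate G-rich windows, negative means C-rich ones."""
    scores = g4hunter_scores(seq)
    if len(scores) < window:
        return []
    hits = []
    total = sum(scores[:window])
    for start in range(len(scores) - window + 1):
        if start > 0:
            # Slide by one position: add the entering base, drop the leaving one.
            total += scores[start + window - 1] - scores[start - 1]
        mean = total / window
        if abs(mean) >= threshold:
            hits.append((start, round(mean, 3)))
    return hits


print(g4hunter_scores("GGGAC"))                    # [3, 3, 3, 0, -1]
print(g4hunter_windows("GGGTTAGGG", window=3))     # [(0, 3.0), (1, 2.0), (5, 2.0), (6, 3.0)]
```

Unlike a regular expression, this score also rewards G-rich sequences that do not match the strict consensus and penalises C-richness on the same strand.
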
Figure 5.
Annotation of G-quadruplex sequences found exclusively by G4-seq. The annotations of G4 sequences found exclusively in vitro by the G4-seq method (green; i.e. with no overlap between the sets) were compared with those of the motifs predicted by both the G4Hunter algorithm and G4-seq (purple). Genomic features were obtained from the respective annotation files in the four species shown and are reported on the x-axes. Log2(enrichment) for each of the assessed features is reported on the y-axes. Permutation tests (n = 100 permutations) were performed to assess the significance of the associations; **P-value < 0.01 and |local z-score| > 10; *P-value < 0.05 and |local z-score| > 10.
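
The enrichment statistics above come from permutation tests. A simplified sketch of the idea follows, assuming a single chromosome, uniform random placement of each G4 interval with its length preserved, and no masking of assembly gaps. It computes the observed overlap count, a standard permutation z-score, an empirical one-sided P-value and log2(observed/expected); it does not reproduce the local z-score given in the caption, and all names are hypothetical.

```python
import math
import random
import statistics


def count_overlaps(query, features):
    """Number of query intervals overlapping at least one feature (half-open)."""
    return sum(any(qs < fe and fs < qe for fs, fe in features) for qs, qe in query)


def randomize(intervals, chrom_len, rng):
    """Place each interval uniformly at random on [0, chrom_len), keeping its length."""
    out = []
    for s, e in intervals:
        length = e - s
        start = rng.randrange(0, chrom_len - length + 1)
        out.append((start, start + length))
    return out


def permutation_test(query, features, chrom_len, n_perm=100, seed=0):
    rng = random.Random(seed)
    observed = count_overlaps(query, features)
    null = [count_overlaps(randomize(query, chrom_len, rng), features)
            for _ in range(n_perm)]
    mean = statistics.mean(null)
    sd = statistics.pstdev(null)
    z = (observed - mean) / sd if sd > 0 else float("nan")
    # One-sided empirical P-value for enrichment, with the usual +1 correction.
    p = (sum(x >= observed for x in null) + 1) / (n_perm + 1)
    log2_enrichment = (math.log2(observed / mean)
                       if observed > 0 and mean > 0 else float("nan"))
    return {"observed": observed, "z": z, "p": p, "log2_enrichment": log2_enrichment}


g4s = [(100, 130), (5_000, 5_025)]          # toy G4 intervals
promoters = [(90, 200), (4_990, 5_100)]     # toy feature intervals
print(permutation_test(g4s, promoters, chrom_len=100_000))
```

For real genomes, a genome-aware randomization (per-chromosome placement, masked gaps) is needed; the regioneR Bioconductor package is one example of a tool offering this.
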

References

    1. Gellert M., Lipsett M.N., Davies D.R. Helix formation by guanylic acid. Proc. Natl. Acad. Sci. U.S.A. 1962; 48:2014–2018.
    2. Sen D., Gilbert W. Formation of parallel four-stranded complexes by guanine-rich motifs in DNA and its implications for meiosis. Nature. 1988; 334:364–366.
    3. Sen D., Gilbert W. A sodium-potassium switch in the formation of four-stranded G4-DNA. Nature. 1990; 344:410–414.
    4. Simonsson T. G-quadruplex DNA structures–variations on a theme. Biol. Chem. 2001; 382:621–628.
    5. Lee J.Y., Okumus B., Kim D.S., Ha T. Extreme conformational diversity in human telomeric DNA. Proc. Natl. Acad. Sci. U.S.A. 2005; 102:18938–18943.
