Comparative Study
Commun Biol. 2021 Jan 22;4(1):98.
doi: 10.1038/s42003-020-01643-4.

Genome-wide analysis of DNA G-quadruplex motifs across 37 species provides insights into G4 evolution

Feng Wu et al. Commun Biol.

Abstract

G-quadruplex (G4) structures have been predicted in the genomes of many organisms and proven to play regulatory roles in diverse cellular activities. However, there is little information on the evolutionary history and distribution characteristics of G4s. Here, whole-genome characteristics of potential G4s were studied in 37 evolutionarily representative species. During evolution, the number, length, and density of G4s generally increased. Immunofluorescence in seven species confirmed G4s' presence and evolutionary pattern. G4s tended to cluster in chromosomes and were enriched in genetic regions. Short-loop G4s were conserved in most species, while loop-length diversity also existed, especially in mammals. The proportion of G4-bearing genes and orthologue genes, which appeared to be increasingly enriched in transcription factors, gradually increased. The antagonistic relationship between G4s and DNA methylation sites was detected. These findings imply that organisms may have evolutionarily developed G4 into a novel reversible and elaborate transcriptional regulatory mechanism benefiting multiple physiological activities of higher organisms.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Genomic landscape of G4 motifs in the selected species of the representative phylogenetic nodes in the tree of life.
a Thirty-seven species in the phylogenetic tree, indicating the 14 representative evolutionary nodes from primitive eukaryotes to Animalia. b–g Genomic landscape of G4 motifs in the 37 species. The G4 density (b, e) and length ratio (c, f) in the genomes, and the ratios of the genes bearing G4 motifs in the upstream 2 kb region (d, g). b–d The data are presented based on all types of G4s [(G/C)2L1–4, (G/C)3L1–7, (G/C)3L1–12, (G/C)4L1–12, and (G/C)5L1–12]. e–g The data are presented based on the stable G4 structure (G/C)3L1–7. R²: goodness of fit of the trend lines. *p < 0.05; **p < 0.01; and ***p < 0.001 by F test. n = 37 biologically independent samples. Scer Saccharomyces cerevisiae, Prei Plasmodium reichenowi, Ptet Paramecium tetraurelia, Ddis Dictyostelium discoideum, Sman Schistosoma mansoni, Mlig Macrostomum lignano, Egra Echinococcus granulosus, Nvec Nematostella vectensis, Adig Acropora digitifera, Hvul Hydra vulgaris, Cele Caenorhabditis elegans, Srat Strongyloides ratti, Bmal Brugia malayi, Lgig Lottia gigantea, Obim Octopus bimaculoides, Cgig Crassostrea gigas, Ctel Capitella teleta, Hrob Helobdella robusta, Amel Apis mellifera, Bmor Bombyx mori, Dmel Drosophila melanogaster, Dpul Daphnia pulex, Spur Strongylocentrotus purpuratus, Lcha Latimeria chalumnae, Bflo Branchiostoma floridae, Drer Danio rerio, Xtro Xenopus tropicalis, Npar Nanorana parkeri, Acar Anolis carolinensis, Psin Pelodiscus sinensis, Asin Alligator sinensis, Ggal Gallus gallus, Phum Pseudopodoces humilis, Scam Struthio camelus, Oana Ornithorhynchus anatinus, Oari Ovis aries, Hsap Homo sapiens.
The numbers in a and the X-coordinate axes of b–g indicate the following phylogenetic categories: (1) fungus (Scer); (2) protozoa (Prei, Ptet, Ddis); (3) Platyhelminthes (Sman, Mlig, Egra); (4) Coelenterata (Nvec, Adig, Hvul); (5) Nematoda (Cele, Srat, Bmal); (6) Mollusca (Lgig, Obim, Cgig); (7) Annelida (Ctel, Hrob); (8) Arthropoda (Amel, Bmor, Dmel, Dpul); (9) Echinodermata (Spur); (10) Fish (Bflo, Drer, Lcha); (11) Amphibian (Xtro, Npar); (12) Reptilia (Acar, Psin, Asin); (13) Aves (Ggal, Phum, Scam); and (14) Mammalia (Hsap, Oana, Oari).
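The motif notation in this legend can be illustrated with a small pattern search. The sketch below is only an illustration and not the authors' pipeline: it assumes that (G/C)3L1–7 means four runs of exactly three G (or, for the complementary strand, three C) separated by three loops of 1 to 7 bases, and it reports non-overlapping matches only. The function names and defaults are hypothetical.

```python
import re

def build_g4_regex(tract_len=3, max_loop=7):
    """Compile a regex for four G-tracts (or four C-tracts) joined by three loops.

    Assumption: tracts have exactly `tract_len` bases and loops have
    1..max_loop bases of any nucleotide (N allowed for unknown bases).
    """
    loop = f"[ACGTN]{{1,{max_loop}}}"
    g_tract = f"G{{{tract_len}}}"
    c_tract = f"C{{{tract_len}}}"
    g_motif = g_tract + (loop + g_tract) * 3
    c_motif = c_tract + (loop + c_tract) * 3
    return re.compile(f"{g_motif}|{c_motif}")

def find_g4_motifs(seq, tract_len=3, max_loop=7):
    """Return (start, end, matched_sequence) for each non-overlapping match."""
    pattern = build_g4_regex(tract_len, max_loop)
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(seq.upper())]

def g4_density_per_kb(seq, tract_len=3, max_loop=7):
    """Number of motifs per kilobase of sequence (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return len(find_g4_motifs(seq, tract_len, max_loop)) * 1000 / len(seq)
```

Calling `find_g4_motifs(seq, tract_len=3, max_loop=12)` would correspond to the (G/C)3L1–12 class under the same assumptions.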
Fig. 2. GC content and chromosomal distribution of the (G/C)3L1–7 G4 motifs in the selected species.
a The GC content of the genomes of the 37 species. b, c The density of total G4 motifs and (G/C)3L1–7 motifs, normalized over the GC contents. ***p < 0.001 by F test. n = 37 biologically independent samples. d Chromosomal distribution of the (G/C)3L1–7 G4 motifs in nine selected species, each representing a phylogenetic category, ranging from fungus to mammal. The density of the (G/C)3L1–7 G4 motifs was calculated in a 50 kb sliding window with a 50 kb step and plotted as histograms along each chromosome of each species. G4 motifs are generally evenly distributed in the genomes of species with low levels of G4s but clustered in those with high levels of G4s (Ggal and Hsap), where high-density windows are separated by low-density windows. Histograms in different colours indicate the G4 motifs in different species. The abbreviation of each species is listed in the legend of Fig. 1.
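The windowed density described in panel d can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: motifs are assigned to a window by their start coordinate, the last window is truncated at the chromosome end, and density is reported per kilobase. The function name and the input format are hypothetical.

```python
def window_densities(motif_starts, chrom_len, window=50_000, step=50_000):
    """Return (window_start, window_end, motifs_per_kb) along one chromosome.

    With window == step (as in the legend) the windows tile the chromosome
    without overlap. Assumption: a motif belongs to the window containing
    its start position.
    """
    results = []
    for start in range(0, chrom_len, step):
        end = min(start + window, chrom_len)  # truncate the final window
        count = sum(1 for pos in motif_starts if start <= pos < end)
        results.append((start, end, count * 1000 / (end - start)))
    return results
```

A clustered genome would show a few windows with high values separated by windows near zero, whereas an evenly distributed one would show similar values throughout.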
Fig. 3. Detection of the G4 structures in the representative species by immunofluorescence staining.
a–g Immunofluorescence staining signals showing the G4 structures in Saccharomyces cerevisiae (Scer), Drosophila melanogaster (Dmel), Danio rerio (Drer), Pelodiscus sinensis (Psin), Gallus gallus (Ggal), Ovis aries (Oari), and Homo sapiens (Hsap). h Quantification of the number of G4 structures per nucleus in Scer, Dmel, Drer, Psin, Ggal, Oari, and Hsap cells. Signal spots in 30 nuclei from three replicates (ten nuclei for each replicate) were counted for each species. Data are the mean ± SEM (n = 30); statistical significance was determined by Student’s t test. *p < 0.05, **p < 0.01, and ***p < 0.001. The scale bars equal 5 μm.
Fig. 4. Functional landscape of the (G/C)3L1–7 G4 motifs during evolution.
a The proportion and enrichment of the genes bearing the (G/C)3L1–7 G4 motif in the upstream 2 kb regions with transcription factor activity. The red histogram indicates the proportion of transcription factors in the genes bearing the (G/C)3L1–7 G4 motifs in their upstream 2 kb region. The blue histogram indicates the proportion of transcription factors in all gene sets. The abbreviation of each species name and the numbers representing phylogenetic nodes are listed in the legend of Fig. 1. Histograms marked with stars indicate significant enrichment of the molecular function "transcription factor activity" by GO enrichment analysis. **p < 0.01 and ***p < 0.001 by hypergeometric test with false discovery rate correction. Detailed GO enrichment results are shown in Supplementary Data 2 and Supplementary Table 2. b Proportion of genes bearing the (G/C)3L1–7 G4 in the upstream 2 kb regions in the orthologues among different species. Pairwise: pairwise orthologues between Drosophila melanogaster and each of the 12 species listed in the X-coordinate. Common: orthologues common in all 13 species.
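The hypergeometric test named in panel a can be sketched for a single species as an upper-tail probability. This is a minimal sketch, not the authors' GO enrichment workflow: the parameter names are hypothetical, the test is assumed to be one-sided, and the false discovery rate correction is omitted because the legend does not name the method used.

```python
from math import comb

def hypergeom_enrichment_p(total_genes, total_tfs, g4_genes, g4_tfs):
    """P(X >= g4_tfs) when g4_genes are drawn without replacement.

    total_genes: all annotated genes (population size)
    total_tfs:   genes annotated with transcription factor activity
    g4_genes:    genes with a G4 motif in the upstream 2 kb region (draws)
    g4_tfs:      transcription factors among those G4-bearing genes
    """
    upper = min(total_tfs, g4_genes)
    numerator = sum(
        comb(total_tfs, i) * comb(total_genes - total_tfs, g4_genes - i)
        for i in range(g4_tfs, upper + 1)
    )
    return numerator / comb(total_genes, g4_genes)
```

A small p-value here would mean that transcription factors are over-represented among G4-bearing genes relative to the whole gene set, which is the comparison the red and blue histograms make.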
Fig. 5. Loop lengths of the (G/C)3L1–7 G4 motifs in the genomes of the 37 representative species.
a Heat map presentation and hierarchical clustering of the proportion of the (G/C)3L1–7 motifs with each of the loop-length types. b Zoomed-in view of one cluster in a, showing the G4 motifs with a relatively higher proportion in the majority of the species. c Species communality of the top ten dominant loop-length types of the (G/C)3L1–7 motifs in each species. The Y-axis shows the number of species that have the corresponding loop-length type of the (G/C)3L1–7 motifs in their top ten motifs. Numbers separated by colons indicate the base length of the first, second, and third loops, respectively. The abbreviation of species is listed in the legend of Fig. 1.
Fig. 6. Relationship between the DNA G4 motifs and CpG methylation level in the upstream 2 kb regulatory regions of genes in pig and silkworm.
a The methylation levels of the cytosines in the upstream 2 kb region of all genes (All) and of those genes bearing the (G/C)3L1–7 motifs in their upstream 2 kb region (G4) in the pig. Bars from top to bottom indicate the third quartile, the median, and the first quartile, respectively. ***p < 0.001 by Wilcoxon test. n = 34,333 cytosine sites for the former and 148,769 cytosine sites for the latter. b Plot of the methylation level distribution in the upstream 2 kb regulatory region of genes in the pig. Methylation levels were calculated in 200 bp sliding windows with a 100 bp step. c, d Proportion of unmethylated and methylated cytosines (c) and of the methylated cytosines (mCs) with different methylation levels (d) in the upstream 2 kb region of all genes (All) and of those genes bearing the (G/C)3L1–7 motifs in their upstream 2 kb region (G4) in the silkworm. High mC: hypermethylated cytosines (methylation level > 0.6); mid mC: moderately methylated cytosines (0.3 < methylation level < 0.6); low mC: hypomethylated cytosines (0 < methylation level < 0.3). **p < 0.01 by chi-square test (data subjected to this test are shown in Supplementary Data 1).
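The methylation classes and the overlapping windows described in this legend can be sketched as below. It is an illustration under assumptions, not the authors' method: the legend leaves the boundary values 0.3 and 0.6 unassigned, so this sketch places them in the lower class; positions are hypothetical 0-based offsets within the upstream region; and the window value is assumed to be the plain mean of per-site levels.

```python
def classify_mc(level):
    """Assign a cytosine to a methylation class using the legend's thresholds.

    Assumption: exactly 0.6 counts as "mid" and exactly 0.3 as "low",
    since the legend does not say where boundary values belong.
    """
    if level > 0.6:
        return "high"
    if level > 0.3:
        return "mid"
    if level > 0:
        return "low"
    return "unmethylated"

def windowed_methylation(sites, region_len=2000, window=200, step=100):
    """Mean methylation level per overlapping window.

    sites: iterable of (position, level) pairs within the region.
    Returns (window_start, mean_level), with None for windows without sites.
    """
    sites = list(sites)
    profile = []
    for start in range(0, region_len - window + 1, step):
        levels = [lv for pos, lv in sites if start <= pos < start + window]
        profile.append((start, sum(levels) / len(levels) if levels else None))
    return profile
```

With the defaults, a 2 kb region yields 19 windows starting at 0, 100, ..., 1800, each sharing half its span with its neighbour.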
