Proc Natl Acad Sci U S A. 2021 May 25;118(21):e2013230118.
doi: 10.1073/pnas.2013230118.

G-quadruplex structural variations in human genome associated with single-nucleotide variations and their impact on gene activity

Jia-Yuan Gong et al. Proc Natl Acad Sci U S A. 2021.

Abstract

G-quadruplexes (G4s) formed by guanine-rich nucleic acids play a role in essential biological processes such as transcription and replication. Besides the >1.5 million putative G4-forming sequences (PQSs), the human genome features >640 million single-nucleotide variations (SNVs), the most common type of genetic variation among people or populations. An SNV may alter a G4 structure when it falls within a PQS motif. To date, genome-wide PQS-SNV interactions and their impact have not been investigated. Herein, we present a study of PQS-SNV interactions and their impact on G4 structures and, subsequently, gene expression. Based on build 154 of the Single Nucleotide Polymorphism Database (dbSNP), we identified 5 million gains/losses or structural conversions of G4s that can be caused by the SNVs. Of these G4 variations (G4Vs), 3.4 million are within genes, resulting in an average load of >120 G4Vs per gene, preferentially enriched near the transcription start site. Moreover, >80% of the G4Vs overlap with transcription factor-binding sites and >14% with enhancers, giving average loads of 3 and 7.5 G4Vs for the two regulatory elements, respectively. Our experiments show that such G4Vs can significantly influence the expression of their host genes. These results reveal genome-wide G4Vs and their impact on gene activity, emphasizing the need to understand the physiological function and pathological implications of genetic variation from a structural perspective. The G4Vs may also provide a unique category of drug targets for individualized therapeutics, health risk assessment, and drug development.
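
As a rough illustration of how an SNV inside a PQS motif can create or abolish a putative G4, the sketch below scans a reference and a variant sequence with the widely used canonical motif (four runs of at least three guanines separated by loops of 1 to 7 nt). This pattern is an assumption made for illustration; the abstract does not state the PQS definition used in the study, and the function names (`has_pqs`, `apply_snv`, `classify_snv`) are hypothetical. Only the G-rich strand is scanned, and a result of "retained" cannot distinguish an unchanged G4 from a structural conversion.

```python
import re

# Assumed canonical PQS motif: G{3,} N{1,7} G{3,} N{1,7} G{3,} N{1,7} G{3,}.
# This is an illustrative definition, not necessarily the one used in the study.
PQS_PATTERN = re.compile(
    r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}"
)


def has_pqs(seq: str) -> bool:
    """Return True if the sequence contains at least one canonical PQS."""
    return PQS_PATTERN.search(seq.upper()) is not None


def apply_snv(seq: str, pos: int, alt: str) -> str:
    """Return the sequence with the base at 0-based position pos replaced by alt."""
    return seq[:pos] + alt + seq[pos + 1:]


def classify_snv(ref_seq: str, pos: int, alt: str) -> str:
    """Classify the effect of an SNV on PQS presence: gain, loss, retained, or none."""
    var_seq = apply_snv(ref_seq, pos, alt)
    ref_hit, var_hit = has_pqs(ref_seq), has_pqs(var_seq)
    if ref_hit and not var_hit:
        return "loss"
    if var_hit and not ref_hit:
        return "gain"
    if ref_hit and var_hit:
        return "retained"  # motif still present; its structure may still differ
    return "none"


# Toy example on a telomere-like sequence (not taken from the paper's data).
telomeric = "GGGTTAGGGTTAGGGTTAGGG"
print(classify_snv(telomeric, 1, "T"))  # G-tract disrupted
print(classify_snv(telomeric, 3, "A"))  # loop base changed
```

A genome-wide analysis would additionally need to scan the complementary strand and handle non-canonical motifs, which this sketch leaves out.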

Keywords: G-quadruplexes; genetic variations; single nucleotide variations.

Conflict of interest statement

The authors declare no competing interest.

Figures

Fig. 1.
PQS–SNV interactions and examples of structural changes of G4 they may induce. (A) Interactions of an SNV with a PQS (showing only two G-tracts). (B) Representative structural changes (⇔) that may be caused by SNV at the indicated guanine (green to cyan). REF, reference; VAR, variant.
Fig. 2.
SNVs and PQSs in the human genome. (A) Accumulation of SNVs in the NCBI dbSNP database over the last decade. (B) The size distribution of PQS motifs. (C) SNV coverage across intervals of chromosomes. Arrow indicates the probability of finding at least one SNV in a 26-nt-sequence motif. (D) Distribution of SNV loads in PQSs.
Fig. 3.
The occurrence of SNVs in human genomic regions. (A) Preference of SNV occurrence in PQS region versus in the entire genome. (B) Enrichment of SNVs in PQS regions. (C) The overlap frequency of SNV and PQS over genome intervals around TSS. (D) Total G4Vs caused by SNVs in the human genome. (E) The occurrence of G4V across human Refseq genes. (F) Three-dimensional presentation of G4V occurrence across human Refseq genes sorted in descending order by the mean of G4V frequency.
Fig. 4.
Distribution of the number of G4Vs assigned to human genes. (A) Total G4Vs and their subtypes. (B) G4V loads in genes. (C) G4V loads in genes classified by subtypes.
Fig. 5.
Interaction of G4V with gene-regulatory elements. (A–C) TFBSs. (D–F) Enhancers. (A) Percent features within TFBSs. (B) G4V loads in G4V-positive TFBSs. (C) Frequency of G4V–TFBS interactions across Refseq genes. (D) Percent features within enhancers. (E) G4V loads in G4V-positive enhancers. (F) Frequency of G4V–enhancer interactions across Refseq genes.
Fig. 6.
The structural change caused by a single point mutation in single-stranded telomeric DNA revealed by CD spectroscopy and gel electrophoresis. (A) DNA sequences used. Letters in red indicate mutation. (B) CD spectroscopy. (C) Gel electrophoresis. G4 migrates faster than an equivalent linear DNA. The DNAs showed identical migration when denatured.
Fig. 7.
Examples of G4V caused by SNV: (A) rs10282850, (B) rs536494398, (C) rs1479792287, and (D) rs113205402. G4 formation in single-stranded DNAs was detected by the protection of the G-tracts in DMS footprinting (gels at the left side and digitization at the right side). SNV IDs are at the top of the gels, followed by the names of their host genes in parentheses. The G4 was stabilized by K+, but not by Li+. G-tracts are indicated by brackets and SNVs by arrowheads. Structural change was indicated at the right side of the digitization panel.
Fig. 8.
Effect of SNV-mediated loop change and G-tract disruption on IRF8 gene expression. (A) Formation of G4s near the TSS of the IRF8 gene detected by G4-ChIP in living human HEK293T cells. The red arrowhead indicates the PQS bearing the SNVs. (B) Formation of G4s in the REF and VAR duplex DNAs detected by native gel electrophoresis in the PQS indicated by an arrowhead in A. The DNA was heated (H) to generate G4 or not heated (N) to remain fully annealed. (C) Same as B, except that G4 formation was detected by DMS footprinting. (D) Digitization of C. (E) RNA and (F) protein expression of the luciferase reporter downstream of the IRF8 promoter in a pGL3-basic plasmid transfected into HEK293T cells.
Fig. 9.
Distribution of G4Vs in 803 oncogenes, 1,217 tumor-suppressor genes, and 104,986 Refseq genes and their enrichment around TSSs (Insets).
