J Biol Chem. 2024 Nov;300(11):107822.
doi: 10.1016/j.jbc.2024.107822. Epub 2024 Sep 26.

Enriched G4 forming repeats in the human genome are associated with robust well-coordinated transcription and reduced cancer transcriptome variation

Ruth B De-Paula et al. J Biol Chem. 2024 Nov.

Abstract

Non-B DNA G-quadruplex (G4) structures with guanine (G) runs of 2 to 4 repeats can trigger opposing experimental transcriptional impacts. Here, we used bioinformatic algorithms to comprehensively assess correlations of steady-state RNA transcript levels with all putative G4 sequence (pG4) locations genome-wide in three mammalian genomes and in normal and tumor human tissues. The human pG4-containing gene set displays higher expression levels than the set without pG4, supporting and extending some prior observations. pG4 enrichment at transcription start sites (TSSs) in human, but not chimpanzee and mouse genomes, suggests possible positive selection pressure for pG4 at human TSS, potentially driving genome rewiring and gene expression divergence between human and chimpanzee. Comprehensive bioinformatic analyses revealed lower pG4-containing gene set variability in humans and among different pG4 genes in tumors. As G4 stabilizers are under therapeutic consideration for cancer and pathogens, such distinctions between human normal and tumor G4s along with other species merit attention. Furthermore, in germline and cancer sequences, the most mutagenic pG4 mapped to regions promoting alternative DNA structures. Overall findings establish high pG4 at TSS as a human genome attribute statistically associated with robust well-coordinated transcription and reduced cancer transcriptome variation with implications for biology, model organisms, and medicine.

Keywords: G-quadruplex; bioinformatics; cancer; epigenetics; gene expression; inherited disease; non-B DNA.
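As a rough illustration of how putative G4 sequences can be located and counted near a TSS, the sketch below scans one strand with a regular expression. It is not the authors' method (the figure captions name the C++Quad and Quadron tools); the motif used here, four runs of at least three guanines separated by loops of 1 to 7 nucleotides, is a commonly used canonical pattern and is an assumption, as are the function names and the window size.

```python
import re

# Assumed canonical motif (not taken from the article): four G-runs of
# length >= 3 separated by three loops of 1-7 nucleotides each.
PG4_PATTERN = re.compile(
    r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}"
)


def find_pg4(sequence):
    """Return (start, end, motif) for non-overlapping matches on the given strand."""
    seq = sequence.upper()
    return [(m.start(), m.end(), m.group()) for m in PG4_PATTERN.finditer(seq)]


def pg4_bp_near_tss(sequence, tss, flank=500):
    """Count bases covered by pG4 matches within +/- flank of a TSS index.

    `tss` is a 0-based index into `sequence`; `flank` is a hypothetical
    window size chosen for illustration only.
    """
    lo = max(0, tss - flank)
    hi = min(len(sequence), tss + flank)
    covered = 0
    for start, end, _ in find_pg4(sequence):
        # Add only the part of the match that falls inside the window.
        covered += max(0, min(end, hi) - max(start, lo))
    return covered
```

Only the supplied strand is scanned; finding pG4 on the opposite strand would require running the same search on the reverse complement.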

Conflict of interest statement

The authors declare that they have no conflicts of interest with the contents of this article.

Figures

Figure 1
Genes with pG4 display robust expression, are depleted in olfactory/chemical stimulus-related genes and are enriched in neuronal- and developmental-related genes. A, boxplots representing the expression levels of genes with and without pG4s at different regulatory locations. Expression levels were converted to log2(fpkm + 1) and results from different samples were averaged. Boxplots were constructed using the interquartile range (IQR) and median, and whiskers were calculated using IQR ∗ 1.5. All comparisons between genes with and without pG4 were statistically significant according to Wilcoxon tests. ∗∗∗, Wilcoxon test p < 0.001. B, Cartesian plots representing the genome-wide counts of pG4 bp within 2500 bp before and 2500 bp after 5' (left) and 3' (right) splice sites. C, scatter plots displaying GO terms scattered according to log2 of fold enrichment (x axis) and -log10 of p-values (y axis). Circled in green, olfactory/chemical stimulus-related terms; circled in magenta, immune-related terms; circled in blue, neuronal- and developmental-related terms. GO, Gene Ontology.
Figure 2
Genes with pG4 have lower expression difference between tumors and normal tissue. A, boxplots representing the expression levels of genes with and without pG4s in normal (left) and tumor (right) tissue. Expression levels were converted to log2(fpkm + 1) and results from different samples were averaged. Boxplots were constructed using the interquartile range (IQR) and median, and whiskers were calculated using IQR ∗ 1.5. All comparisons between genes with and without pG4 were statistically significant according to Wilcoxon tests. ∗∗∗, Wilcoxon test p < 0.001. B, column graph representing the log2 fold change in expression between tumor tissues and normal control tissues in TCGA. Bars on the left part of the graph represent cases in which the average expression of genes was higher for normal than for tumor tissue, and bars on the right part represent the opposite. ∗, t test p < 0.05; ∗∗, t test p < 0.01; ∗∗∗, t test p < 0.001. BRCA, breast carcinoma; BLCA, bladder urothelial carcinoma; COAD, colon adenocarcinoma; ESCA, esophageal carcinoma; HNSC, head and neck squamous cell carcinoma; KIRC, kidney renal clear cell carcinoma; KIRP, kidney renal papillary cell carcinoma; KICH, kidney chromophobe carcinoma; LIHC, liver hepatocellular carcinoma; LUAD, lung adenocarcinoma; LUSC, lung squamous cell carcinoma; PRAD, prostate adenocarcinoma; STAD, stomach adenocarcinoma; TCGA, The Cancer Genome Atlas; THCA, thyroid carcinoma; UCEC, uterine corpus endometrial carcinoma.
Figure 3
Expression variability of pG4-containing genes decreases in tumors relative to matched controls. A, boxplots for selected cancers representing coefficients of variation of expression across pG4 genes (g-CV). Coefficients were calculated per individual sample, and only samples with both tumor and normal tissue data were analyzed. Boxplots were constructed using the interquartile range (IQR) and median, and whiskers were calculated using IQR ∗ 1.5. Paired t tests were used to compare the coefficients between normal and tumor tissue. ∗, t test p < 0.05; ∗∗, t test p < 0.01; ∗∗∗, t test p < 0.001. Matched samples are connected by lines. Red lines: increased variability in tumor compared to normal tissue; blue lines: decreased variability in tumor compared to normal tissue. B, boxplots for the same cancers as in panel A, representing coefficients of variation of expression across genes without pG4. CV, coefficient of variation.
Figure 4
Diseases with cardiac symptoms and cancers of the gastrointestinal and aerodigestive systems are enriched for mutations overlapping pG4 motifs. A, most common inherited diseases associated with pG4 mutations identified by the C++Quad search algorithm. For the COSMIC search, only the primary tumor site was considered. B, most common and stable pG4s identified by Quadron and their associated mutations. The number of mutations for HGMD is based on unique genomic mutation coordinates, and for COSMIC it is based on unique combinations of genomic mutation coordinates and sample name. HGMD, Human Gene Mutation Database.
Figure 5
The most mutable pG4 motifs are embedded in an unstable DNA sequence context. A, bar plot of the number of 500 bp bins throughout the entire human genome, sorted by ΔG values. B, predicted hairpin structure of the most prominent pG4 hotspot for HGMD mutations, located in the MEN1 gene. Numbers indicate the mutation IDs, and colors indicate types of mutations or sequence features. C, predicted hairpin structure of the most prominent pG4 hotspot for COSMIC mutations, located in the NOTCH1 gene. Numbers indicate the mutation IDs, and colors indicate types of mutations or sequence features. D, table representing the main mutational hotspots (small deletions/insertions/duplications) in cancer. HGMD, Human Gene Mutation Database.
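
The summary statistics named in the captions for Figures 1 to 3 (the log2(fpkm + 1) transform, whiskers based on IQR ∗ 1.5, and the per-sample coefficient of variation across genes) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names, the quartile interpolation method, and the use of the sample standard deviation are all choices made here, since the captions do not specify them.

```python
import math
import statistics


def log2_fpkm(fpkm):
    """Apply the log2(fpkm + 1) transform described in the captions."""
    return math.log2(fpkm + 1.0)


def whisker_limits(values):
    """Return (lower, upper) limits at 1.5 * IQR beyond the quartiles.

    Assumption: quartiles use the 'inclusive' interpolation method; the
    captions do not say which quartile definition was used.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def coefficient_of_variation(values):
    """Return standard deviation divided by mean.

    Assumption: the sample (n - 1) standard deviation is used.
    """
    return statistics.stdev(values) / statistics.fmean(values)


def per_sample_gcv(expression_by_sample):
    """Compute one coefficient of variation per sample across its genes.

    `expression_by_sample` is a hypothetical mapping of sample ID to a
    list of expression values for the gene set of interest.
    """
    return {
        sample: coefficient_of_variation(values)
        for sample, values in expression_by_sample.items()
    }
```

For matched tumor and normal samples, `per_sample_gcv` would be called once per tissue type and the resulting pairs compared, for example with a paired t test as the Figure 3 caption describes.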
