Nat Commun. 2020 Jan 27;11(1):527.
doi: 10.1038/s41467-020-14404-y.

Integrative analysis reveals RNA G-quadruplexes in UTRs are selectively constrained and enriched for functional associations



David S M Lee et al. Nat Commun.

Abstract

G-quadruplex (G4) sequences are abundant in untranslated regions (UTRs) of human messenger RNAs, but their functional importance remains unclear. By integrating multiple sources of genetic and genomic data, we show that putative G-quadruplex forming sequences (pG4) in 5' and 3' UTRs are selectively constrained, and enriched for cis-eQTLs and RNA-binding protein (RBP) interactions. Using over 15,000 whole-genome sequences, we find that negative selection acting on central guanines of UTR pG4s is comparable to that of missense variation in protein-coding sequences. At multiple GWAS-implicated SNPs within pG4 UTR sequences, we find robust allelic imbalance in gene expression across diverse tissue contexts in GTEx, suggesting that variants affecting G-quadruplex formation within UTRs may also contribute to phenotypic variation. Our results establish UTR G4s as important cis-regulatory elements and point to a link between disruption of UTR pG4 and disease.
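To make the notion of a putative G-quadruplex forming sequence concrete, the sketch below scans a sequence for the widely used canonical motif: four runs of three or more guanines separated by loops of 1 to 7 nucleotides. This motif definition is an assumption (the abstract does not spell out the exact pG4 criteria), and `find_pg4` is a hypothetical helper, not the authors' pipeline.

```python
import re

# Assumed canonical pG4 motif (not defined in the abstract): four G-tracts of
# at least three guanines separated by loops of 1-7 nucleotides.
PG4_MOTIF = re.compile(
    r"G{3,}[ACGU]{1,7}G{3,}[ACGU]{1,7}G{3,}[ACGU]{1,7}G{3,}"
)


def find_pg4(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched sequence) for non-overlapping pG4 hits.

    DNA-alphabet input is accepted: T is converted to U before scanning.
    """
    rna = seq.upper().replace("T", "U")
    return [(m.start(), m.group()) for m in PG4_MOTIF.finditer(rna)]


if __name__ == "__main__":
    print(find_pg4("AAGGGAGGGAGGGAGGGAA"))  # -> [(2, 'GGGAGGGAGGGAGGG')]
    # A single G-to-A change in the second G-tract leaves only three G-tracts.
    print(find_pg4("AAGGGAGAGAGGGAGGGAA"))  # -> []
```

The second call shows why a single guanine substitution inside a G-tract can remove a motif entirely: the tract drops below three consecutive guanines, so no four-tract match remains.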


Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. UTR pG4 sequences are under heightened selective pressure.
a Schematic depicting a folded RNA parallel G-quadruplex with the accompanying canonical pG4 sequence. b Reduction in variant frequencies affecting guanine G-tracts within UTR pG4-forming sequences compared with non-pG4 G-tracts matched by transcript-level constraint. rG4-G-tracts are those within UTR pG4s that show evidence of secondary-structure formation by rG4-seq. Asterisks denote P value ≪ 2.2 × 10−16 by Fisher’s exact test. c Reduction in the number of observed polymorphic sites relative to expectation in 5′ and 3′ UTR pG4-forming G-tracts, using a nucleotide substitution model based on local sequence context (permuted P < 1 × 10−4 in all G-tracts compared with matched non-pG4 UTR sequences). Error bars represent the bootstrapped 90% confidence interval for the ratio of observed to expected substitutions within each pG4 region. The red line and shaded region represent the observed vs. expected number of substitutions in non-pG4 UTR sequences matched by transcript-level constraint and its 90% confidence interval, respectively. The gray dashed line marks an observed-to-expected ratio of 1:1. d Mutability-adjusted proportion of singletons (MAPS) for each set of variants affecting trinucleotide guanines within the meta-pG4 sequence motif. Central-position guanines consistently show the highest MAPS scores (are most constrained) compared with non-pG4 UTR variants (permuted P < 1 × 10−4) across all contexts. Error bars represent the 5th and 95th percentiles of bootstrap permutations for each variant class. The purple, orange, and gray dashed lines represent MAPS scores for Ensembl-predicted high-impact coding (predicted loss-of-function), missense, and synonymous mutations, respectively. Source data for b–d are provided as a Source Data file.
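MAPS compares the observed proportion of singletons in a variant class with the proportion expected from mutability alone; a positive value means more singletons than expected, which is read as stronger purifying selection. The sketch below follows the general recipe introduced with ExAC (calibrate on putatively neutral variants, such as synonymous ones, by regressing singleton proportion on per-context mutability), but the weighting by variant count, the `ContextCounts` structure, and all numbers are illustrative assumptions rather than this paper's exact procedure.

```python
from dataclasses import dataclass


@dataclass
class ContextCounts:
    mutability: float  # per-context mutation rate from a substitution model
    n_variants: int    # variants observed in this sequence context
    n_singletons: int  # of those, variants seen exactly once


def fit_calibration(neutral: list[ContextCounts]) -> tuple[float, float]:
    """Weighted least-squares fit: singleton proportion ~ mutability.

    Weights are variant counts per context (an assumption of this sketch).
    Returns (slope, intercept).
    """
    w = [c.n_variants for c in neutral]
    x = [c.mutability for c in neutral]
    y = [c.n_singletons / c.n_variants for c in neutral]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return slope, ybar - slope * xbar


def maps(variant_class: list[ContextCounts], slope: float, intercept: float) -> float:
    """Observed minus mutability-expected proportion of singletons."""
    total = sum(c.n_variants for c in variant_class)
    observed = sum(c.n_singletons for c in variant_class) / total
    expected = sum(
        c.n_variants * (intercept + slope * c.mutability) for c in variant_class
    ) / total
    return observed - expected


if __name__ == "__main__":
    # Hypothetical neutral calibration set and a hypothetical test class.
    neutral = [
        ContextCounts(0.01, 100, 48),
        ContextCounts(0.05, 100, 40),
        ContextCounts(0.10, 100, 30),
    ]
    slope, intercept = fit_calibration(neutral)
    print(maps([ContextCounts(0.05, 100, 50)], slope, intercept))
```

Because the expected value is computed per context and then averaged with the class's own variant counts, two classes with different mutational spectra can be compared on the same scale.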
Fig. 2. Distribution and usage of isoforms with G4s in their UTRs.
a Most genes producing mRNA transcripts with UTR pG4 sequences also produce alternative isoforms that lack UTR pG4s (non-constitutive). b For the subset of genes that produce UTRs with alternative pG4 inclusion, pG4-containing and non-pG4 isoforms are frequently expressed simultaneously. Median expression (TPM) of each pG4 transcript and non-pG4 transcript was assessed in each tissue context, and a transcript was considered expressed if its median TPM exceeded 1 in that tissue. The proportion of pG4 genes expressing both pG4 and non-pG4 isoforms was then compared for each tissue. c Overrepresented biological processes for protein-coding genes producing both pG4 and non-pG4 5′ or 3′ UTR isoforms (n = 3148; see Supplementary Data 1 for the full list). GO-term enrichment was performed using PantherDB, with enrichment defined by a Benjamini–Hochberg adjusted P value cutoff of 0.05 (Fisher’s exact test). Source data are provided as a Source Data file.
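The expression criterion in panel b can be written out directly. The sketch below assumes a hypothetical per-tissue input layout (for each gene, a list of TPM vectors for its pG4 transcripts and another for its non-pG4 transcripts) and assumes a gene counts as co-expressing when at least one transcript of each kind has a median TPM above 1; the caption does not state how multiple isoforms of the same kind were combined.

```python
from statistics import median

# Hypothetical input for ONE tissue:
# {gene_id: {"pg4": [tpm_per_sample, ...], "non_pg4": [tpm_per_sample, ...]}}
TissueData = dict[str, dict[str, list[list[float]]]]


def is_expressed(tpm_samples: list[float], threshold: float = 1.0) -> bool:
    """A transcript counts as expressed if its median TPM exceeds the threshold."""
    return median(tpm_samples) > threshold


def coexpressing_fraction(tissue: TissueData, threshold: float = 1.0) -> float:
    """Fraction of genes expressing at least one pG4 and one non-pG4 isoform."""
    if not tissue:
        return 0.0
    both = 0
    for isoforms in tissue.values():
        pg4_on = any(is_expressed(t, threshold) for t in isoforms["pg4"])
        non_on = any(is_expressed(t, threshold) for t in isoforms["non_pg4"])
        if pg4_on and non_on:
            both += 1
    return both / len(tissue)


if __name__ == "__main__":
    muscle = {
        "GENE_A": {"pg4": [[2.0, 3.0, 4.0]], "non_pg4": [[0.5, 2.0, 3.0]]},
        "GENE_B": {"pg4": [[0.0, 0.0, 5.0]], "non_pg4": [[5.0, 5.0, 5.0]]},
    }
    print(coexpressing_fraction(muscle))  # -> 0.5
```

Using the median rather than the mean keeps a few high-expressing samples (as for the pG4 isoform of `GENE_B`) from making a transcript look broadly expressed.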
Fig. 3. Enrichment for functional associations within UTR pG4 sequences.
a GTEx cis-eQTLs are enriched within UTR pG4s relative to tested (non-eQTL) SNPs in matched UTR regions, for lead SNPs, high-confidence causal SNPs, nominally significant SNPs, and nominally significant SNPs within RBP-binding sites (Fisher’s exact test; see Supplementary Table 3 for enrichment statistics). Error bars represent the 95% confidence interval for the odds ratio. b Odds ratio for a cis-eQTL increasing gene expression across all cis-eQTL-tissue effects (n = 379,441, P value < 2 × 10−16, Fisher’s exact test) when the variant affects a pG4 G-tract compared with a gap sequence. Error bars represent the 95% confidence interval for the odds ratio. c Density of RBP-binding sites per kilobase of pG4 sequence compared with non-pG4 regions of the UTR (P value ≪ 2.2 × 10−16, chi-square test). Source data are provided as a Source Data file.
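Panels a and b report odds ratios with 95% confidence intervals from 2 × 2 tables. As an illustration only, the sketch below computes a sample odds ratio with a Woolf (log-scale, normal-approximation) interval. This is a swapped-in method: an interval derived from Fisher’s exact test, as named in the caption, can differ, especially for small counts. The table layout and counts are hypothetical.

```python
import math

Z_95 = 1.959963984540054  # standard-normal quantile for a two-sided 95% interval


def odds_ratio_woolf(a: int, b: int, c: int, d: int, z: float = Z_95):
    """Sample odds ratio and Woolf confidence interval for the 2x2 table

                        in UTR pG4   in matched non-pG4 UTR
        eQTL SNP             a                 b
        non-eQTL SNP         c                 d

    Returns (odds_ratio, lower, upper).
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: use an exact method or a continuity correction")
    log_or = math.log((a * d) / (b * c))
    # Standard error of log(OR) under the normal approximation.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


if __name__ == "__main__":
    # Hypothetical counts, not taken from the paper.
    print(odds_ratio_woolf(20, 80, 10, 90))
```

The interval is symmetric on the log scale, which is why the error bars on an odds-ratio plot usually look asymmetric on a linear axis.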
Fig. 4. Enrichment of specific protein–pG4 binding sites using CLIP-seq data from ENCODE.
a, b Enrichment of specific proteins over pG4-binding sites within the 5′ UTR (left) and 3′ UTR (right); the red line corresponds to P = 0.0001 (hypergeometric test). c, d Heatmaps depicting the significance of overlap (hypergeometric −log P value) in pG4 gene targets for proteins found to preferentially bind pG4 sequences. Source data are provided as a Source Data file.
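The heatmap values in panels c and d are hypergeometric −log P values for target overlap. A minimal sketch, assuming the usual one-sided (over-representation) test and a hypothetical setup in which `population` is the number of pG4-containing genes considered, `targets_a` and `targets_b` are the pG4 gene-target counts of two proteins, and `overlap` is the number of shared targets. The caption does not give the log base; base 10 is assumed here. The log-scale value is computed from exact integer counts so it does not underflow for extremely small P values.

```python
import math


def _upper_tail_count(overlap: int, population: int, targets_a: int, targets_b: int) -> int:
    """Ways to draw targets_b genes with at least `overlap` of them in targets_a."""
    hi = min(targets_a, targets_b)
    return sum(
        math.comb(targets_a, i) * math.comb(population - targets_a, targets_b - i)
        for i in range(max(overlap, 0), hi + 1)
    )


def overlap_pvalue(overlap: int, population: int, targets_a: int, targets_b: int) -> float:
    """P(X >= overlap) for X ~ Hypergeometric(population, targets_a, targets_b)."""
    tail = _upper_tail_count(overlap, population, targets_a, targets_b)
    return tail / math.comb(population, targets_b)


def overlap_neg_log10_p(overlap: int, population: int, targets_a: int, targets_b: int) -> float:
    """-log10 P, computed from exact integers to avoid float underflow."""
    tail = _upper_tail_count(overlap, population, targets_a, targets_b)
    if tail == 0:
        return math.inf  # overlap larger than possible: P is exactly 0
    return math.log10(math.comb(population, targets_b)) - math.log10(tail)


if __name__ == "__main__":
    # Hypothetical counts: 10 genes, both proteins bind 5, all 5 shared.
    print(overlap_pvalue(5, 10, 5, 5), overlap_neg_log10_p(5, 10, 5, 5))
```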
Fig. 5. UTR pG4 sequences are enriched for known pathogenic, and putative disease-associated genetic variants.
a Annotated variants within ClinVar disease-associated genes occur more frequently in UTR pG4 sequences than in non-pG4 regions of the 3′ UTR, across multiple G4 subsets (error bars represent the 95% confidence interval). b rs108348 maps to a G-tract guanine of a 3′ UTR pG4 within the primary HSPB7 transcript, which is encoded on the negative DNA strand. The SNP disrupts the canonical G4 sequence motif by causing a G-to-A mutation in the RNA transcript. c, d WASP mapping of allele-specific reads in 84 GTEx skeletal muscle samples reveals significant allelic imbalance favoring expression of the alternative allele (P value < 1 × 10−100, likelihood ratio test). The boxplot in c shows the median and 1.5 times the interquartile range of WASP-aligned RNA-seq reads aligning to the reference (red) or alternative (blue) allele. Source data are provided as a Source Data file.
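Panels c and d summarize an allelic-imbalance test on WASP-filtered reads. For intuition only, the sketch below implements a simple pooled binomial likelihood ratio test against a 50:50 reference/alternative expectation. It assumes all reads come from heterozygous carriers and ignores overdispersion between samples, so it is not WASP’s own test and should not be read as the exact test behind the reported P value; the read counts are hypothetical.

```python
import math


def allelic_imbalance_lrt(ref_reads: int, alt_reads: int) -> tuple[float, float, float]:
    """Binomial LRT of H0: alt fraction = 0.5 vs H1: alt fraction unrestricted.

    Returns (estimated alt fraction, LRT statistic, P value from chi-square with 1 df).
    """
    n = ref_reads + alt_reads
    if n == 0:
        raise ValueError("no informative reads")
    p_hat = alt_reads / n

    def loglik(p: float) -> float:
        # Binomial log-likelihood without the constant log C(n, alt_reads);
        # zero-count terms are skipped so p_hat of 0 or 1 is handled.
        ll = 0.0
        if alt_reads:
            ll += alt_reads * math.log(p)
        if ref_reads:
            ll += ref_reads * math.log(1.0 - p)
        return ll

    stat = max(0.0, 2.0 * (loglik(p_hat) - loglik(0.5)))
    # Chi-square (1 df) survival function: P(X > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return p_hat, stat, p_value


if __name__ == "__main__":
    # Hypothetical pooled counts favoring the alternative allele.
    print(allelic_imbalance_lrt(ref_reads=20, alt_reads=80))
```

Pooling reads across samples is the simplest possible design; a per-sample model with a beta-binomial likelihood would be the usual next step when samples differ in their degree of imbalance.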
