Nat Commun. 2020 May 27;11(1):2523.
doi: 10.1038/s41467-019-10717-9.

Characterising the loss-of-function impact of 5' untranslated region variants in 15,708 individuals

Nicola Whiffin et al. Nat Commun.

Erratum in

Abstract

Upstream open reading frames (uORFs) are tissue-specific cis-regulators of protein translation. Isolated reports have shown that variants that create or disrupt uORFs can cause disease. Here, in a systematic genome-wide study using 15,708 whole genome sequences, we show that variants that create new upstream start codons, and variants disrupting stop sites of existing uORFs, are under strong negative selection. This selection signal is significantly stronger for variants arising upstream of genes intolerant to loss-of-function variants. Furthermore, variants creating uORFs that overlap the coding sequence show signals of selection equivalent to coding missense variants. Finally, we identify specific genes where modification of uORFs likely represents an important disease mechanism, and report a novel uORF frameshift variant upstream of NF2 in neurofibromatosis. Our results highlight uORF-perturbing variants as an under-recognised functional class that contributes to penetrant human disease, and demonstrate the power of large-scale population sequencing data for studying non-coding variant classes.
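The distinction the abstract draws between new uORFs and uORFs that overlap the coding sequence can be made concrete with a short sketch. Assuming a transcript string holding the 5′UTR followed by the CDS and a 0-based CDS start index, the hypothetical function `classify_uaug` walks codon by codon from an upstream ATG: a stop codon inside the UTR closes a uORF, an in-frame ATG that reaches the CDS start without a stop elongates the CDS, and anything else overlaps the CDS out of frame (an oORF). The function name, the choice to count a stop codon straddling the UTR/CDS boundary as an oORF, and the toy sequences are illustrative assumptions, not the authors' annotation pipeline.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def classify_uaug(transcript: str, cds_start: int, atg_pos: int) -> str:
    """Classify the ORF started by an upstream ATG (hypothetical helper).

    transcript: 5'UTR followed by the CDS (DNA alphabet).
    cds_start:  0-based index of the first base of the CDS start codon.
    atg_pos:    0-based index of the 'A' of an ATG lying wholly in the 5'UTR.
    """
    seq = transcript.upper()
    if seq[atg_pos:atg_pos + 3] != "ATG":
        raise ValueError("no ATG at atg_pos")
    if atg_pos + 3 > cds_start:
        raise ValueError("the ATG must lie entirely within the 5'UTR")

    in_frame = (cds_start - atg_pos) % 3 == 0
    # Walk codon by codon from the uAUG until a stop codon or the CDS start.
    for i in range(atg_pos + 3, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            # A stop ending inside the UTR closes a uORF; otherwise the ORF
            # overlaps the CDS (assumption: straddling stops count as oORF).
            return "uORF" if i + 3 <= cds_start else "out-of-frame oORF"
        if in_frame and i >= cds_start:
            # Reached the CDS start codon in frame with no stop: N-terminal extension.
            return "CDS elongated"
    # No stop before the end of the sequence: the ORF still runs through the CDS.
    return "CDS elongated" if in_frame else "out-of-frame oORF"


# Toy transcripts (hypothetical sequences), UTR + CDS.
print(classify_uaug("GCCATGAAATAACC" + "ATGGCCTAA", 14, 3))  # uORF
print(classify_uaug("GGATGCC" + "ATGGTAAGCTGA", 7, 2))       # out-of-frame oORF
print(classify_uaug("GATGCCC" + "ATGGCCTAA", 7, 1))          # CDS elongated
```

Comparing the classification of ATGs in the reference and alternate sequences is one way to label a variant as uAUG-creating or uORF-disrupting under these assumptions.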

Conflict of interest statement

D.G.M. is a founder with equity in Goldfinch Bio. The remaining authors declare no competing interests.

Figures

Fig. 1
uAUG-creating variants have strong signals of negative selection, suggesting they are deleterious. a Schematic of uAUG-creating variants, their possible effects and how the strength of the surrounding Kozak consensus is determined. b The number of possible uAUG-creating SNVs in each of 18,593 genes, truncated at 200 (159 genes have >200). In total we identified 562,196 possible uAUG-creating SNVs, an average of 30.2 per gene (dotted line), with 883 genes having none. c–f MAPS scores (a measure of negative selection) for different variant sets. The number of observed variants for each set is shown in brackets. MAPS for classes of protein-coding SNVs are shown as dotted lines for comparison (synonymous–grey, missense–orange, and predicted loss-of-function (pLoF)–red point and red dotted line). Error bars were calculated using bootstrapping (see methods). c While UTR variants overall display a selection signature similar to synonymous variants, uAUG-creating variants have significantly higher MAPS (indicative of being more deleterious; permuted P < 1 × 10⁻⁴). Variants are further subdivided into those upstream of, or within, genes tolerant (green dot) and intolerant (blue dot) to LoF, with uAUG-creating variants upstream of LoF-intolerant genes showing significantly stronger signals of selection than those upstream of LoF-tolerant genes (permuted P = 1 × 10⁻⁴). pLoF variants are likewise stratified for comparison. d uAUG-creating variants that create an oORF or elongate the CDS show a significantly stronger signal of selection than uORF-creating variants (P < 1 × 10⁻⁴; the "oORF created" set combines out-of-frame oORF and CDS-elongating variants). e The deleteriousness of uAUG-creating variants depends on the context into which they are created, with stronger selection against uAUG creation close to the CDS and within a stronger Kozak consensus sequence. 
f uAUG-creating variants are under strong negative selection upstream of genes manually curated as haploinsufficient and developmental disorder genes reported to act via a dominant LoF mechanism. Abbreviations: CDS coding sequence, uAUG upstream AUG, uORF upstream open reading frame, oORF overlapping open reading frame, MAPS mutability adjusted proportion of singletons, pLoF predicted loss-of-function, DDG2P Developmental Disease Gene to Phenotype
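Panels a, b and e of Fig. 1 rest on two simple sequence operations: enumerating every SNV that would create a new ATG in a 5′UTR, and grading the Kozak context around a start codon. The sketch below assumes a widely used convention for the latter (with the A of ATG at +1, a purine at −3 and a G at +4; both present is "strong", one is "moderate", neither is "weak"). The exact rules and transcript set used in the study are given in its methods and may differ; `kozak_strength` and `possible_uatg_snvs` are hypothetical names.

```python
PURINES = {"A", "G"}


def kozak_strength(seq: str, atg_pos: int) -> str:
    """Grade the Kozak context of the ATG whose 'A' is at 0-based atg_pos.

    Assumed convention: purine at -3 and G at +4 (A of ATG = +1).
    """
    seq = seq.upper()
    if seq[atg_pos:atg_pos + 3] != "ATG":
        raise ValueError("no ATG at atg_pos")
    minus3 = seq[atg_pos - 3] if atg_pos >= 3 else ""
    plus4 = seq[atg_pos + 3] if atg_pos + 3 < len(seq) else ""
    hits = int(minus3 in PURINES) + int(plus4 == "G")
    return {2: "strong", 1: "moderate", 0: "weak"}[hits]


def possible_uatg_snvs(utr: str) -> list:
    """List every SNV (pos, ref, alt, atg_start) that creates a new ATG in utr."""
    utr = utr.upper()
    created = []
    for pos, ref in enumerate(utr):
        for alt in "ACGT":
            if alt == ref:
                continue
            mutated = utr[:pos] + alt + utr[pos + 1:]
            # Only the (up to) three trinucleotide windows covering pos can change.
            for start in range(max(0, pos - 2), min(pos, len(utr) - 3) + 1):
                if mutated[start:start + 3] == "ATG" and utr[start:start + 3] != "ATG":
                    created.append((pos, ref, alt, start))
    return created


utr = "GCCAAGCTCC"  # hypothetical 5'UTR fragment
for pos, ref, alt, start in possible_uatg_snvs(utr):
    print(pos, ref, alt, kozak_strength(utr[:pos] + alt + utr[pos + 1:], start))
```

In practice the Kozak check would be run on the UTR plus downstream sequence, so that the +4 base exists even for ATGs created at the very end of the UTR.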
Fig. 2
uORF stop codons are highly conserved and stop-removing variants show strong signals of negative selection. a Schematic of uORF stop-removing variants, their possible effects, and how the strength of the surrounding Kozak consensus is determined. b–e MAPS scores (a measure of negative selection) for different variant sets. The number of observed variants for each set is shown in brackets. MAPS for classes of protein-coding SNVs are shown as dotted lines for comparison (synonymous–black, missense–orange and predicted loss-of-function (pLoF)–red point and red dotted line). Confidence intervals were calculated using bootstrapping (see methods). b Stop-removing SNVs have a nominally higher MAPS score than all UTR SNVs (permuted P = 0.030). Variants are further subdivided into those upstream of, or within, genes tolerant (green dot) and intolerant (blue dot) to LoF, with pLoF variants likewise stratified for comparison. Stop-removing SNVs in uORFs with evidence of translation (c; in sorfs.org) and those that create an oORF (d) have signals of selection equivalent to missense variants. e A significantly higher MAPS is calculated for stop-removing variants where the uORF start site has a strong/moderate Kozak consensus, compared to those with a weak Kozak consensus (permuted P = 7 × 10⁻⁴). f–j Since MAPS is only calculated on observed variants, we also looked at the conservation of all possible uORF stop site bases, reporting the proportion of bases with phyloP scores >2. All coding bases are shown as a purple dotted line for comparison. f The stop sites of predicted uORFs are significantly more conserved than all UTR bases matched on gene and distance from the CDS (Fisher’s P = 1.8 × 10⁻¹⁷). uORF stop bases are most highly conserved when (g) the uORF has evidence of translation, (h) the variant results in an oORF, (i) the uORF start site has a strong/moderate Kozak consensus, and (j) the uORF lies upstream of curated haploinsufficient genes or developmental genes with a known dominant LoF disease mechanism. 
Error bars represent 95% binomial confidence intervals. CDS coding sequence, uORF upstream open reading frame, oORF overlapping open reading frame, MAPS mutability adjusted proportion of singletons, DDG2P Developmental Disease Gene to Phenotype
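MAPS, used throughout Figs. 1 and 2, compares the observed proportion of singletons in a variant class with the proportion expected from mutability alone. The sketch below follows the general recipe introduced with the ExAC dataset rather than this study's exact implementation: a straight line is fitted to the per-context singleton proportion of synonymous variants as a function of mutability, MAPS is the observed minus the mean predicted singleton proportion, and a percentile bootstrap over variants gives an interval. The context labels, mutability values, unweighted least-squares fit and 1,000 resamples are all assumptions for illustration.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Variant = Tuple[str, bool]  # (mutational context, observed as a singleton?)


def fit_singleton_model(synonymous: List[Variant],
                        mutability: Dict[str, float]) -> Callable[[str], float]:
    """Fit proportion-singleton ~ mutability across contexts (unweighted OLS)."""
    counts = defaultdict(lambda: [0, 0])  # context -> [singletons, total]
    for ctx, is_singleton in synonymous:
        counts[ctx][0] += int(is_singleton)
        counts[ctx][1] += 1
    points = [(mutability[c], s / n) for c, (s, n) in counts.items()]
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx  # needs at least two contexts with different mutability
    intercept = mean_y - slope * mean_x
    return lambda ctx: intercept + slope * mutability[ctx]


def maps(variants: List[Variant], predict: Callable[[str], float]) -> float:
    """Observed minus expected proportion of singletons for a variant class."""
    observed = sum(s for _, s in variants) / len(variants)
    expected = sum(predict(ctx) for ctx, _ in variants) / len(variants)
    return observed - expected


def bootstrap_interval(variants, predict, n_boot=1000, seed=0):
    """Percentile bootstrap 95% interval for MAPS (resampling variants)."""
    rng = random.Random(seed)
    stats = sorted(maps(rng.choices(variants, k=len(variants)), predict)
                   for _ in range(n_boot))
    return stats[int(0.025 * n_boot)], stats[int(0.975 * n_boot) - 1]


# Hypothetical toy data: two contexts with different mutability.
mutability = {"ctx_low": 1.0, "ctx_high": 2.0}
synonymous = ([("ctx_low", True)] * 3 + [("ctx_low", False)] * 2
              + [("ctx_high", True)] * 2 + [("ctx_high", False)] * 3)
predict = fit_singleton_model(synonymous, mutability)
uaug_creating = [("ctx_low", True)] * 4 + [("ctx_low", False)]
print(round(maps(uaug_creating, predict), 3))  # 0.2
print(bootstrap_interval(uaug_creating, predict))
```

A positive MAPS means the class has more singletons than its mutational contexts predict, which is read as evidence of negative selection.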
Fig. 3
The role of uAUG-creating and uORF stop-removing variants in disease. a The proportion of 39 uAUG-creating variants observed in HGMD and ClinVar (red bars) that fall into different sub-categories, compared with all possible uAUG-creating SNVs (grey bars) in the same genes (n = 1022). Compared to all possible uAUG-creating variants, uAUG-creating variants observed in HGMD/ClinVar were significantly more likely to be created into a moderate or strong Kozak consensus (binomial P = 3.5 × 10⁻⁴), to create an out-of-frame oORF (binomial P = 1.1 × 10⁻⁵), and to be within 50 bp of the CDS (binomial P = 3.9 × 10⁻⁷). b Schematic of the NF1 5′UTR (light grey) showing the location of an existing uORF (orange) and the locations of variants previously identified in patients with neurofibromatosis in dark red (uAUG-creating) and black (stop-removing). uAUG-creating variants are annotated with the strength of the surrounding Kozak consensus in brackets (“s” for strong and “m” for moderate). All four published variants result in formation of an oORF out-of-frame with the CDS. Also annotated are the positions of all other possible uAUG-creating variants (light red; strong and moderate Kozak only) and stop-removing variants (grey) that would also create an out-of-frame oORF. c Schematic of the NF2 5′UTR (grey) showing the effects of the −65-66insT variant. The reference 5′UTR contains a uORF with a strong Kozak start site. Although the single-base insertion creates a novel uAUG that could be a new uORF start site, it also shifts the frame of the existing uORF so that it overlaps the CDS out-of-frame (forming an oORF). We predict this is the most likely mechanism of pathogenicity. CDS coding sequence, uORF upstream open reading frame, oORF overlapping open reading frame, HGMD the human gene mutation database
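The binomial P values in panel a ask whether disease-associated uAUG-creating variants fall into a sub-category more often than the background proportion among all possible uAUG-creating SNVs in the same genes would predict. A minimal sketch of an exact one-sided test follows; whether the study used a one- or two-sided test is not stated here, and the counts in the usage lines are hypothetical placeholders, not the values behind the reported P values.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, p): a one-sided enrichment test."""
    if not 0 <= k <= n:
        raise ValueError("k must lie in [0, n]")
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical example: k of the 39 disease variants fall in a sub-category
# that holds a fraction background_p of all possible uAUG-creating SNVs.
n_disease = 39
k_in_category = 20      # hypothetical count
background_p = 0.25     # hypothetical background proportion
print(binomial_upper_tail(k_in_category, n_disease, background_p))
```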
Fig. 4
Identifying genes where uORF-creating or uORF-disrupting variants are likely to have a role in disease. Genes were split into three distinct categories representing a ‘low’, ‘moderate’ and ‘high’ likelihood that uORF-perturbing variants are important. Low-likelihood genes include those with existing oORFs, those with common (>0.1%) oORF-creating variants in gnomAD, or those that are tolerant to LoF. Genes in the high-likelihood category are the remaining genes that are LoF-intolerant or for which haploinsufficiency or LoF is a known disease mechanism (see methods). a The number of genes in each of the three categories. b The number of uAUG-creating and uORF stop-removing variants in HGMD upstream of genes in each category. Although only 19.2% of all classified genes fall into the high-likelihood category (21.4% of all UTR bases when adjusting for UTR length), 83.7% of uORF-perturbing variants identified in HGMD and ClinVar are found upstream of these genes (Fisher’s P = 1.4 × 10⁻¹⁹).
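The three-tier gene classification and the enrichment test in panel b can be sketched as a small rule set plus a two-sided Fisher's exact test. The `Gene` fields are hypothetical stand-ins for the annotations named in the caption; the LoF-tolerance and LoF-intolerance thresholds and the curated haploinsufficiency and disease-mechanism lists are defined in the study's methods and are not reproduced here. The test sums the hypergeometric probabilities of every 2 × 2 table with the observed margins that is no more likely than the observed table, which is the usual two-sided definition.

```python
from dataclasses import dataclass
from math import comb

COMMON_AF = 0.001  # ">0.1%" allele frequency in gnomAD


@dataclass
class Gene:
    name: str
    has_existing_oorf: bool
    max_oorf_creating_af: float  # highest gnomAD AF of any oORF-creating variant
    lof_tolerant: bool
    lof_intolerant: bool
    known_hi_or_lof_mechanism: bool  # curated haploinsufficient / LoF mechanism


def uorf_likelihood(gene: Gene) -> str:
    """Assign 'low', 'moderate' or 'high' likelihood following the Fig. 4 rules."""
    if (gene.has_existing_oorf or gene.max_oorf_creating_af > COMMON_AF
            or gene.lof_tolerant):
        return "low"
    if gene.lof_intolerant or gene.known_hi_or_lof_mechanism:
        return "high"
    return "moderate"


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability of the table with x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are included.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-7))


genes = [  # hypothetical genes
    Gene("GENE_A", False, 0.0, False, True, False),
    Gene("GENE_B", True, 0.0, False, True, True),
    Gene("GENE_C", False, 0.0, False, False, False),
]
print([uorf_likelihood(g) for g in genes])  # ['high', 'low', 'moderate']
```

For panel b, the table would cross gene category (high vs. other) with whether a position carries a disease-associated uORF-perturbing variant; how the study defined the units of that table is described in its methods.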

Comment in
