Genome Med. 2025 Apr 14;17(1):40.
doi: 10.1186/s13073-025-01464-2.

Systematic identification of disease-causing promoter and untranslated region variants in 8040 undiagnosed individuals with rare disease

Alexandra C Martin-Geary et al. Genome Med. 2025.

Abstract

Background: Both promoters and untranslated regions (UTRs) have critical regulatory roles, yet variants in these regions are largely excluded from clinical genetic testing due to difficulty in interpreting pathogenicity. The extent to which these regions may harbour diagnoses for individuals with rare disease is currently unknown.

Methods: We present a framework for the identification and annotation of potentially deleterious proximal promoter and UTR variants in known dominant disease genes. We use this framework to annotate de novo variants (DNVs) in 8040 undiagnosed individuals in the Genomics England 100,000 Genomes Project; these variants were subject to strict region-based filtering, clinical review, and validation studies where possible. In addition, we performed region and variant annotation-based burden testing in 7862 unrelated probands against matched unaffected controls.
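As a rough illustration of the burden-testing idea, the sketch below runs a one-sided Fisher's exact test (the test named in the Fig 3 caption) on a 2×2 table of cases and controls with and without a qualifying variant. It is a minimal pure-Python sketch, not the authors' pipeline: the function names and all counts are hypothetical, and only the case total of 7862 comes from the text.

```python
from math import comb


def fisher_one_sided_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact P-value for enrichment in cases.

    Table layout (hypothetical helper, not from the paper):
                  with variant   without variant
        cases          a               b
        controls       c               d
    Returns P(X >= a) under the hypergeometric null.
    """
    n_cases = a + b
    n_carriers = a + c
    n_total = a + b + c + d
    denom = comb(n_total, n_carriers)
    upper = min(n_cases, n_carriers)
    numer = sum(
        comb(n_cases, k) * comb(n_total - n_cases, n_carriers - k)
        for k in range(a, upper + 1)
    )
    return numer / denom


def sample_odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Unconditional sample odds ratio (a*d)/(b*c); infinite if b*c is zero."""
    return float("inf") if b * c == 0 else (a * d) / (b * c)


if __name__ == "__main__":
    # Hypothetical counts for illustration only: 7862 cases as in the text,
    # an equal number of controls and the carrier counts are invented.
    a, b, c, d = 30, 7832, 15, 7847
    print("odds ratio:", sample_odds_ratio(a, b, c, d))
    print("one-sided P:", fisher_one_sided_greater(a, b, c, d))
```

The one-sided form asks only whether cases carry more qualifying variants than controls, which matches the enrichment question posed by the burden test.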

Results: We prioritised eleven DNVs and identified an additional variant overlapping one of the eleven. Ten of these twelve variants (82%) are in genes that are a strong match to the individual's phenotype and six had not previously been identified. Through burden testing, we did not observe a significant enrichment of potentially deleterious promoter and/or UTR variants in individuals with rare disease collectively across any of our region or variant annotations.

Conclusions: Whilst screening promoters and UTRs can uncover additional diagnoses for individuals with rare disease, including these regions in diagnostic pipelines is not likely to dramatically increase diagnostic yield. Nevertheless, we provide a framework to aid identification of these variants.

Keywords: Non-coding; Promoters; Rare disease; Regulatory regions; Splicing; Untranslated regions.

Conflict of interest statement

Declarations. Ethics approval and consent to participate: Ethics approval was granted by the HRA Committee East of England – Cambridge South (REC Ref 14/EE/1112). This research conforms to the principles of the Helsinki Declaration. All participants provided informed consent for their data to be part of the National Genomics Research Library and for use in research and publication. Consent for publication: All participants in this study have provided consent for their data to be part of the National Genomics Research Library and to the publication of research findings. For all cases, written informed consent for research use of clinical and genetic data was obtained from patients, their parents, or legal guardians in the case of those with intellectual disability. Competing interests: The authors declare no competing interests.

Figures

Fig 1
Prioritised de novo variants split by region and variant annotations. DNVs were identified from the Genomics England de novo dataset in the following regions: Promoter (mustard), UTR exons (raspberry), and UTR introns (teal). A Flowchart showing de novo variant counts for all steps in our pipeline and the annotations used to prioritise variants in each region type. Filtering steps are shown in pink boxes. Initial participants without a diagnosis attributed to a coding variant are shown in a gold box. De novo variant counts at each stage are shown in pale green boxes. B Upset plot showing genes with variants prioritised by our pipeline. The gene names corresponding to identified DNVs are written above the corresponding bar. Those in black represent likely diagnoses (nine probands), with those in grey not being a good phenotypic match (two probands). Novel potential diagnoses are marked by an asterisk. Vertical bars in the top panel denote the number of variants identified with specific region and variant annotations; region annotations are represented by the bar colour, and variant annotations by the upset plot below. The total number of DNVs with each variant annotation is shown by the horizontal bars to the left of the upset plot.
Fig 2
Candidate diagnostic de novo variants. A Gene diagram showing the creation of an out-of-frame overlapping ORF (oORF; in red) in the SLC2A1 gene in the proband. B Illustration of the AG exclusion zone in the NIPBL gene. The T>A variant at the -17 position is marked in red; the most strongly predicted branch point (Branchpointer [73] 0.48), directly upstream of the AG exclusion zone, is shown in blue. C Multidimensional scaling plot showing differential methylation in SETD5. The positions of both variants found in this gene are shown as red dotted lines. D Sashimi plot showing aberrant splicing in the MANE Plus Clinical transcript ENST00000371085. The proband shows some retention of the intron containing the variant (which is marked by a red dotted line) and increased skipping of the following exon compared to the controls (6.06% vs 0.65% and 1%).
Fig 3
Burden testing results. Counts of variants and odds ratios (log10) testing for an enrichment of variants in cases compared to matched control participants, collectively by A region annotation and B variant annotation. Annotation groups with fewer than 10 participants are omitted. Error bars represent 95% confidence intervals from a two-sided Fisher's test. Variants in 5′UTRs (one-sided Fisher's P = 0.016) and variants with SpliceAI ≥ 0.5 (one-sided Fisher's P = 0.0039) are enriched in cases over matched controls, but neither remains significant after correcting for multiple testing (Bonferroni threshold adjusting for 16 tests = 0.0031). Full results are in Additional file 2: Table S5.
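The multiple-testing correction in the Fig 3 caption can be checked with a few lines of arithmetic. The sketch below assumes a family-wise significance level of 0.05, which the caption does not state but which reproduces the reported threshold of 0.0031 for 16 tests. The P-values are those given in the caption; the variable and function names are illustrative.

```python
# Assumed family-wise alpha; not stated in the caption, but 0.05 / 16
# rounds to the reported Bonferroni threshold of 0.0031.
ALPHA = 0.05
N_TESTS = 16  # number of tests, as stated in the caption

# One-sided Fisher's P-values as reported in the Fig 3 caption.
REPORTED_P = {
    "5'UTR variants": 0.016,
    "SpliceAI >= 0.5": 0.0039,
}


def bonferroni_threshold(alpha: float = ALPHA, n_tests: int = N_TESTS) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def passes_bonferroni(p: float, alpha: float = ALPHA, n_tests: int = N_TESTS) -> bool:
    """True if p is below the Bonferroni-adjusted threshold."""
    return p < bonferroni_threshold(alpha, n_tests)


if __name__ == "__main__":
    print(f"threshold = {bonferroni_threshold():.6f}")
    for name, p in REPORTED_P.items():
        nominal = p < ALPHA
        corrected = passes_bonferroni(p)
        print(f"{name}: P={p}, nominal={nominal}, after correction={corrected}")
```

Under this assumption both annotations are nominally significant at 0.05, yet neither falls below 0.05 / 16, consistent with the caption's statement that neither remains significant after correction.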
