Nat Commun. 2025 Aug 7;16(1):7267.
doi: 10.1038/s41467-025-61712-2.

Joint, multifaceted genomic analysis enables diagnosis of diverse, ultra-rare monogenic presentations

Shilpa Nadimpalli Kobren et al. Nat Commun. 2025.

Abstract

Genomics for rare disease diagnosis has advanced at a rapid pace due to our ability to perform in-depth analyses on individual patients with ultra-rare diseases. The increasing size of ultra-rare disease cohorts internationally now enables cohort-wide analyses for new discoveries, but well-calibrated statistical genetics approaches for jointly analyzing these patients are still under development. The Undiagnosed Diseases Network (UDN) brings multiple clinical, research and experimental centers under the same umbrella across the United States to facilitate and scale case-based diagnostic analyses. Here, we present the first joint analysis of whole genome sequencing data of UDN patients across the network. We introduce new, well-calibrated statistical methods for prioritizing disease genes with de novo recurrence and compound heterozygosity. We also detect pathways enriched with candidate and known diagnostic genes. Our computational analysis, coupled with a systematic clinical review, recapitulated known diagnoses and revealed new disease associations. We further release a software package, RaMeDiES, enabling automated cross-analysis of deidentified sequenced cohorts for new diagnostic and research discoveries. Gene-level findings and variant-level information across the cohort are available in a public-facing browser (https://dbmi-bgm.github.io/udn-browser/). These results show that case-level diagnostic efforts should be supplemented by a joint genomic analysis across cohorts.

Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

Fig. 1. Undiagnosed diseases network cohort analysis.
a Map of clinical and research sites within the Undiagnosed Diseases Network (UDN) for evaluating patients and candidate variant functionality. Map created using R’s usmap package. b Genetic ancestry across the sequenced patient cohort. c Clinician-recorded primary symptom categories of patients. “Multiple” indicates 2+ categories could be considered primary and “other” indicates an unlisted category. Categories marked with an asterisk (*) are neurological subtypes (Supplementary Note 1). d Patient-reported age of first symptom onset. e Patient sex. f Categories and quantity of phenotype information collected for patients and made available to all UDN researchers. g Intronic variants detectable from genome sequencing (orange star) with a predicted splice-altering impact are considered alongside exonic variants in our statistical framework; these variants may result in retained introns or excised exons in processed transcripts. h We consider genes and gene pathways harboring de novo and compound heterozygous variants in sequenced trios (72% of all accepted cases). Other inheritance modes (e.g., homozygous, uniparental disomy) are not considered in our cohort-based framework. Complete case count by family structure (e.g., proband-only, duo) is in Supplementary Fig. 2. i Depiction of clinical framework to uniformly evaluate how well a patient’s phenotypes are concordant with a candidate gene or variant. All icons in (f) and (i) are from Microsoft PowerPoint.
Fig. 2. De novo recurrence.
a De novo mutation counts per proband adjusted for parental ages. Blue vertical lines show the mean values of the distributions, and curves represent the Poisson fits. b Schematic of analytical test for the recurrence of de novos that considers distal splice-altering and exonic SNV and indel variants, their variant functionality scores, a genome-wide mutation rate model Roulette, and per-gene GeneBayes constraint values. “Like” variants refer to those of the same variant class (i.e., coding SNVs [CS], coding indels [CI], intronic SNVs [IS], intronic indels [II]) and within the same functionality score and minor allele frequency thresholds. c Genes with highest significance values for de novo recurrence across the cohort, computed as described in b3 (Pr’(y_g)) and b4 (Q_g), when focusing on missense variants with AlphaMissense and PrimateAI-3D scores; patients are represented as colored circles. Complete gene list with exact P values can be found in Supplementary Data 3. Note that multiple testing corrections have been applied in the form of both Bonferroni (**) and FDR (*) thresholds. d AlphaFold-predicted human LRRC7 protein structure (AF-Q96NW7-F1) covering the leucine-rich repeat region with high predicted structural confidence (amino acid positions 86-463). The fifth and eighth LRR domains where missense de novos were found are highlighted in blue. Reference alleles for missense de novo variants observed across two UDN patients (red) are shown in circles. A depiction of LRRC7’s linear protein sequence (Ensembl ID ENSP00000498937) with InterPro predicted domains shown in colored boxes is below. e Overlap of phenotype terms annotated to each patient.
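The Poisson fit in panel a and the general idea of testing a gene for de novo recurrence can be illustrated with a minimal sketch. This is a deliberately simplified stand-in and not the RaMeDiES statistic: it ignores variant functionality scores, the Roulette mutation rate model, and GeneBayes constraint values. All function names, the per-trio gene mutation rate, and the toy counts below are hypothetical.

```python
import math

def fit_poisson_rate(counts):
    """Maximum-likelihood Poisson rate, which is simply the sample mean."""
    return sum(counts) / len(counts)

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    return 1.0 - sum(poisson_pmf(i, lam) for i in range(k))

def recurrence_p_value(n_observed, per_trio_gene_rate, n_trios):
    """Chance of seeing at least n_observed de novos in one gene across
    n_trios, assuming (hypothetically) one uniform per-trio rate."""
    expected = per_trio_gene_rate * n_trios
    return poisson_upper_tail(n_observed, expected)

# Toy per-proband de novo counts (invented for illustration).
toy_counts = [60, 72, 65, 70, 68]
fitted_rate = fit_poisson_rate(toy_counts)

# Two de novos in one gene across 1000 trios at an assumed rate of 1e-5.
p_recurrence = recurrence_p_value(2, 1e-5, 1000)
```

A small tail probability here would only flag a gene for closer review; a cohort-wide analysis still has to correct across all genes tested, as the caption notes for the Bonferroni and FDR thresholds.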
Fig. 3. Compound heterozygous variants.
a Illustration of the unnormalized squared mutational target computed for each observed comphet variant in a gene across the cohort (RaMeDiES-CH, Supplementary Fig. 11) or in an individual across the genome (RaMeDiES-IND, Supplementary Fig. 12). “Like” variants refer to those of the same variant class (i.e., coding SNVs [CS], coding indels [CI], intronic SNVs [IS], intronic indels [II]) and within the same functionality score and minor allele frequency thresholds. b Top ranked genes resulting in the best enrichment statistic computed for RaMeDiES-IND. Putative candidates refer to genes that remain candidates for pathogenicity due to their phenotypically-relevant tissue expression, but where there is not enough functional evidence or published gene–disease relationships to establish causality at this time. c Overlap between phenotypes associated with MED11 and those exhibited by the affected patient. d RNA-Seq reads from whole blood samples aligned to first two exons and first intron of MED11 for proband (black), dad (blue), mom (purple) and two tissue-matched control samples (gray). Thin green line represents the intron, solid boxes represent protein-coding exonic regions, and the dotted box represents the 5’ untranslated region of MED11. e Proband exhibits significant retention of the first intron relative to parents and fifty-three tissue-matched control samples. Intron retention ratio is calculated as (median read depth of first intron) / (number of reads spanning first and second exons + median read depth of first intron).
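The intron retention ratio defined in panel e can be written as a short function. This is a minimal sketch of that formula only; the function name, the zero-coverage convention, and the read depths are hypothetical and are not taken from the study's data or code.

```python
from statistics import median

def intron_retention_ratio(intron_depths, spanning_reads):
    """Median intron depth / (exon1-exon2 spanning reads + median intron depth).

    intron_depths: per-base read depths across the first intron.
    spanning_reads: number of reads spanning the first and second exons.
    """
    intron_median = median(intron_depths)
    denominator = spanning_reads + intron_median
    if denominator == 0:
        # Assumption: with no coverage at all, report a ratio of 0.
        return 0.0
    return intron_median / denominator

# Invented depths: a sample with a retained intron versus a control.
proband_ratio = intron_retention_ratio([12, 15, 14, 13, 16], spanning_reads=6)
control_ratio = intron_retention_ratio([1, 0, 2, 1, 1], spanning_reads=40)
```

With these toy inputs the retained-intron sample gives a ratio of 0.7, while the control gives about 0.02. The ratio approaches 1 as intronic coverage dominates over correctly spliced reads.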
Fig. 4. Biological pathways enriched within phenotypically-similar patient subgroups.
a Schematic illustrating the two-step process of first clustering patients according to the semantic similarity of their phenotype terms and second finding enriched biological pathways among the genes within each patient cluster. b The most significant pathways per cluster (P value < 0.01, adjusted for multiple testing using g:Profiler’s Statistical Correction Scheme) with 1+ genes from 1+ undiagnosed patients; complete list in Supplementary Data 8. c Two patients with primarily immune-related symptoms each harbored a compelling de novo variant in genes involved in immunoproteasome assembly (POMP) and structure (PSMB8). Their symptoms strongly overlap, and a subset of these symptoms were also known to be associated with either gene in OMIM. d Three neurological patients had variants in transmembrane genes involved in the same pathway. These patients had substantial phenotypic overlap with each other, as expected, and with the phenotypes associated with each of their genes (depicted as star shapes in the upset plot). All icons in (a), (c) and (d) are from Microsoft PowerPoint.
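The second step in panel a, finding pathways enriched among the genes of a patient cluster, is commonly framed as an over-representation test. The sketch below uses a plain hypergeometric upper tail as an illustration; it does not reproduce g:Profiler's multiple testing correction, and the function name and gene counts are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(n_background, n_pathway, n_selected, n_overlap):
    """P(overlap >= n_overlap) when n_selected genes are drawn without
    replacement from n_background genes, n_pathway of which are in the pathway."""
    max_overlap = min(n_pathway, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total

# Toy numbers: 20 background genes, 5 in the pathway,
# 4 candidate genes in a patient cluster, 2 of them in the pathway.
p_enrichment = hypergeom_enrichment_p(20, 5, 4, 2)
```

An uncorrected value like this one would still need adjustment across every pathway tested before it could be compared with a threshold such as the adjusted P value < 0.01 used in panel b.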
