This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

[Preprint]. 2024 Aug 13:2024.02.13.580158.

doi: 10.1101/2024.02.13.580158.

Joint, multifaceted genomic analysis enables diagnosis of diverse, ultra-rare monogenic presentations

Shilpa Nadimpalli Kobren¹, Mikhail A Moldovan¹, Rebecca Reimers², Daniel Traviglia¹, Xinyun Li³, Danielle Barnum⁴, Alexander Veit¹, Rosario I Corona⁵, George de V Carvalho Neto⁵, Julian Willett⁶, Michele Berselli¹, William Ronchetti¹, Stanley F Nelson⁵, Julian A Martinez-Agosto⁵, Richard Sherwood⁷, Joel Krier⁸, Isaac S Kohane¹; Undiagnosed Diseases Network; Shamil R Sunyaev¹

Affiliations

¹ Department of Biomedical Informatics, Harvard Medical School, Boston, MA.
² Scripps Research Translational Institute, La Jolla, CA.
³ Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT.
⁴ Access to Medicine Foundation, Amsterdam, The Netherlands.
⁵ Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, CA.
⁶ Department of Pathology and Laboratory Medicine, NewYork-Presbyterian Weill Cornell Medical Center, New York, NY.
⁷ Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA.
⁸ Department of Genetics, Atrius Health, Boston, MA.

PMID: 38405764
PMCID: PMC10888768
DOI: 10.1101/2024.02.13.580158

Shilpa Nadimpalli Kobren et al. bioRxiv. 2024.

Update in

Joint, multifaceted genomic analysis enables diagnosis of diverse, ultra-rare monogenic presentations.
Kobren SN, Moldovan MA, Reimers R, Traviglia D, Li X, Barnum D, Veit A, Corona RI, Carvalho Neto GV, Willett J, Berselli M, Ronchetti W, Nelson SF, Martinez-Agosto JA, Sherwood R, Krier J, Kohane IS; Undiagnosed Diseases Network; Sunyaev SR. Nat Commun. 2025 Aug 7;16(1):7267. doi: 10.1038/s41467-025-61712-2. PMID: 40770127. Free PMC article.

Abstract

Genomics for rare disease diagnosis has advanced at a rapid pace due to our ability to perform "N-of-1" analyses on individual patients with ultra-rare diseases. The increasing size of ultra-rare disease cohorts internationally now enables cohort-wide analyses for new discoveries, but well-calibrated statistical genetics approaches for jointly analyzing these patients are still under development.^1,2 The Undiagnosed Diseases Network (UDN) brings multiple clinical, research and experimental centers across the United States under the same umbrella to facilitate and scale N-of-1 analyses. Here, we present the first joint analysis of whole-genome sequencing data from UDN patients across the network. We introduce new, well-calibrated statistical methods for prioritizing disease genes with de novo recurrence and compound heterozygosity. We also detect pathways enriched with candidate and known diagnostic genes. Our computational analysis, coupled with a systematic clinical review, recapitulated known diagnoses and revealed new disease associations. We further release a software package, RaMeDiES, enabling automated cross-analysis of deidentified sequenced cohorts for new diagnostic and research discoveries. Gene-level findings and variant-level information across the cohort are available in a public-facing browser (https://dbmi-bgm.github.io/udn-browser/). These results show that N-of-1 efforts should be supplemented by a joint genomic analysis across cohorts.

Figures

**Figure 1. Undiagnosed Diseases Network cohort analysis.**
**(a)** Map of clinical and research sites within the Undiagnosed Diseases Network (UDN) for evaluating patients and candidate variant functionality. **(b)** Genetic ancestry across the sequenced patient cohort. **(c)** Clinician-recorded primary symptom categories of patients. “Multiple” indicates 2+ categories could be considered primary and “other” indicates an unlisted category. Categories marked with an asterisk (*) are neurological subtypes (Supplementary Note S1). **(d)** Patient-reported age of first symptom onset. **(e)** Patient sex. **(f)** Categories and quantity of phenotype information collected for patients and made available to all UDN researchers (icons are from Microsoft PowerPoint). **(g)** Intronic variants detectable from genome sequencing (orange star) with a predicted splice-altering impact are considered alongside exonic variants in our statistical framework; these variants may result in retained introns or excised exons in processed transcripts. **(h)** We consider genes and gene pathways harboring *de novo* and compound heterozygous variants in sequenced trios (72% of cases). Complete case count by family structure (e.g., proband-only, duo) is in Supplementary Figure S2. Other inheritance modes (e.g., homozygous, uniparental disomy) are not considered in our cohort-based framework. **(i)** Depiction of clinical framework to uniformly evaluate how well a patient’s phenotypes are concordant with a candidate gene or variant.

**Figure 2. *De novo* recurrence.**
**(a)** *De novo* mutation counts per proband adjusted for parental ages. Blue vertical lines show the mean values of the distributions, and curves represent the Poisson fits. **(b)** Schematic of analytical test for the recurrence of *de novos* that considers distal splice-altering and exonic SNV and indel variants, their variant functionality scores, a genome-wide mutation rate model Roulette, and per-gene GeneBayes constraint values. “Like” variants refer to those of the same variant class (i.e., coding SNVs [CS], coding indels [CI], intronic SNVs [IS], intronic indels [II]) and within the same functionality score and minor allele frequency thresholds. **(c)** Genes with highest significance values for *de novo* recurrence across the cohort when focusing on missense variants with AlphaMissense and PrimateAI-3D scores; patients are represented as colored circles. Complete gene list can be found in Supplementary Table S2. **(d)** AlphaFold-predicted human *LRRC7* protein structure (AF-Q96NW7-F1) covering the leucine-rich repeat region with high predicted structural confidence (amino acid positions 86–463). The fifth and eighth LRR domains where missense *de novos* were found are highlighted in blue. Reference alleles for missense *de novo* variants observed across two UDN patients (red) are shown in circles. A depiction of *LRRC7*’s linear protein sequence (Ensembl ID ENSP00000498937) with InterPro predicted domains shown in colored boxes is below. **(e)** Overlap of phenotype terms annotated to each patient.
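
As a rough illustration of the Poisson fit mentioned in panel (a), the sketch below fits a Poisson distribution to per-proband mutation counts by setting the rate to the sample mean, which is the maximum-likelihood estimate. The counts are hypothetical, the function names are illustrative, and the parental-age adjustment described in the caption is not modeled here; this is not the authors' code.

```python
import math
from collections import Counter


def poisson_pmf(k, lam):
    """Probability of observing exactly k events under Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def fit_poisson_rate(counts):
    """Maximum-likelihood Poisson rate: the sample mean of the counts."""
    return sum(counts) / len(counts)


# Hypothetical per-proband de novo counts, for illustration only.
counts = [0, 1, 1, 2, 0, 1, 3, 1, 0, 1]
lam = fit_poisson_rate(counts)

# Compare observed frequencies with the counts expected under the fit.
observed = Counter(counts)
expected = {k: len(counts) * poisson_pmf(k, lam) for k in range(max(counts) + 1)}
for k in sorted(expected):
    print(k, observed.get(k, 0), round(expected[k], 2))
```

Comparing the observed and expected columns gives a quick visual check of whether a single Poisson rate is a reasonable description of the data.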

**Figure 3. Compound heterozygous variants.**
**(a)** Illustration of the unnormalized squared mutational target computed for each observed comphet variant in a gene across the cohort (RaMeDiES-CH, Supplementary Figure S11) or in an individual across the genome (RaMeDiES-IND, Supplementary Figure S12). “Like” variants refer to those of the same variant class (i.e., coding SNVs [CS], coding indels [CI], intronic SNVs [IS], intronic indels [II]) and within the same functionality score and minor allele frequency thresholds. **(b)** Top ranked genes resulting in the best enrichment statistic computed for RaMeDiES-IND. Putative candidates refer to genes that remain candidates for pathogenicity due to their phenotypically-relevant tissue expression, but where there is not enough functional evidence or published gene–disease relationships to establish causality at this time. **(c)** Overlap between phenotypes associated with *MED11* and those exhibited by the affected patient. **(d)** RNA-Seq reads from whole blood samples aligned to first two exons and first intron of *MED11* for proband (black), dad (blue), mom (purple) and two tissue-matched control samples (gray). Thin green line represents the intron, solid boxes represent protein-coding exonic regions, and the dotted box represents the 5’ untranslated region of *MED11*. **(e)** Proband exhibits significant retention of the first intron relative to parents and fifty-three tissue-matched control samples. Intron retention ratio is calculated as the (median read depth of first intron) / (number of reads spanning first and second exons + median read depth of first intron).
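
The intron retention ratio defined in panel (e) can be written out as a small function. This is a minimal sketch of the stated formula only; the function name, argument names, and input values are hypothetical, and how read depths and exon-spanning reads are extracted from alignments is not specified here.

```python
from statistics import median


def intron_retention_ratio(intron_depths, spanning_reads):
    """Ratio from the caption: median intron depth divided by
    (exon-spanning read count + median intron depth)."""
    intron_median = median(intron_depths)
    denominator = spanning_reads + intron_median
    if denominator == 0:
        # No coverage at all: the ratio is undefined.
        return float("nan")
    return intron_median / denominator


# Hypothetical per-base read depths across the first intron and a
# hypothetical count of reads spanning the first and second exons.
ratio = intron_retention_ratio([10, 12, 8, 11, 9], spanning_reads=30)
print(ratio)
```

The ratio lies between 0 (no intronic coverage) and 1 (no exon-spanning reads), so values closer to 1 indicate stronger retention.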

**Figure 4. Biological pathways enriched within phenotypically-similar patient subgroups.**
**(a)** Schematic illustrating the two-step process of first clustering patients according to the semantic similarity of their phenotype terms and second finding enriched biological pathways among the genes within each patient cluster. **(b)** The most significant pathways per cluster (adjusted p-value < 0.01) with 1+ genes from 1+ undiagnosed patients; complete list in Supplementary Table S6. **(c)** Two patients with primarily immune-related symptoms each harbored a compelling *de novo* variant in genes involved in immunoproteasome assembly (*POMP*) and structure (*PSMB8*). Their symptoms strongly overlap, and a subset of these symptoms were also known to be associated with either gene in OMIM. **(d)** Three neurological patients had variants in transmembrane genes involved in the same pathway. These patients had substantial phenotypic overlap with each other, as expected, and with the phenotypes associated with each of their genes (depicted as star shapes in the upset plot).
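
The second step in panel (a), testing for enriched pathways among a cluster's genes, can be illustrated with a generic over-representation test. The sketch below uses a hypergeometric upper tail with a Bonferroni adjustment; both choices are assumptions made for illustration, since the caption does not state which test or multiple-testing correction was used, and all numbers are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, population, pathway_size, draws):
    """P(X >= k) when drawing `draws` genes without replacement from
    `population` genes, of which `pathway_size` belong to the pathway."""
    upper = min(pathway_size, draws)
    favorable = sum(
        comb(pathway_size, i) * comb(population - pathway_size, draws - i)
        for i in range(k, upper + 1)
    )
    return favorable / comb(population, draws)


def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_tests)


# Hypothetical example: 10 background genes, 3 in the pathway,
# 2 candidate genes in a patient cluster, 1 of them in the pathway.
p = hypergeom_upper_tail(1, population=10, pathway_size=3, draws=2)
print(p, bonferroni(p, n_tests=5))
```

In practice the background gene set and the number of pathways tested strongly affect the adjusted p-value, so both would need to be chosen to match the analysis at hand.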

References

1. Kaplanis J. et al. Evidence for 28 genetic disorders discovered by combining healthcare and research data. Nature 586, 757–762 (2020).
2. 100,000 Genomes Project Pilot Investigators et al. 100,000 Genomes Pilot on Rare-Disease Diagnosis in Health Care - Preliminary Report. N. Engl. J. Med. 385, 1868–1880 (2021).
3. Marx J. L. The cystic fibrosis gene is found. Science 245, 923–925 (1989).
4. Roach J. C. et al. Analysis of genetic inheritance in a family quartet by whole-genome sequencing. Science 328, 636–639 (2010).
5. O’Roak B. J. et al. Exome sequencing in sporadic autism spectrum disorders identifies severe de novo mutations. Nat. Genet. 43, 585–589 (2011).
