[Preprint]. 2024 Aug 13:2024.02.13.580158.
doi: 10.1101/2024.02.13.580158.

Joint, multifaceted genomic analysis enables diagnosis of diverse, ultra-rare monogenic presentations

Shilpa Nadimpalli Kobren et al. bioRxiv.

Abstract

Genomics for rare disease diagnosis has advanced at a rapid pace due to our ability to perform "N-of-1" analyses on individual patients with ultra-rare diseases. The increasing sizes of ultra-rare disease cohorts internationally now enable cohort-wide analyses for new discoveries, but well-calibrated statistical genetics approaches for jointly analyzing these patients are still under development [1,2]. The Undiagnosed Diseases Network (UDN) brings multiple clinical, research and experimental centers under the same umbrella across the United States to facilitate and scale N-of-1 analyses. Here, we present the first joint analysis of whole genome sequencing data of UDN patients across the network. We introduce new, well-calibrated statistical methods for prioritizing disease genes with de novo recurrence and compound heterozygosity. We also detect pathways enriched with candidate and known diagnostic genes. Our computational analysis, coupled with a systematic clinical review, recapitulated known diagnoses and revealed new disease associations. We further release a software package, RaMeDiES, enabling automated cross-analysis of deidentified sequenced cohorts for new diagnostic and research discoveries. Gene-level findings and variant-level information across the cohort are available in a public-facing browser (https://dbmi-bgm.github.io/udn-browser/). These results show that N-of-1 efforts should be supplemented by a joint genomic analysis across cohorts.

Figures

Figure 1. Undiagnosed Diseases Network cohort analysis.
(a) Map of clinical and research sites within the Undiagnosed Diseases Network (UDN) for evaluating patients and candidate variant functionality. (b) Genetic ancestry across the sequenced patient cohort. (c) Clinician-recorded primary symptom categories of patients. “Multiple” indicates 2+ categories could be considered primary and “other” indicates an unlisted category. Categories marked with an asterisk (*) are neurological subtypes (Supplementary Note S1). (d) Patient-reported age of first symptom onset. (e) Patient sex. (f) Categories and quantity of phenotype information collected for patients and made available to all UDN researchers (icons are from Microsoft PowerPoint). (g) Intronic variants detectable from genome sequencing (orange star) with a predicted splice-altering impact are considered alongside exonic variants in our statistical framework; these variants may result in retained introns or excised exons in processed transcripts. (h) We consider genes and gene pathways harboring de novo and compound heterozygous variants in sequenced trios (72% of cases). Complete case count by family structure (e.g., proband-only, duo) is in Supplementary Figure S2. Other inheritance modes (e.g., homozygous, uniparental disomy) are not considered in our cohort-based framework. (i) Depiction of clinical framework to uniformly evaluate how well a patient’s phenotypes are concordant with a candidate gene or variant.
Figure 2. De novo recurrence.
(a) De novo mutation counts per proband adjusted for parental ages. Blue vertical lines show the mean values of the distributions, and curves represent the Poisson fits. (b) Schematic of analytical test for the recurrence of de novos that considers distal splice-altering and exonic SNV and indel variants, their variant functionality scores, a genome-wide mutation rate model Roulette, and per-gene GeneBayes constraint values. “Like” variants refer to those of the same variant class (i.e., coding SNVs [CS], coding indels [CI], intronic SNVs [IS], intronic indels [II]) and within the same functionality score and minor allele frequency thresholds. (c) Genes with highest significance values for de novo recurrence across the cohort when focusing on missense variants with AlphaMissense and PrimateAI-3D scores; patients are represented as colored circles. Complete gene list can be found in Supplementary Table S2. (d) AlphaFold-predicted human LRRC7 protein structure (AF-Q96NW7-F1) covering the leucine-rich repeat region with high predicted structural confidence (amino acid positions 86–463). The fifth and eighth LRR domains where missense de novos were found are highlighted in blue. Reference alleles for missense de novo variants observed across two UDN patients (red) are shown in circles. A depiction of LRRC7’s linear protein sequence (Ensembl ID ENSP00000498937) with InterPro predicted domains shown in colored boxes is below. (e) Overlap of phenotype terms annotated to each patient.
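Panel (a) overlays Poisson fits on the per-proband de novo counts. As a minimal sketch of how such a fit can be obtained, the maximum-likelihood Poisson rate is the sample mean of the counts. The function names and the example counts below are hypothetical, and the parental-age adjustment mentioned in the caption is not reproduced here.

```python
import math


def fit_poisson_rate(counts):
    """Maximum-likelihood Poisson rate: the sample mean of the counts."""
    if not counts:
        raise ValueError("need at least one count")
    return sum(counts) / len(counts)


def poisson_pmf(k, rate):
    """Probability of observing exactly k events under Poisson(rate)."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


# Hypothetical de novo counts for eight probands (illustrative only,
# not taken from the UDN cohort).
example_counts = [60, 72, 65, 58, 70, 66, 63, 74]
rate = fit_poisson_rate(example_counts)

# Fitted curve: expected fraction of probands carrying k de novo variants.
expected_fraction = [poisson_pmf(k, rate) for k in range(0, 151)]
```

Comparing `expected_fraction` against the observed histogram gives a quick visual check of whether the counts are over-dispersed relative to a Poisson model.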
Figure 3. Compound heterozygous variants.
(a) Illustration of the unnormalized squared mutational target computed for each observed comphet variant in a gene across the cohort (RaMeDiES-CH, Supplementary Figure S11) or in an individual across the genome (RaMeDiES-IND, Supplementary Figure S12). “Like” variants refer to those of the same variant class (i.e., coding SNVs [CS], coding indels [CI], intronic SNVs [IS], intronic indels [II]) and within the same functionality score and minor allele frequency thresholds. (b) Top ranked genes resulting in the best enrichment statistic computed for RaMeDiES-IND. Putative candidates refer to genes that remain candidates for pathogenicity due to their phenotypically-relevant tissue expression, but where there is not enough functional evidence or published gene–disease relationships to establish causality at this time. (c) Overlap between phenotypes associated with MED11 and those exhibited by the affected patient. (d) RNA-Seq reads from whole blood samples aligned to first two exons and first intron of MED11 for proband (black), dad (blue), mom (purple) and two tissue-matched control samples (gray). Thin green line represents the intron, solid boxes represent protein-coding exonic regions, and the dotted box represents the 5’ untranslated region of MED11. (e) Proband exhibits significant retention of the first intron relative to parents and fifty-three tissue-matched control samples. Intron retention ratio is calculated as the (median read depth of first intron) / (number of reads spanning first and second exons + median read depth of first intron).
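The intron retention ratio defined in panel (e) can be written out as a short function. This is a sketch under stated assumptions: the function and argument names are hypothetical, and the inputs are assumed to be per-base read depths across the first intron plus a count of reads spanning the first and second exons.

```python
from statistics import median


def intron_retention_ratio(intron_depths, exon_spanning_reads):
    """Ratio from the caption:
    median intron depth / (exon-spanning reads + median intron depth).

    intron_depths: per-base read depths across the intron (assumed input).
    exon_spanning_reads: number of reads spanning the flanking exons.
    """
    intron_median = median(intron_depths)
    denominator = exon_spanning_reads + intron_median
    if denominator == 0:
        raise ValueError("no reads over the intron or the exon junction")
    return intron_median / denominator


# Hypothetical values for illustration only.
example_ratio = intron_retention_ratio([10, 12, 8, 10], 30)
```

A ratio near 0 indicates that the intron is almost always spliced out, and a ratio near 1 indicates that most transcripts retain it.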
Figure 4. Biological pathways enriched within phenotypically-similar patient subgroups.
(a) Schematic illustrating the two-step process of first clustering patients according to the semantic similarity of their phenotype terms and second finding enriched biological pathways among the genes within each patient cluster. (b) The most significant pathways per cluster (adjusted p-value < 0.01) with 1+ genes from 1+ undiagnosed patients; complete list in Supplementary Table S6. (c) Two patients with primarily immune-related symptoms each harbored a compelling de novo variant in genes involved in immunoproteasome assembly (POMP) and structure (PSMB8). Their symptoms strongly overlap, and a subset of these symptoms were also known to be associated with either gene in OMIM. (d) Three neurological patients had variants in transmembrane genes involved in the same pathway. These patients had substantial phenotypic overlap with each other, as expected, and with the phenotypes associated with each of their genes (depicted as star shapes in the upset plot).
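The second step in panel (a), finding pathways enriched among the genes of a patient cluster, can be illustrated with a small example. The caption does not say which enrichment test was used. The sketch below assumes a one-sided hypergeometric test, which is a common choice for this kind of analysis, and all names are hypothetical. Multiple-testing adjustment, which the caption's adjusted p-values imply, is not shown.

```python
from math import comb


def hypergeom_enrichment_pvalue(n_background, n_pathway, n_cluster, n_overlap):
    """P(X >= n_overlap) when n_cluster genes are drawn without replacement
    from n_background genes, of which n_pathway belong to the pathway.

    Assumed test; the source does not state the authors' method.
    """
    total = comb(n_background, n_cluster)
    upper = min(n_pathway, n_cluster)
    tail = sum(
        comb(n_pathway, i) * comb(n_background - n_pathway, n_cluster - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy numbers: 10 background genes, 5 in the pathway, 5 in the cluster,
# and all 5 cluster genes fall in the pathway.
example_p = hypergeom_enrichment_pvalue(10, 5, 5, 5)
```

Smaller p-values mean the cluster's genes overlap the pathway more than expected by chance under this model.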

References

    1. Kaplanis J. et al. Evidence for 28 genetic disorders discovered by combining healthcare and research data. Nature 586, 757–762 (2020).
    2. 100,000 Genomes Project Pilot Investigators et al. 100,000 Genomes Pilot on Rare-Disease Diagnosis in Health Care - Preliminary Report. N. Engl. J. Med. 385, 1868–1880 (2021).
    3. Marx J. L. The cystic fibrosis gene is found. Science 245, 923–925 (1989).
    4. Roach J. C. et al. Analysis of genetic inheritance in a family quartet by whole-genome sequencing. Science 328, 636–639 (2010).
    5. O’Roak B. J. et al. Exome sequencing in sporadic autism spectrum disorders identifies severe de novo mutations. Nat. Genet. 43, 585–589 (2011).
