Integration of transcriptomics and long-read genomics prioritizes structural variants in rare disease
- PMID: 40113264
- PMCID: PMC12047269
- DOI: 10.1101/gr.279323.124
Abstract
Rare structural variants (SVs), including insertions, deletions, and complex rearrangements, can cause Mendelian disease, yet they remain difficult to detect and interpret accurately. We sequenced and analyzed Oxford Nanopore Technologies long-read genomes of 68 individuals from the Undiagnosed Diseases Network (UDN) for whom short-read sequencing had not identified a diagnostic mutation. Using our optimized SV detection pipelines and 571 control long-read genomes, we detected an average of 716 rare (MAF < 0.01) SV alleles per genome from long reads, a 2.4× increase over short reads. To characterize the functional effects of rare SVs, we assessed their relationship with gene expression in blood or fibroblasts from the same individuals and found that rare SVs overlapping enhancers were enriched (LOR = 0.46) near expression outliers. We also evaluated tandem repeat expansions (TREs) and found 14 rare TREs per genome; notably, these TREs were also enriched near overexpression outliers. To prioritize candidate functional SVs, we developed Watershed-SV, a probabilistic model that integrates expression data with SV-specific genomic annotations and significantly outperforms baseline models that do not incorporate expression data. Watershed-SV identified a median of eight high-confidence functional SVs per UDN genome. Notably, these included compound heterozygous deletions in FAM177A1 shared by two siblings, which were likely causal for a rare neurodevelopmental disorder. Our observations demonstrate the promise of integrating long-read sequencing with gene expression to improve the prioritization of functional SVs and TREs in rare disease patients.
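The enrichment statistic quoted above (LOR, a log odds ratio) can be illustrated with a minimal sketch. The abstract does not say how the LOR was estimated, so the function below is an assumption rather than the authors' method: it computes a natural-log odds ratio from a 2×2 table with a 0.5 pseudocount (Haldane-Anscombe correction), and the function name, argument names, and counts are hypothetical.

```python
import math


def log_odds_ratio(outlier_with_sv, outlier_without_sv,
                   nonoutlier_with_sv, nonoutlier_without_sv,
                   pseudocount=0.5):
    """Natural-log odds ratio and its standard error for a 2x2 table.

    Hypothetical helper: rows are expression outliers vs. non-outliers,
    columns are genes with vs. without a nearby rare SV. The pseudocount
    is an assumed choice that avoids division by zero in sparse tables.
    """
    a = outlier_with_sv + pseudocount
    b = outlier_without_sv + pseudocount
    c = nonoutlier_with_sv + pseudocount
    d = nonoutlier_without_sv + pseudocount
    lor = math.log((a * d) / (b * c))
    # Standard error of the log odds ratio (Woolf's formula).
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return lor, se


# Illustrative counts only; these are not data from the study.
lor, se = log_odds_ratio(30, 970, 200, 9800)
print(f"LOR = {lor:.2f} +/- {1.96 * se:.2f} (95% CI half-width)")
```

A positive value means rare SVs are found near expression outliers more often than near non-outliers. If the reported LOR is a natural logarithm, which the abstract does not specify, 0.46 would correspond to an odds ratio of roughly 1.58.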
© 2025 Jensen et al.; Published by Cold Spring Harbor Laboratory Press.
Update of
- Integration of transcriptomics and long-read genomics prioritizes structural variants in rare disease. medRxiv [Preprint]. 2024 Mar 26:2024.03.22.24304565. doi: 10.1101/2024.03.22.24304565. Update in: Genome Res. 2025 Apr 14;35(4):914-928. doi: 10.1101/gr.279323.124. PMID: 38585781.
Grants and funding
- U24 HG010263/HG/NHGRI NIH HHS/United States
- R01 AG048076/AG/NIA NIH HHS/United States
- R21 HG013397/HG/NHGRI NIH HHS/United States
- U01 AG072573/AG/NIA NIH HHS/United States
- R35 AG072290/AG/NIA NIH HHS/United States
- U01 CA253481/CA/NCI NIH HHS/United States
- U01 HG010218/HG/NHGRI NIH HHS/United States
- U01 HG011762/HG/NHGRI NIH HHS/United States
- R01 AG074339/AG/NIA NIH HHS/United States
- U01 HG012069/HG/NHGRI NIH HHS/United States
- T32 HG000044/HG/NHGRI NIH HHS/United States
- R01 AG066490/AG/NIA NIH HHS/United States
- R35 GM139580/GM/NIGMS NIH HHS/United States
- R01 MH125244/MH/NIMH NIH HHS/United States
- R01 NS072248/NS/NINDS NIH HHS/United States
- U01 NS134358/NS/NINDS NIH HHS/United States
- P30 AG066515/AG/NIA NIH HHS/United States
- R03 CA272952/CA/NCI NIH HHS/United States
- S10 OD025082/OD/NIH HHS/United States
- OT2 OD034190/OD/NIH HHS/United States