This is a preprint.
Scalable automated reanalysis of genomic data in research and clinical rare disease cohorts
- PMID: 40661289
- PMCID: PMC12258758
- DOI: 10.1101/2025.05.19.25327921
Abstract
Reanalysis of genomic data in rare disease is highly effective in increasing diagnostic yields but remains limited by manual approaches. Automation and optimization for high specificity will be necessary to ensure the scalability, adoption and sustainability of iterative reanalysis. We developed a publicly available automated tool, Talos, and validated its performance using data from 1,089 individuals with rare genetic disease. Trio-based analysis identified 86% of known in-scope diagnoses while returning, on average, one variant per case. On iterative monthly reanalysis cycles, the variant burden fell to one variant per 200 cases. Application to an unselected cohort of 4,735 undiagnosed individuals identified 248 diagnoses (5.2% yield): 73 (29%) due to new gene-disease relationships, 56 (23%) due to new variant-level evidence, and 119 (48%) due to improved filtering and analysis strategies. Our automated, iterative reanalysis model, applied to thousands of rare disease patients, demonstrates the feasibility of delivering frequent, systematic reanalysis at scale.
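The yield and breakdown percentages above follow directly from the reported counts. The short Python check below recomputes them using only the numbers stated in the abstract; the constant and function names (`COHORT_SIZE`, `diagnostic_yield`, `breakdown`) are illustrative and are not part of Talos.

```python
# Counts as reported in the abstract for the unselected cohort.
COHORT_SIZE = 4735
DIAGNOSES_BY_SOURCE = {
    "new gene-disease relationships": 73,
    "new variant-level evidence": 56,
    "improved filtering and analysis strategies": 119,
}


def diagnostic_yield(diagnoses: int, cohort: int) -> float:
    """Return diagnoses as a percentage of the cohort size."""
    return 100.0 * diagnoses / cohort


def breakdown(by_source: dict[str, int]) -> dict[str, float]:
    """Return each source's share of all diagnoses, as a percentage."""
    total = sum(by_source.values())
    return {source: 100.0 * n / total for source, n in by_source.items()}


total = sum(DIAGNOSES_BY_SOURCE.values())
print(f"{total} diagnoses, yield {diagnostic_yield(total, COHORT_SIZE):.1f}%")
for source, share in breakdown(DIAGNOSES_BY_SOURCE).items():
    print(f"  {source}: {share:.0f}%")
```

Running it prints 248 diagnoses at a 5.2% yield, split 29% / 23% / 48%, matching the figures reported above.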
Conflict of interest statement
Competing interests: The other authors have no conflicts of interest to declare.