[Preprint]. 2025 May 21:2025.05.19.25327921.
doi: 10.1101/2025.05.19.25327921.

Scalable automated reanalysis of genomic data in research and clinical rare disease cohorts

Matthew J Welland et al. medRxiv.

Abstract

Reanalysis of genomic data in rare disease is highly effective in increasing diagnostic yields but remains limited by manual approaches. Automation and optimization for high specificity will be necessary to ensure scalability, adoption and sustainability of iterative reanalysis. We developed a publicly available automated tool, Talos, and validated its performance using data from 1,089 individuals with rare genetic disease. Trio-based analysis identified 86% of known in-scope diagnoses, returning one variant per case on average. Variant burden reduced to one variant per 200 cases on iterative monthly reanalysis cycles. Application to an unselected cohort of 4,735 undiagnosed individuals identified 248 diagnoses (5.2% yield): 73 (29%) due to new gene-disease relationships, 56 (23%) due to new variant-level evidence, and 119 (48%) due to improved filtering and analysis strategies. Our automated, iterative reanalysis model, applied to thousands of rare disease patients, demonstrates the feasibility of delivering frequent, systematic reanalysis at scale.
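
The yield figures quoted in the abstract can be checked with a few lines of arithmetic. The sketch below uses only the counts stated above; the variable and function names are illustrative and do not come from Talos.

```python
# Counts as reported in the abstract; names here are illustrative only.
cohort_size = 4735  # undiagnosed individuals in the unselected cohort

diagnoses_by_source = {
    "new gene-disease relationships": 73,
    "new variant-level evidence": 56,
    "improved filtering and analysis strategies": 119,
}

total_diagnoses = sum(diagnoses_by_source.values())
overall_yield = total_diagnoses / cohort_size


def share_of_diagnoses(count: int, total: int = total_diagnoses) -> float:
    """Fraction of all new diagnoses attributable to one source."""
    return count / total


print(f"total diagnoses: {total_diagnoses}")
print(f"overall yield: {overall_yield:.1%}")
for source, count in diagnoses_by_source.items():
    print(f"{source}: {count} ({share_of_diagnoses(count):.0%})")
```

The three sources sum to the 248 reported diagnoses, and the rounded percentages match those given in the abstract.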

Conflict of interest statement

Competing interests: The other authors have no conflicts of interest to declare.

Figures

Figure 1. Overview of the Talos workflow, including required inputs (genomic data, metadata and annotation resources); key components of the variant prioritization algorithm; and outputs for manual review and result return. The variant filtering and prioritization stage uses a variant tagging and filtering process to select variants likely to be classified as disease-causing using ACMG/AMP criteria. When data are available, the family and phenotype modules filter based on concordant mode of inheritance and can filter or prioritize variants based on phenotype. (PM5 refers to ACMG/AMP criteria for evidence derived from amino acid substitutions at the same position; SV: structural variant; LOF: loss of function.)
Figure 2. Talos performance in validation cohorts used for development and testing. (A) Cohort characteristics of the Acute Care Genomics (ACG) and Rare Genomes Project (RGP) cohorts. (B) Performance of Talos compared with results of manual analysis in the clinical (ACG) and research (RGP) settings.
Figure 3. Results from iterative reanalysis using Talos. (a) Characteristics of the cohort undergoing iterative reanalysis, including common reasons for referral, age at time of the original test, and testing modality. (b) Sources of new diagnoses in the full cohort and in the three main sub-cohorts; each tile represents a diagnosis. (NDD: neurodevelopmental; MOI: mode of inheritance; QC: quality control; CNV: copy number variant; SV: structural variant.)
Figure 4. Automated reanalysis program timelines. (a) Data aggregation timeline showing the entry of exome and genome data into the program between 2023 and 2025. (b) Diagnostic yield based on year of original analysis, including reason for reanalysis diagnosis. Each tile represents a diagnosis. (c) Timelines of three illustrative cases, demonstrating the short gap between new information becoming available and reanalysis diagnosis through Talos (highlighted in orange). (CNV: copy number variant; SV: structural variant; WGS: whole genome sequencing; AD: autosomal dominant.)
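
The Figure 1 caption mentions a family module that filters on concordant mode of inheritance. As a rough illustration of that idea only, the sketch below checks which simple inheritance patterns a single autosomal genotype in a trio is consistent with, assuming both parents are unaffected. It is not Talos's implementation: the `TrioGenotype` class, the `concordant_modes` function, and the two patterns covered are assumptions made for this example, and compound heterozygous, X-linked, and other patterns are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrioGenotype:
    """Hypothetical container: alternate-allele counts (0, 1 or 2) at one autosomal site."""

    proband: int
    mother: int
    father: int


def concordant_modes(gt: TrioGenotype) -> list[str]:
    """Return the simple inheritance patterns this genotype is consistent with.

    Assumes both parents are unaffected. Only two patterns are modelled here.
    """
    modes = []
    # Heterozygous in the proband and absent from both parents.
    if gt.proband == 1 and gt.mother == 0 and gt.father == 0:
        modes.append("de novo dominant")
    # Homozygous in the proband with both parents heterozygous carriers.
    if gt.proband == 2 and gt.mother == 1 and gt.father == 1:
        modes.append("homozygous recessive")
    return modes


# A heterozygous variant inherited from an unaffected parent matches neither pattern.
for gt in (TrioGenotype(1, 0, 0), TrioGenotype(2, 1, 1), TrioGenotype(1, 1, 0)):
    print(gt, concordant_modes(gt))
```

A variant with an empty result would be dropped by a filter of this kind, which is one way trio data can reduce the number of variants left for manual review.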
