[Preprint]. 2023 Aug 4:2023.08.02.23293212.
doi: 10.1101/2023.08.02.23293212.

Critical assessment of variant prioritization methods for rare disease diagnosis within the Rare Genomes Project

Sarah L Stenton  1   2   3 Melanie O'Leary  2 Gabrielle Lemire  1   2 Grace E VanNoy  2 Stephanie DiTroia  1   2 Vijay S Ganesh  1   2   4 Emily Groopman  1   2 Emily O'Heir  1   2 Brian Mangilog  2 Ikeoluwa Osei-Owusu  2 Lynn S Pais  1   2 Jillian Serrano  1   2 Moriel Singer-Berk  2 Ben Weisburd  2 Michael Wilson  2 Christina Austin-Tse  2   3 Marwa Abdelhakim  5 Azza Althagafi  5   6   7 Giulia Babbi  8 Riccardo Bellazzi  9   10 Samuele Bovo  11 Maria Giulia Carta  10 Rita Casadio  8 Pieter-Jan Coenen  12 Federica De Paoli  9 Matteo Floris  13 Manavalan Gajapathy  14   15   16 Robert Hoehndorf  5   6 Julius O B Jacobsen  17 Thomas Joseph  18 Akash Kamandula  19 Panagiotis Katsonis  20 Cyrielle Kint  12 Olivier Lichtarge  20   21   22 Ivan Limongelli  9 Yulan Lu  23 Paolo Magni  9   10 Tarun Karthik Kumar Mamidi  14   15   16 Pier Luigi Martelli  8 Marta Mulargia  13 Giovanna Nicora  9 Keith Nykamp  12 Vikas Pejaver  24   25 Yisu Peng  19 Thi Hong Cam Pham  26 Maurizio S Podda  13 Aditya Rao  18 Ettore Rizzo  9 Vangala G Saipradeep  18 Castrense Savojardo  8 Peter Schols  12 Yang Shen  27   28   29 Naveen Sivadasan  18 Damian Smedley  17 Dorian Soru  30 Rajgopal Srinivasan  18 Yuanfei Sun  27 Uma Sunderam  18 Wuwei Tan  27 Naina Tiwari  18 Xiao Wang  23 Yaqiong Wang  23 Amanda Williams  20 Elizabeth A Worthey  14   15   16 Rujie Yin  27 Yuning You  27 Daniel Zeiberg  19 Susanna Zucca  9 Constantina Bakolitsa  31 Steven E Brenner  31 Stephanie M Fullerton  32 Predrag Radivojac  19 Heidi L Rehm  2   3 Anne O'Donnell-Luria  1   2   3

Sarah L Stenton et al. medRxiv.

Update in

  • Critical assessment of variant prioritization methods for rare disease diagnosis within the rare genomes project.
    Stenton SL, O'Leary MC, Lemire G, VanNoy GE, DiTroia S, Ganesh VS, Groopman E, O'Heir E, Mangilog B, Osei-Owusu I, Pais LS, Serrano J, Singer-Berk M, Weisburd B, Wilson MW, Austin-Tse C, Abdelhakim M, Althagafi A, Babbi G, Bellazzi R, Bovo S, Carta MG, Casadio R, Coenen PJ, De Paoli F, Floris M, Gajapathy M, Hoehndorf R, Jacobsen JOB, Joseph T, Kamandula A, Katsonis P, Kint C, Lichtarge O, Limongelli I, Lu Y, Magni P, Mamidi TKK, Martelli PL, Mulargia M, Nicora G, Nykamp K, Pejaver V, Peng Y, Pham THC, Podda MS, Rao A, Rizzo E, Saipradeep VG, Savojardo C, Schols P, Shen Y, Sivadasan N, Smedley D, Soru D, Srinivasan R, Sun Y, Sunderam U, Tan W, Tiwari N, Wang X, Wang Y, Williams A, Worthey EA, Yin R, You Y, Zeiberg D, Zucca S, Bakolitsa C, Brenner SE, Fullerton SM, Radivojac P, Rehm HL, O'Donnell-Luria A. Hum Genomics. 2024 Apr 29;18(1):44. doi: 10.1186/s40246-024-00604-w. PMID: 38685113.

Abstract

Background: A major obstacle faced by rare disease families is obtaining a genetic diagnosis. The average "diagnostic odyssey" lasts over five years, and causal variants are identified in under 50% of cases. The Rare Genomes Project (RGP) is a direct-to-participant research study on the utility of genome sequencing (GS) for diagnosis and gene discovery. Families consent to the sharing of sequence and phenotype data with researchers, which allowed the development of a Critical Assessment of Genome Interpretation (CAGI) community challenge that placed variant prioritization models head-to-head in a real-life clinical diagnostic setting.

Methods: Predictors were provided with a dataset of phenotype terms and variant calls from GS of 175 RGP individuals (65 families), comprising 35 solved training set families with causal variants specified and 30 test set families (14 solved, 16 unsolved). The challenge tasked teams with identifying the causal variants in as many test set families as possible. Ranked variant predictions were submitted with estimated probability of causal relationship (EPCR) values. Model performance was assessed with two metrics: a weighted score based on the rank position of true positive causal variants, and the maximum F-measure (F-max), based on the precision and recall of causal variants across EPCR thresholds.
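The abstract names the two assessment metrics but not their exact formulas. The following is a minimal Python sketch under stated assumptions: F-max is computed by using every submitted EPCR value as a threshold, with precision and recall pooled across all test families (micro-averaging is an assumption), and the rank-based score uses a hypothetical points schedule (`RANK_POINTS`) that is not the challenge's actual weighting. The `Prediction` schema is likewise hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Prediction:
    """One submitted variant prediction (hypothetical schema)."""
    family: str
    variant: str
    epcr: float  # estimated probability of causal relationship, 0..1


def f_max(predictions: List[Prediction], truth: Set[Tuple[str, str]]) -> float:
    """Maximum F-measure over every submitted EPCR value used as a threshold.

    At threshold t, predictions with epcr >= t count as positive calls.
    Precision and recall are pooled across families (an assumption).
    """
    best = 0.0
    for t in sorted({p.epcr for p in predictions}):
        called = {(p.family, p.variant) for p in predictions if p.epcr >= t}
        tp = len(called & truth)
        if tp == 0:
            continue  # F-measure is 0 when no true causal variant is called
        precision = tp / len(called)
        recall = tp / len(truth)
        best = max(best, 2 * precision * recall / (precision + recall))
    return best


# Hypothetical points per rank position; NOT the challenge's actual weights.
RANK_POINTS = {1: 5, 2: 4, 3: 3, 4: 2, 5: 1}


def mean_rank_points(best_rank: Dict[str, Optional[int]]) -> float:
    """Average points over solved families.

    best_rank maps each solved family to the rank at which its causal
    variant appears in the submission, or None if it was not submitted.
    Ranks outside RANK_POINTS earn 0 points.
    """
    if not best_rank:
        return 0.0
    points = [RANK_POINTS.get(r, 0) if r is not None else 0 for r in best_rank.values()]
    return sum(points) / len(points)
```

Pooling makes a team's many low-EPCR predictions in one family lower precision for all families; averaging precision and recall per family before taking the maximum would be an equally plausible reading of the metric.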

Results: Sixteen teams submitted predictions from 52 models, some of which incorporated manual review. Top performing teams recalled the causal variants in up to 13 of 14 solved families by prioritizing high-quality variant calls that were rare, predicted deleterious, segregating correctly, and consistent with the reported phenotype. In unsolved families, newly discovered diagnostic variants were returned to two families following confirmatory RNA sequencing, and two prioritized novel disease gene candidates were entered into Matchmaker Exchange. In one example, RNA sequencing demonstrated aberrant splicing due to a deep intronic indel in ASNS, identified in trans with a frameshift variant, in an unsolved proband whose phenotype overlapped with asparagine synthetase deficiency.
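To illustrate how the four prioritization criteria named above could be combined, here is a minimal Python sketch. It does not reproduce any participating team's model: the `Variant` field names, the thresholds (`MIN_GQ`, `MAX_AF`, `MIN_DELETERIOUSNESS`) and the ranking key are all hypothetical, and segregation is reduced to a precomputed boolean rather than an actual inheritance check.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Variant:
    """Illustrative per-variant annotations (hypothetical field names)."""
    gene: str
    genotype_quality: int    # call quality, e.g. a VCF GQ value
    pop_allele_freq: float   # population allele frequency
    deleteriousness: float   # any 0..1 in-silico score, higher = more damaging
    segregates: bool         # consistent with the family's inheritance model
    phenotype_match: float   # 0..1 overlap of patient phenotype terms with the gene


# Hypothetical thresholds chosen for illustration only.
MIN_GQ = 20
MAX_AF = 0.001
MIN_DELETERIOUSNESS = 0.5


def prioritize(variants: List[Variant]) -> List[Variant]:
    """Keep high-quality, rare, predicted-deleterious, segregating variants,
    then rank them by phenotype match, breaking ties by deleteriousness."""
    kept = [
        v for v in variants
        if v.genotype_quality >= MIN_GQ
        and v.pop_allele_freq <= MAX_AF
        and v.deleteriousness >= MIN_DELETERIOUSNESS
        and v.segregates
    ]
    return sorted(kept, key=lambda v: (v.phenotype_match, v.deleteriousness), reverse=True)
```

Hard filters followed by a sort keep the sketch short; a scoring approach that down-weights rather than discards borderline calls would be an equally reasonable design.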

Conclusions: Through objective assessment of variant predictions, we provide insights into current state-of-the-art algorithms and platforms for genome sequencing analysis in rare disease diagnosis and explore areas for future optimization. Identification of diagnostic variants in unsolved families promotes synergy between researchers with clinical and computational expertise as a means of advancing the field of clinical genome interpretation.

Keywords: Best practices; Genome interpretation; Genome sequencing; Rare disease; Variant prioritization.

Conflict of interest statement

Competing interests. Authors S.Z., I.L., E.R., P.M., and R.B. own shares of enGenome srl. Authors F.D.P. and G.N. are employees of enGenome srl. Authors T.J., R.S., S.G.V., N.S., A.R., U.S., and N.T. are employees of TCS Ltd. Authors P.J.C., C.K., K.N., and P.S. are employees of Invitae Ltd. H.L.R. receives support from Illumina and Microsoft for rare disease gene discovery and diagnosis. A.O’D-L. is a member of the scientific advisory board for Congenica Inc and the Simons Foundation SPARK for Autism study and co-chairs the clinical advisory board for CAGI. S.E.B. receives support at UC Berkeley through a research agreement with TCS. All other authors report no competing interests.

Figures

Figure 1. CAGI6 RGP challenge overview of selected families.
Summary of the 35 training set families (all solved) and 30 test set families (14 solved, 16 unsolved). Imputed population ancestry, the amount of familial sequencing data provided (proband-only, duo, trio, or quad), diagnostic status, and mode of inheritance of the causal variant(s) are displayed by family. For all returnable diagnostic variants in the solved families in each set, the functional consequence according to the Variant Effect Predictor (VEP), ClinVar and HGMD reporting status at the time the challenge was announced (May 3, 2021), and ACMG/AMP classification are displayed by variant. NFE, Non-Finnish European; AFR, African/African American; AMR, Admixed American; ASJ, Ashkenazi Jewish; SAS, South Asian; AD, autosomal dominant; XLR, X-linked recessive; AR, autosomal recessive; P, pathogenic; LP, likely pathogenic; VUS, variant of uncertain significance; DM, disease mutation.
Figure 2. Results of assessment using the 14 solved families (true positives).
A, Number of true positive diagnoses (y-axis) identified per model (x-axis), colored by the rank position of the causal variants in the 14 solved probands. Models are ordered by their performance according to the mean rank points metric (Table 2). Team names are provided except for teams that elected to remain anonymous. B, Results of the mean rank points and F-max assessment metrics by team and model. Each team's primary model (model 1) is indicated by grey fill. C, Performance of models, according to the mean rank points awarded, in families with proband-only or duo data (i.e., an incomplete trio/quad) versus trio or quad data (i.e., a complete trio/quad).
Figure 3. Concordance in the variant predictions submitted by the top five performing teams in the solved and unsolved families.
Venn diagrams showing the overlap between the top performing teams in the variant predictions submitted across all probands in the solved families (left) and in the unsolved families (right).
Figure 4. Confirmatory RNA sequencing in P1 and P3.
For both A and B, the top panel displays paired-end reads from the proband's RNA sequencing BAM file. The lower panels display the RNA sequencing read pileup track, with the novel (orange) and known (blue) junctions annotated in the proband and in aggregated data from GTEx controls, respectively. The gene transcript isoforms are displayed beneath. A, RNA sequencing analysis performed on blood in P1 compared to normalized GTEx blood samples (n=755) (48). The results for ASNS (displaying exons 9 and 10) demonstrate splice disruption due to a deep intronic indel (indicated by the red box in the proband), with cryptic exon creation and intron 9 read-through. B, RNA sequencing analysis performed on an EBV-transformed lymphoblastoid cell line (LCL) in P3 compared to normalized GTEx lymphocyte samples (n=174). The results for TCF4 (displaying exons 10 to 13) demonstrate splice disruption due to a near-splice variant (indicated by the red line in the proband), with skipping of exon 11 in approximately 20% of reads. E, exon.

References

    1. Splinter K, Adams DR, Bacino CA, Bellen HJ, Bernstein JA, Cheatle-Jarvela AM, et al. Effect of Genetic Diagnosis on Patients with Previously Undiagnosed Disease. N Engl J Med. 2018 Nov 29;379(22):2131–9.
    2. 100,000 Genomes Project Pilot Investigators, Smedley D, Smith KR, Martin A, Thomas EA, McDonagh EM, et al. 100,000 Genomes Pilot on Rare-Disease Diagnosis in Health Care - Preliminary Report. N Engl J Med. 2021 Nov 11;385(20):1868–80.
    3. Karczewski KJ, Francioli LC, Tiao G, Cummings BB, Alföldi J, Wang Q, et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature. 2020 May;581(7809):434–43.
    4. Rehm HL. Evolving health care through personal genomics. Nat Rev Genet. 2017 Apr;18(4):259–67.
    5. Clark MM, Stark Z, Farnaes L, Tan TY, White SM, Dimmock D, et al. Meta-analysis of the diagnostic and clinical utility of genome and exome sequencing and chromosomal microarray in children with suspected genetic diseases. NPJ Genom Med. 2018 Jul 9;3:16.