Hum Genomics. 2024 Apr 29;18(1):44.
doi: 10.1186/s40246-024-00604-w.

Critical assessment of variant prioritization methods for rare disease diagnosis within the rare genomes project

Sarah L Stenton (1, 2, 3), Melanie C O'Leary (2), Gabrielle Lemire (1, 2), Grace E VanNoy (2), Stephanie DiTroia (1, 2), Vijay S Ganesh (1, 2, 4), Emily Groopman (1, 2), Emily O'Heir (1, 2), Brian Mangilog (2), Ikeoluwa Osei-Owusu (2), Lynn S Pais (1, 2), Jillian Serrano (1, 2), Moriel Singer-Berk (2), Ben Weisburd (2), Michael W Wilson (2), Christina Austin-Tse (2, 3), Marwa Abdelhakim (5, 6), Azza Althagafi (5, 6, 7), Giulia Babbi (8), Riccardo Bellazzi (9, 10), Samuele Bovo (11), Maria Giulia Carta (10), Rita Casadio (8), Pieter-Jan Coenen (12, 13), Federica De Paoli (9), Matteo Floris (14), Manavalan Gajapathy (15, 16, 17), Robert Hoehndorf (5, 6), Julius O B Jacobsen (18), Thomas Joseph (19), Akash Kamandula (20), Panagiotis Katsonis (21), Cyrielle Kint (12), Olivier Lichtarge (21, 22, 23), Ivan Limongelli (9), Yulan Lu (24), Paolo Magni (10), Tarun Karthik Kumar Mamidi (15, 16, 17), Pier Luigi Martelli (8), Marta Mulargia (14), Giovanna Nicora (9, 10), Keith Nykamp (12), Vikas Pejaver (25, 26), Yisu Peng (20), Thi Hong Cam Pham (27), Maurizio S Podda (14, 28, 29, 30), Aditya Rao (19), Ettore Rizzo (9), Vangala G Saipradeep (19), Castrense Savojardo (8), Peter Schols (12, 13), Yang Shen (31, 32, 33), Naveen Sivadasan (19), Damian Smedley (18), Dorian Soru (34), Rajgopal Srinivasan (19), Yuanfei Sun (31), Uma Sunderam (19), Wuwei Tan (31), Naina Tiwari (19), Xiao Wang (24), Yaqiong Wang (24), Amanda Williams (21), Elizabeth A Worthey (15, 16, 17), Rujie Yin (31), Yuning You (31), Daniel Zeiberg (20), Susanna Zucca (9), Constantina Bakolitsa (35), Steven E Brenner (35), Stephanie M Fullerton (36), Predrag Radivojac (20), Heidi L Rehm (2, 3), Anne O'Donnell-Luria (37, 38, 39)

Sarah L Stenton et al. Hum Genomics. 2024.

Abstract

Background: A major obstacle faced by families with rare diseases is obtaining a genetic diagnosis. The average "diagnostic odyssey" lasts over five years, and causal variants are identified in under 50% of cases, even when variants are captured genome-wide. Computational methods to aid in the interpretation and prioritization of the vast number of variants detected are proliferating, but which tools are most effective remains unclear. To evaluate the performance of computational methods, and to encourage innovation in method development, we designed a Critical Assessment of Genome Interpretation (CAGI) community challenge to place variant prioritization models head-to-head in a real-life clinical diagnostic setting.

Methods: We used genome sequencing (GS) data from families sequenced in the Rare Genomes Project (RGP), a direct-to-participant research study on the utility of GS for rare disease diagnosis and gene discovery. Challenge predictors were provided with a dataset of variant calls and phenotype terms from 175 RGP individuals (65 families), including 35 solved training set families with causal variants specified, and 30 unlabeled test set families (14 solved, 16 unsolved). Teams were tasked with identifying causal variants in as many families as possible. Predictors submitted variant predictions with estimated probability of causal relationship (EPCR) values. Model performance was determined by two metrics: a weighted score based on the rank position of causal variants, and the maximum F-measure (F-max), based on precision and recall of causal variants across all EPCR values.
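The maximum F-measure described above can be illustrated with a short Python sketch. It assumes a model's submission has been flattened into (EPCR, is_causal) pairs pooled across the solved families, that every distinct EPCR value is tried as a threshold, and that recall is measured against the total number of known causal variants. The function name `f_max` and this input layout are illustrative assumptions; the challenge's own assessment code may treat ties and multi-variant diagnoses differently.

```python
def f_max(predictions: list[tuple[float, bool]], n_causal: int) -> float:
    """Maximum F-measure over all EPCR thresholds (illustrative sketch).

    predictions: (epcr, is_causal) pairs pooled across families (assumed layout).
    n_causal: total number of known causal variants, used as the recall denominator.
    """
    best = 0.0
    # Each distinct EPCR value is a candidate threshold; predictions with
    # EPCR >= threshold are treated as "called causal".
    for threshold in sorted({epcr for epcr, _ in predictions}, reverse=True):
        called = [is_causal for epcr, is_causal in predictions if epcr >= threshold]
        tp = sum(called)  # True counts as 1, False as 0
        if tp == 0:
            continue  # precision and recall are both zero; F is zero
        precision = tp / len(called)
        recall = tp / n_causal
        f = 2 * precision * recall / (precision + recall)
        best = max(best, f)
    return best
```

For example, four predictions with EPCR values 0.9, 0.8, 0.7, and 0.2, of which the first and third are causal, scored against three known causal variants, give an F-max of 2/3, reached at the 0.7 threshold where precision and recall are both 2/3.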

Results: Sixteen teams submitted predictions from 52 models, some with manual review incorporated. Top performers recalled causal variants in up to 13 of 14 solved families within the top 5 ranked variants. Newly discovered diagnostic variants were returned to two previously unsolved families following confirmatory RNA sequencing, and two novel disease gene candidates were entered into Matchmaker Exchange. In one example, RNA sequencing demonstrated aberrant splicing due to a deep intronic indel in ASNS, identified in trans with a frameshift variant in an unsolved proband with phenotypes consistent with asparagine synthetase deficiency.
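The top-5 family-level recall reported above can be sketched in the same way. The data layout (per-family lists of (variant ID, EPCR) pairs), the function name `families_solved_in_top_k`, and the rule that a family counts only when all of its causal variants fall within the top k are hypothetical choices for illustration; this is not the published rank-points metric, whose weighting is not given in this abstract.

```python
def families_solved_in_top_k(
    ranked: dict[str, list[tuple[str, float]]],
    causal: dict[str, set[str]],
    k: int = 5,
) -> int:
    """Count solved families whose causal variants all rank in the top k.

    ranked: family ID -> list of (variant ID, EPCR) submitted by one model.
    causal: family ID -> set of known causal variant IDs (solved families only).
    """
    hits = 0
    for family, truth in causal.items():
        # Order this family's predictions by EPCR, highest first.
        preds = sorted(ranked.get(family, []), key=lambda p: p[1], reverse=True)
        top_k = {variant for variant, _ in preds[:k]}
        # Assumption: a diagnosis needs every causal variant (e.g. both
        # alleles of a compound heterozygote) to appear in the top k.
        if truth and truth <= top_k:
            hits += 1
    return hits
```

Python's `sorted` is stable, so tied EPCR values keep their submission order in this sketch; a real assessment would need an explicit tie-breaking policy.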

Conclusions: Model methodology and performance were highly variable. Models weighing call quality, allele frequency, predicted deleteriousness, segregation, and phenotype were effective in identifying causal variants, and models open to phenotype expansion and non-coding variants were able to capture more difficult diagnoses and discover new diagnoses. Overall, computational models can significantly aid variant prioritization. For use in diagnostics, detailed review and conservative assessment of prioritized variants against established criteria are needed.

Keywords: Best practices; Genome interpretation; Genome sequencing; Rare disease; Variant prioritization.

Conflict of interest statement

Authors S.Z., I.L., E.R., P.M., and R.B. own shares of enGenome srl. Authors F.D.P. and G.N. are employees of enGenome srl. Authors T.J., R.S., S.G.V., N.S., A.R., U.S., and N.T. are employees of TCS Ltd. Authors P.J.C., C.K., K.N., and P.S. are employees of Invitae Ltd. H.L.R. receives support from Illumina and Microsoft for rare disease gene discovery and diagnosis. A.O’D-L. is a member of the scientific advisory board for Congenica Inc and chairs the clinical advisory board for CAGI. S.E.B. receives support at UC Berkeley from a research agreement from TCS. All other authors report no competing interests.

Figures

Fig. 1
CAGI6-RGP challenge overview of selected families. Summary of the 35 training set families (all solved) and 30 test set families (14 solved, 16 unsolved). Imputed population ancestry, the amount of familial sequencing data provided (proband-only, duo, trio, or quad), diagnostic status, and mode of inheritance of the causal variant(s) are displayed by family. For all returnable diagnostic variants in the solved families in each set, the functional consequence according to the Variant Effect Predictor (VEP), ClinVar and HGMD reporting status at the time of announcement of the challenge (May 3, 2021), and ACMG/AMP classification are displayed by variant. NFE, Non-Finnish European; AFR, African/African American; AMR, Admixed American; ASJ, Ashkenazi Jewish; SAS, South Asian; AD, autosomal dominant; XLR, X-linked recessive; AR, autosomal recessive; P, pathogenic; LP, likely pathogenic; VUS, variant of uncertain significance; DM, disease mutation.
Fig. 2
Results of assessment using the 14 solved families (true positives). A Number of true positive diagnoses (y-axis) identified per model (x-axis) colored by the rank position of the causal variants in the 14 solved probands. Models are ordered by their performance according to the mean rank points metric (Table 2). Team names are provided except for teams that elected to remain anonymous. B Results of the mean rank points and F-max value numeric assessment metrics by team and model. Model 1, the primary model, for each team is indicated by the grey fill. C Performance of models, according to the mean rank points awarded, comparing families with proband-only or duo data (i.e., an incomplete trio/quad) versus trio or quad data (i.e., a complete trio/quad).
Fig. 3
Concordance in the variant predictions submitted by the top five performing teams in the solved and unsolved families. Venn diagrams show the overlap between the top performing teams in the variant predictions submitted across all probands in the solved families (left) and in the unsolved families (right).
Fig. 4
Confirmatory RNA sequencing in P1 and P3. For both A and B, in the top panel, paired-end reads from the RNA sequencing BAM file are displayed for the proband. In the lower panels, the RNA sequencing read pileup track is displayed with the novel (orange) and known (blue) junctions annotated in the proband and in aggregated data from GTEx controls, respectively. Beneath, the gene transcript isoforms are displayed. A RNA sequencing analysis performed on blood in P1 compared to normalized GTEx blood samples (n = 755) (21). The results for ASNS (displaying exons 9 and 10) demonstrate evidence of splice disruption due to a deep intronic indel (indicated by the red box in the proband) with cryptic exon creation and intron 9 read-through. B RNA sequencing analysis performed on an EBV-transformed lymphoblastoid cell line (LCL) in P3 compared to normalized GTEx lymphocyte samples (n = 174). The results for TCF4 (displaying exons 10 to 13) demonstrate evidence of splice disruption due to a near-splice variant (indicated by the red line in the proband) with skipping of exon 11 in approximately 20% of reads. E, exon.

Update of

  • Critical assessment of variant prioritization methods for rare disease diagnosis within the Rare Genomes Project.
    Stenton SL, O'Leary M, Lemire G, VanNoy GE, DiTroia S, Ganesh VS, Groopman E, O'Heir E, Mangilog B, Osei-Owusu I, Pais LS, Serrano J, Singer-Berk M, Weisburd B, Wilson M, Austin-Tse C, Abdelhakim M, Althagafi A, Babbi G, Bellazzi R, Bovo S, Carta MG, Casadio R, Coenen PJ, De Paoli F, Floris M, Gajapathy M, Hoehndorf R, Jacobsen JOB, Joseph T, Kamandula A, Katsonis P, Kint C, Lichtarge O, Limongelli I, Lu Y, Magni P, Mamidi TKK, Martelli PL, Mulargia M, Nicora G, Nykamp K, Pejaver V, Peng Y, Pham THC, Podda MS, Rao A, Rizzo E, Saipradeep VG, Savojardo C, Schols P, Shen Y, Sivadasan N, Smedley D, Soru D, Srinivasan R, Sun Y, Sunderam U, Tan W, Tiwari N, Wang X, Wang Y, Williams A, Worthey EA, Yin R, You Y, Zeiberg D, Zucca S, Bakolitsa C, Brenner SE, Fullerton SM, Radivojac P, Rehm HL, O'Donnell-Luria A. Stenton SL, et al. medRxiv [Preprint]. 2023 Aug 4:2023.08.02.23293212. doi: 10.1101/2023.08.02.23293212. medRxiv. 2023. Update in: Hum Genomics. 2024 Apr 29;18(1):44. doi: 10.1186/s40246-024-00604-w. PMID: 37577678 Free PMC article. Updated. Preprint.

References

    1. Splinter K, Adams DR, Bacino CA, Bellen HJ, Bernstein JA, Cheatle-Jarvela AM, et al. Effect of genetic diagnosis on patients with previously undiagnosed disease. N Engl J Med. 2018;379(22):2131–2139. doi: 10.1056/NEJMoa1714458.
    2. 100,000 Genomes Project Pilot Investigators. Smedley D, Smith KR, Martin A, Thomas EA, McDonagh EM, et al. 100,000 Genomes pilot on rare-disease diagnosis in health care - preliminary report. N Engl J Med. 2021;385(20):1868–1880. doi: 10.1056/NEJMoa2035790.
    3. Karczewski KJ, Francioli LC, Tiao G, Cummings BB, Alföldi J, Wang Q, et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature. 2020;581(7809):434–443. doi: 10.1038/s41586-020-2308-7.
    4. Rehm HL. Evolving health care through personal genomics. Nat Rev Genet. 2017;18(4):259–267. doi: 10.1038/nrg.2016.162.
    5. Clark MM, Stark Z, Farnaes L, Tan TY, White SM, Dimmock D, et al. Meta-analysis of the diagnostic and clinical utility of genome and exome sequencing and chromosomal microarray in children with suspected genetic diseases. NPJ Genom Med. 2018;9(3):16. doi: 10.1038/s41525-018-0053-8.
