Hum Genomics. 2024 Mar 21;18(1):28.
doi: 10.1186/s40246-024-00595-8.

Explicable prioritization of genetic variants by integration of rule-based and machine learning algorithms for diagnosis of rare Mendelian disorders

Ho Heon Kim et al. Hum Genomics.

Abstract

Background: When searching for the causative variant of a rare disease, accurate assessment and prioritization of genetic variants is essential. Previous variant prioritization tools depend mainly on in silico prediction of variant pathogenicity, which results in low sensitivity and makes the prioritization results difficult to interpret. In this study, we propose an explainable variant prioritization algorithm, named 3ASC, that achieves higher sensitivity and annotates the evidence used for prioritization. 3ASC annotates each variant with the 28 criteria defined by the ACMG/AMP genome interpretation guidelines and with features related to the clinical interpretation of variants. The system can explain its results based on the annotated evidence and the feature contributions.
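The abstract does not state how the annotated criteria are combined into a number, although Fig. 1 lists a Bayesian score among the model inputs. A minimal sketch, assuming the Bayesian framework of Tavtigian et al. (2018) with a prior of 0.1 and very-strong odds of 350, and with benign evidence entering as a negative exponent, shows one way a set of ACMG/AMP codes could be turned into a posterior probability of pathogenicity. The prefix-to-strength mapping ignores strength modifiers, and all names are hypothetical rather than part of 3ASC.

```python
"""Sketch: convert ACMG/AMP evidence codes into a posterior probability of
pathogenicity, assuming the Tavtigian et al. (2018) Bayesian framework."""

# Exponent weight of each evidence strength, as a fraction of "very strong".
STRENGTH_WEIGHT = {
    "very_strong": 1.0,
    "strong": 0.5,
    "moderate": 0.25,
    "supporting": 0.125,
}

# Default strength implied by each code prefix. Modified codes such as
# "PM2_Supporting" are rejected by this sketch.
PREFIX_STRENGTH = {
    "PVS": "very_strong",
    "PS": "strong",
    "PM": "moderate",
    "PP": "supporting",
    "BS": "strong",
    "BP": "supporting",
}


def bayesian_posterior(criteria, prior=0.1, odds_very_strong=350.0):
    """Return the posterior probability of pathogenicity for a list of codes."""
    exponent = 0.0
    for code in criteria:
        prefix = code.rstrip("0123456789")  # "PM2" -> "PM"
        if prefix not in PREFIX_STRENGTH:
            # BA1 is a stand-alone benign criterion outside this framework.
            raise ValueError(f"unsupported criterion: {code}")
        sign = -1.0 if prefix.startswith("B") else 1.0
        exponent += sign * STRENGTH_WEIGHT[PREFIX_STRENGTH[prefix]]
    odds_path = odds_very_strong ** exponent
    return odds_path * prior / ((odds_path - 1.0) * prior + 1.0)


print(round(bayesian_posterior(["PVS1", "PS1"]), 4))
print(round(bayesian_posterior(["BS1", "BP4"]), 4))
```

With no evidence the function returns the prior unchanged, and each additional criterion moves the posterior up or down on a log-odds scale, which is what makes such a score usable as a single model feature.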

Results: We trained various machine learning algorithms using in-house patient data. Variant ranking performance was assessed by the recall of causative variants among the top-ranked variants (top-k recall). The best-performing model was a random forest classifier, with a top-1 recall of 85.6% and a top-3 recall of 94.4%. 3ASC annotates each of a patient's genetic variants with the ACMG/AMP criteria so that clinical geneticists can interpret the results, as was done in the CAGI6 SickKids challenge. In the challenge, 3ASC identified causal genes for 10 of 14 patient cases, with evidence of decreased gene expression for 6 cases. For two of these genes (HDAC8 and CASK), decreased expression was confirmed by transcriptome data.
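Top-k recall can be read as the fraction of patients whose causative variant appears among the k highest-scoring variants. A minimal sketch, assuming one causative variant per patient and a model score for every candidate variant; the function name and data layout are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: patient ID -> list of (variant ID, model score).
ScoredVariants = Dict[str, List[Tuple[str, float]]]


def top_k_recall(scored: ScoredVariants, causative: Dict[str, str], k: int) -> float:
    """Fraction of patients whose causative variant ranks within the top k."""
    hits = 0
    for patient_id, variants in scored.items():
        # Highest score first; Python's sort is stable, so ties keep input order.
        ranked = sorted(variants, key=lambda item: item[1], reverse=True)
        top_ids = {variant_id for variant_id, _ in ranked[:k]}
        if causative[patient_id] in top_ids:
            hits += 1
    return hits / len(scored)


scored = {
    "P1": [("v1", 0.91), ("v2", 0.40), ("v3", 0.12)],
    "P2": [("v4", 0.83), ("v5", 0.71), ("v6", 0.20)],
    "P3": [("v7", 0.62), ("v8", 0.55), ("v9", 0.31), ("v10", 0.05)],
}
causative = {"P1": "v1", "P2": "v5", "P3": "v10"}
for k in (1, 3):
    print(f"top-{k} recall: {top_k_recall(scored, causative, k):.3f}")
```

Because the metric is computed per patient, a model is rewarded only for placing the one causative variant near the top, not for scoring the many benign variants well.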

Conclusions: By integrating various features related to clinical interpretation, including features related to false-positive risk such as quality control and disease inheritance pattern, 3ASC prioritizes genetic variants with higher sensitivity than previous methods. The system allows each variant to be interpreted based on the ACMG/AMP criteria and on feature contributions assessed using explainable AI techniques.
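The abstract names quality control and inheritance pattern as false-positive risk features without defining them. A minimal sketch of what two such flags might look like, assuming hypothetical depth and allele-fraction thresholds and modeling only autosomal dominant (AD) and autosomal recessive (AR) inheritance; every name and threshold here is illustrative, not taken from 3ASC.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical QC thresholds; real pipelines tune these per assay.
MIN_DEPTH = 10
MIN_HET_ALLELE_FRACTION = 0.2


@dataclass
class VariantCall:
    gene: str
    zygosity: str            # "het" or "hom"
    depth: int               # read depth at the site
    allele_fraction: float   # fraction of reads supporting the alternate allele


def false_positive_risk(call: VariantCall, inheritance: str,
                        calls_in_gene: int) -> Dict[str, bool]:
    """Flag QC and inheritance-pattern risks for one call.

    calls_in_gene counts this patient's qualifying calls in the same gene,
    including `call` itself.
    """
    low_quality = call.depth < MIN_DEPTH or (
        call.zygosity == "het" and call.allele_fraction < MIN_HET_ALLELE_FRACTION)
    if inheritance == "AD":
        fits = call.zygosity in ("het", "hom")
    elif inheritance == "AR":
        # Without phasing, two heterozygous calls are only a candidate compound het.
        fits = call.zygosity == "hom" or calls_in_gene >= 2
    else:
        raise ValueError(f"inheritance mode not modeled here: {inheritance}")
    return {"low_quality": low_quality, "inheritance_mismatch": not fits}


het_call = VariantCall(gene="GENE_X", zygosity="het", depth=35, allele_fraction=0.48)
print(false_positive_risk(het_call, "AR", calls_in_gene=1))
```

A lone heterozygous call in a recessive gene is flagged here, which is the kind of signal that lets a model down-weight a variant that would otherwise score well on pathogenicity alone.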

Keywords: Clinical genome interpretation; Explainable AI; Mendelian disorder; Variant prioritization.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Overview of the 3ASC variant prioritization system. The Bayesian score, symptom similarity, 3Cnet score, and false-positive risk features were used to train different 3ASC models, including the baseline model, logistic regression models, and random forest models. 3ASC prioritizes each patient's variants according to the model scores and annotates each variant with ACMG criteria. The user can interpret the prioritization result based on the ACMG rules and the feature contributions for each variant.
Fig. 2
Performance comparison between 3ASC models using cross-validation. A Average ROC curve. True positive rates from the different folds were averaged at each false positive rate using interpolation. Only false positive rates from 0 to 0.1 are plotted because the true positive rates were mostly saturated beyond that range. B Average PR curve. C Average top-k recall.
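The fold averaging described for panel A can be sketched as: build each fold's ROC curve, read off its true positive rate (TPR) at a shared grid of false positive rates (FPR) by linear interpolation, and average. This is a standard-library sketch with hypothetical function names; in practice `sklearn.metrics.roc_curve` and `numpy.interp` are commonly used for the same steps. Where a curve has a vertical segment exactly at a grid point, the sketch takes the highest TPR reached at that FPR, which is an assumption.

```python
from typing import List, Sequence, Tuple


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Return (fpr, tpr) lists, lowering the threshold one distinct score at a time."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pairs = sorted(zip(scores, labels), reverse=True)
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume all samples tied at this score before emitting a point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / n_neg)
        tpr.append(tp / n_pos)
    return fpr, tpr


def tpr_at(x: float, fpr: List[float], tpr: List[float]) -> float:
    """Linearly interpolate the ROC polyline at false positive rate x."""
    best = 0.0
    for f, t in zip(fpr, tpr):
        if f == x:
            best = max(best, t)
    for i in range(len(fpr) - 1):
        x0, x1 = fpr[i], fpr[i + 1]
        if x0 < x < x1:
            frac = (x - x0) / (x1 - x0)
            best = max(best, tpr[i] + frac * (tpr[i + 1] - tpr[i]))
    return best


def average_roc(folds, grid):
    """Average the interpolated TPR of each (labels, scores) fold over an FPR grid."""
    curves = [roc_points(labels, scores) for labels, scores in folds]
    return [sum(tpr_at(x, f, t) for f, t in curves) / len(curves) for x in grid]


folds = [([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]),
         ([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])]
fpr_grid = [i / 100 for i in range(11)]  # 0.00 to 0.10, the range shown in panel A
mean_tpr = average_roc(folds, fpr_grid)
print(mean_tpr)
```

Interpolating onto a common grid is what allows curves from folds with different numbers of thresholds to be averaged point by point.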
Fig. 3
Top-k recall comparison with benchmark models using external validation. The same set of genes and variants was used for all models so that variant prioritization performance could be compared without bias. For Exomiser and LIRICAL, gene scores were first used to prioritize the most probable causal genes, and variants were then prioritized using variant scores.
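The gene-then-variant ordering applied to Exomiser and LIRICAL can be sketched as a sort on a composite key. The scores are placeholders; the sketch does not reproduce either tool's output format or scoring, and the names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical record: (variant ID, gene symbol, variant score).
Variant = Tuple[str, str, float]


def two_stage_rank(variants: List[Variant], gene_scores: Dict[str, float]) -> List[str]:
    """Order variants by gene score first, then by variant score within a gene."""
    ranked = sorted(
        variants,
        # Negate scores for descending order; the gene symbol breaks ties
        # between equally scored genes so their variants stay grouped.
        key=lambda v: (-gene_scores[v[1]], v[1], -v[2]),
    )
    return [variant_id for variant_id, _, _ in ranked]


gene_scores = {"GENE_A": 0.9, "GENE_B": 0.5}
variants = [("v1", "GENE_B", 0.99), ("v2", "GENE_A", 0.20), ("v3", "GENE_A", 0.70)]
print(two_stage_rank(variants, gene_scores))
```

Under this ordering, a high-scoring variant in a lower-ranked gene (v1 above) falls below every variant of a higher-ranked gene, which is how a two-stage ranking differs from scoring variants directly.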
Fig. 4
Feature importance measured using SHAP and MDA. A SHAP force plot of 3ASC_RF_ALL; B Feature importance of 3ASC_RF_ALL
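MDA is commonly read as mean decrease in accuracy: shuffle one feature column, re-score, and record how much accuracy drops. A minimal permutation-importance sketch on a toy rule-based classifier; the model, data, and feature meanings are invented for illustration, and for fitted scikit-learn estimators `sklearn.inspection.permutation_importance` implements the same idea.

```python
import random
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], int]


def accuracy(model: Model, X: List[List[float]], y: List[int]) -> float:
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def mean_decrease_accuracy(model: Model, X: List[List[float]], y: List[int],
                           n_repeats: int = 20, seed: int = 0) -> List[float]:
    """Average accuracy drop when each feature column is randomly permuted."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Copy each row, replacing only feature j with its shuffled value.
            X_perm = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy classifier that only looks at feature 0 (standing in for, e.g., a pathogenicity score).
def toy_model(row: Sequence[float]) -> int:
    return int(row[0] > 0.5)


X = [[0.9, 0.1], [0.8, 0.7], [0.7, 0.3], [0.6, 0.9],
     [0.4, 0.2], [0.3, 0.8], [0.2, 0.5], [0.1, 0.6]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
print(mean_decrease_accuracy(toy_model, X, y))
```

A feature the model ignores gets an importance of exactly zero, while shuffling the feature the model relies on breaks its predictions; this gives a global ranking, in contrast to SHAP's per-variant attributions.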
Fig. 5
SHAP plots of individual variants in a patient with hemophilia A. A Force plot of a false-call variant; B Force plot of a confirmative variant.
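A force plot decomposes one prediction into a base value plus per-feature contributions that sum to the model output. For a handful of features, the Shapley values behind such a plot can be computed exactly by enumerating feature subsets. The sketch uses a single baseline input to stand in for "absent" features, which simplifies what the shap library does with a background dataset (for example `shap.TreeExplainer` for random forests); the scoring function and feature order are hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def exact_shapley(model: Callable[[Sequence[float]], float],
                  x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Shapley value of each feature of x relative to a single baseline input."""
    n = len(x)

    def value(subset: set) -> float:
        # Features in `subset` take the explained input's values; others stay at baseline.
        return model([x[i] if i in subset else baseline[i] for i in range(n)])

    phi = []
    for j in range(n):
        others = [i for i in range(n) if i != j]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for combo in combinations(others, size):
                s = set(combo)
                total += weight * (value(s | {j}) - value(s))
        phi.append(total)
    return phi


# Hypothetical score over three features; the last term is an interaction.
def toy_score(z: Sequence[float]) -> float:
    return 0.5 * z[0] + 0.3 * z[1] + 0.2 * z[0] * z[2]


x, base = [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]
phi = exact_shapley(toy_score, x, base)
# Additivity: baseline output plus contributions recovers the prediction.
print(phi, toy_score(base) + sum(phi), toy_score(x))
```

The interaction term is split evenly between the two features involved, and the contributions always sum to the difference between the prediction and the baseline output. The cost grows as 2^n, so exact enumeration suits only small feature sets.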
