Genet Med. 2021 Jan;23(1):69-79.
doi: 10.1038/s41436-020-00972-3. Epub 2020 Oct 13.

Disease-specific variant pathogenicity prediction significantly improves variant interpretation in inherited cardiac conditions


Xiaolei Zhang et al. Genet Med. 2021 Jan.

Abstract

Purpose: Accurate discrimination of benign and pathogenic rare variation remains a priority for clinical genome interpretation. State-of-the-art machine learning variant prioritization tools are imprecise and ignore important parameters defining gene-disease relationships, e.g., distinct consequences of gain-of-function versus loss-of-function variants. We hypothesized that incorporating disease-specific information would improve tool performance.

Methods: We developed a disease-specific variant classifier, CardioBoost, that estimates the probability of pathogenicity for rare missense variants in inherited cardiomyopathies and arrhythmias. We assessed CardioBoost's ability to discriminate known pathogenic from benign variants, prioritize disease-associated variants, and stratify patient outcomes.

Results: CardioBoost has high global discrimination accuracy (precision-recall area under the curve [AUC] 0.91 for cardiomyopathies; 0.96 for arrhythmias), outperforming existing tools (4–24% improvement). CardioBoost obtains excellent accuracy (cardiomyopathies 90.2%; arrhythmias 91.9%) for variants classified with >90% confidence, and increases the proportion of variants classified with high confidence more than twofold compared with existing tools. Variants classified as disease-causing are associated with both disease status and clinical severity, including a 21% increased risk (95% confidence interval [CI] 11–29%) of severe adverse outcomes by age 60 in patients with hypertrophic cardiomyopathy.

Conclusions: A disease-specific variant classifier outperforms state-of-the-art genome-wide tools for rare missense variants in inherited cardiac conditions (https://www.cardiodb.org/cardioboost/), highlighting broad opportunities for improved pathogenicity prediction through disease specificity.

Keywords: Brugada syndrome; cardiomyopathy; long QT syndrome; missense variant interpretation; pathogenicity prediction.
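The high-confidence accuracy and coverage figures in the Results rest on two thresholds on the predicted probability of pathogenicity (Pr), described in Fig. 1b, c: Pr ≥ 0.9 is called disease-causing, Pr ≤ 0.1 benign/likely benign, and anything in between is left as a variant of uncertain significance. The sketch below shows one way to compute "accuracy among confident calls" and "proportion of variants confidently classified" from such scores. It is a minimal illustration under assumptions: the function names, category strings, and toy scores are hypothetical and are not taken from the CardioBoost code.

```python
# Hypothetical category labels; thresholds follow the bins described in Fig. 1b.
DISEASE_CAUSING = "disease-causing"
BENIGN = "benign/likely benign"
VUS = "uncertain significance"


def classify(pr, upper=0.9, lower=0.1):
    """Map a predicted probability of pathogenicity (Pr) to a clinical category."""
    if pr >= upper:
        return DISEASE_CAUSING
    if pr <= lower:
        return BENIGN
    return VUS


def confusion_counts(probs, labels, upper=0.9, lower=0.1):
    """Count (true label, call) pairs; labels are 1 = pathogenic, 0 = benign."""
    counts = {}
    for pr, y in zip(probs, labels):
        key = (y, classify(pr, upper, lower))
        counts[key] = counts.get(key, 0) + 1
    return counts


def high_confidence_summary(probs, labels, upper=0.9, lower=0.1):
    """Return (coverage, accuracy): the share of variants given a confident call,
    and the share of those confident calls that agree with the label.
    VUS calls lower coverage but are excluded from accuracy."""
    counts = confusion_counts(probs, labels, upper, lower)
    correct = counts.get((1, DISEASE_CAUSING), 0) + counts.get((0, BENIGN), 0)
    confident = sum(n for (_, call), n in counts.items() if call != VUS)
    coverage = confident / len(probs) if probs else 0.0
    accuracy = correct / confident if confident else float("nan")
    return coverage, accuracy


probs = [0.95, 0.92, 0.50, 0.05, 0.08, 0.97]  # toy scores, not study data
labels = [1, 0, 1, 0, 0, 1]
coverage, accuracy = high_confidence_summary(probs, labels)
print(f"coverage={coverage:.3f} accuracy={accuracy:.3f}")
# prints: coverage=0.833 accuracy=0.800
```

Raising a tool's coverage at a fixed accuracy is what "more than twofold" refers to in the Results: more variants leave the indeterminate middle band without the confident calls becoming less reliable.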


Conflict of interest statement

S.A.C. is a cofounder and director of Enleofen Bio Pte Ltd, a company that develops anti-IL-11 therapeutics. Enleofen Bio had no involvement in this study. J.S.W. and I.O. have consulted for MyoKardia, Inc. The SHaRe registry receives research support from MyoKardia. MyoKardia had no involvement in this study. The other authors declare no conflicts of interest.

Figures

Fig. 1. Training and testing of CardioBoost, and definition of high-confidence variant classification thresholds for performance assessment.
(a) Construction of CardioBoost: (1) after defining gold standard data, (2) the data set was split 2:1 into training and test sets. The training set was used for two rounds of cross-validation (CV): first to individually optimize several candidate machine learning algorithms, and second to select the best-performing one. (3) AdaBoost was the best-performing algorithm and forms the basis of CardioBoost. (4) CardioBoost was benchmarked against existing best-in-class tools using the holdout test data, (5) several additional independent test sets, and (6) approaches based on association with clinical characteristics of heterozygotes that do not rely on a gold standard classification. (b) Illustrative distributions of predicted pathogenicity scores for a set of pathogenic and benign variants obtained by a hypothetical binary classifier. In a clinical context (based on American College of Medical Genetics and Genomics/Association for Molecular Pathology [ACMG/AMP] guidelines), variants are classified into the following categories according to the probability of pathogenicity: disease-causing (probability of pathogenicity [Pr] ≥ 0.9), benign/likely benign (Pr ≤ 0.1), and a clinically indeterminate group of variants of uncertain significance with low interpretative confidence (0.1 < Pr < 0.9). (c) The corresponding confusion matrix with the two classification thresholds Pr ≥ 0.9 and Pr ≤ 0.1.
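Panel (a) names AdaBoost as the algorithm underlying CardioBoost and a 2:1 train/test split. The sketch below is a textbook discrete AdaBoost with one-feature decision stumps, written from scratch to show how boosting reweights misclassified variants and how an additive score can be turned into a probability of pathogenicity. It is not CardioBoost's implementation: the features, base learner, number of rounds, and the probability mapping P = 1 / (1 + exp(-2F)) (the standard logistic link for AdaBoost's exponential loss) are all assumptions here, and the CV-based model selection described in the caption is omitted.

```python
import math
import random


def split_two_to_one(rows, seed=0):
    """Shuffle and split rows into ~2/3 training and ~1/3 held-out test data."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = (2 * len(shuffled)) // 3
    return shuffled[:cut], shuffled[cut:]


def fit_stump(X, y, w):
    """Pick the (feature, threshold, polarity) rule with the lowest weighted error.
    Labels y are -1 (benign) or +1 (pathogenic)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                err = sum(wi for row, label, wi in zip(X, y, w)
                          if (polarity if row[j] >= t else -polarity) != label)
                if best is None or err < best[0]:
                    best = (err, j, t, polarity)
    return best


def stump_predict(stump, row):
    _, j, t, polarity = stump
    return polarity if row[j] >= t else -polarity


def adaboost_fit(X, y, n_rounds=20):
    """Discrete AdaBoost: reweight examples so later stumps focus on earlier mistakes."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []  # list of (alpha, stump)
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        err = stump[0]
        if err >= 0.5:
            break  # no better than chance; stop boosting
        alpha = 0.5 * math.log((1 - err) / max(err, 1e-10))
        ensemble.append((alpha, stump))
        if err == 0:
            break  # training data perfectly separated
        # Misclassified rows (label * prediction = -1) gain weight, others lose it.
        w = [wi * math.exp(-alpha * label * stump_predict(stump, row))
             for row, label, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_proba(ensemble, row):
    """Map the additive score F(x) to P(pathogenic | x) = 1 / (1 + exp(-2 F(x)))."""
    z = 2.0 * sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # numerically stable branch for negative scores
    return e / (1.0 + e)


if __name__ == "__main__":
    # Toy data: one hypothetical conservation-like feature per variant, not study data.
    data = [([0.1], -1), ([0.2], -1), ([0.3], -1), ([0.7], 1), ([0.8], 1), ([0.9], 1)]
    train, test = split_two_to_one(data)
    model = adaboost_fit([x for x, _ in train], [y for _, y in train])
    for x, y in test:
        print(x, y, round(adaboost_proba(model, x), 3))
```

Decision stumps are the usual weak learner for AdaBoost; a production model would more likely rely on a library implementation (for example scikit-learn's `AdaBoostClassifier`) and would typically check that the output probabilities are calibrated before applying the 0.9/0.1 thresholds.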
Fig. 2. CardioBoost outperforms state-of-the-art genome-wide prediction tools on holdout test data.
(a, b) Precision-recall curves and receiver operating characteristic (ROC) curves for cardiomyopathy variant pathogenicity prediction. (c, d) Precision-recall curves and ROC curves for inherited arrhythmia variant pathogenicity prediction. The dashed lines show the performance of a random classifier.
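The two summary curves in Fig. 2 can be reduced to single numbers. A minimal sketch, assuming binary labels (1 = pathogenic, 0 = benign): ROC AUC equals the probability that a random pathogenic variant outscores a random benign one, and the area under the precision-recall curve is computed here as average precision, one common step-wise estimator. The caption does not say which PR estimator the authors used, so treat that choice as an assumption. The random-classifier baselines are an AUC of 0.5 for ROC and, for precision-recall, the fraction of pathogenic variants in the test set.

```python
def roc_auc(scores, labels):
    """ROC AUC as the probability that a randomly chosen pathogenic variant (label 1)
    scores higher than a randomly chosen benign one (label 0); ties count as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Step-wise area under the precision-recall curve:
    sum over score thresholds of (recall gain) x (precision at that threshold)."""
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    n_pos = sum(labels)
    tp = fp = 0
    prev_recall = ap = 0.0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Tied scores share one threshold, so admit them together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        ap += (recall - prev_recall) * (tp / (tp + fp))
        prev_recall = recall
    return ap


scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]  # toy scores, not study data
labels = [1, 1, 0, 1, 0, 0]
baseline = sum(labels) / len(labels)  # PR curve of a random classifier
print(f"ROC AUC={roc_auc(scores, labels):.3f} (random: 0.500)")
print(f"PR AUC={average_precision(scores, labels):.3f} (random: {baseline:.3f})")
# prints:
# ROC AUC=0.889 (random: 0.500)
# PR AUC=0.917 (random: 0.500)
```

Because the PR baseline moves with the share of pathogenic variants, PR AUC is the more informative summary when benign variants greatly outnumber pathogenic ones, which is one reason to report it alongside ROC.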
Fig. 3. CardioBoost improves prioritization of variants associated with disease and clinical outcomes in patients with hypertrophic cardiomyopathy (HCM).
(a) We compared the odds ratios (ORs) (on log scale) for three groups of variants: (i) all rare variants, (ii) rare variants predicted disease-causing by CardioBoost (Pr ≥ 0.9, excluding those seen in our training data), and (iii) rare variants predicted benign by CardioBoost (Pr ≤ 0.1, excluding those seen in our training data). For most of the sarcomere-encoding genes, variants classified as disease-causing by CardioBoost are enriched for disease association, and those classified as benign are depleted, compared with unstratified rare missense variants. (b–d) CardioBoost variant classification stratifies key clinical outcomes in patients with HCM. Clinical outcomes provide an opportunity to assess classifier performance independently of the labels used in the gold standard training data. (b) Kaplan–Meier event-free survival curves for patients in the SHaRe cardiomyopathy registry, stratified by genotype as interpreted by CardioBoost. Patients carrying variants seen in the CardioBoost training set were excluded from this analysis. Patients with predicted disease-causing variants in sarcomere-encoding genes have more adverse clinical events than patients without sarcomere-encoding variants ("genotype-negative") and than patients with sarcomere-encoding variants classified as benign. Survival curves stratified by variants as adjudicated by experts (marked in the figure with the prefix "SHaRe") are shown for comparison. The composite endpoint comprised the first occurrence of any component of the ventricular arrhythmic or heart failure composite endpoint, atrial fibrillation, stroke, or death. (c) P values of the log-rank test for pairwise comparisons of Kaplan–Meier survival curves. (d) Forest plot of hazard ratios (with confidence intervals) and P values from Cox proportional hazards models comparing patients' survival stratified by CardioBoost classification and by SHaRe experts' classification.
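Two of the statistics in Fig. 3 are simple enough to sketch directly. Below, `kaplan_meier` is the standard product-limit estimator behind panel (b), where the risk of reaching the endpoint by a given age t is 1 − S(t), and `odds_ratio` is a generic case-control odds ratio with a Woolf (log-scale) 95% CI, of the kind plotted in panel (a). This is an illustrative sketch under assumptions: the caption does not specify the comparison population or CI method behind the ORs, the counts and follow-up times below are hypothetical, and the log-rank tests and Cox models in panels (c, d) are not shown.

```python
import math


def kaplan_meier(times, events):
    """Kaplan-Meier event-free survival.
    times: follow-up time (e.g. age in years); events: 1 = endpoint reached, 0 = censored.
    Returns [(time, S(time))] at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = n_leaving = 0
        # Everyone followed up to time t is at risk at t (events and censored alike).
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            surv *= 1.0 - n_events / at_risk
            curve.append((t, surv))
        at_risk -= n_leaving
    return curve


def odds_ratio(cases_with, cases_without, controls_with, controls_without, z=1.96):
    """Odds ratio for carrying a variant class, with a Woolf (log-scale) 95% CI.
    Assumes all four cells are non-zero."""
    or_ = (cases_with * controls_without) / (cases_without * controls_with)
    se = math.sqrt(1 / cases_with + 1 / cases_without
                   + 1 / controls_with + 1 / controls_without)
    return or_, math.exp(math.log(or_) - z * se), math.exp(math.log(or_) + z * se)


# Hypothetical follow-up data: survival steps to about 0.8, 0.6 and 0.3 at t = 2, 3, 5.
print(kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0]))
# Hypothetical carrier counts: OR = (30 * 90) / (70 * 10), about 3.86.
print(odds_ratio(30, 70, 10, 90))
```

Censored patients (those who left follow-up without an event) still count in the risk set up to their last observation, which is why the Kaplan–Meier estimate differs from simply dividing events by the number of patients.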

