Genome Med. 2022 Jul 29;14(1):81.

doi: 10.1186/s13073-022-01078-y.

X-CAP improves pathogenicity prediction of stopgain variants

Ruchir Rastogi¹, Peter D Stenson², David N Cooper², Gill Bejerano³,⁴,⁵,⁶

Affiliations

¹ Department of Computer Science, Stanford University, Stanford, USA.
² Institute of Medical Genetics, Cardiff University, Cardiff, UK.
³ Department of Computer Science, Stanford University, Stanford, USA. bejerano@stanford.edu.
⁴ Department of Developmental Biology, Stanford University, Stanford, USA. bejerano@stanford.edu.
⁵ Department of Pediatrics, Stanford University, Stanford, USA. bejerano@stanford.edu.
⁶ Department of Biomedical Data Science, Stanford University, Stanford, USA. bejerano@stanford.edu.

PMID: 35906703
PMCID: PMC9338606
DOI: 10.1186/s13073-022-01078-y

Ruchir Rastogi et al. Genome Med. 2022.

Abstract

Stopgain substitutions are the third-largest class of monogenic human disease mutations and are often examined first in patient exomes. Existing computational stopgain pathogenicity predictors, however, perform poorly at the high sensitivity required for clinical use. Here, we introduce a new classifier, termed X-CAP, which uses a novel training methodology and unique feature set to improve the AUROC by 18% and decrease the false-positive rate 4-fold on large variant databases. In patient exomes, X-CAP prioritizes causal stopgains better than existing methods do, further illustrating its clinical utility. X-CAP is available at https://github.com/bejerano-lab/X-CAP .

Keywords: Machine learning; Nonsense; Pathogenicity prediction; Stopgain.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

**Fig. 1**
Stopgains are a sizable variant class. (a) The number of variants of each mutation type as a proportion of all DM (disease-causing) variants in HGMD 2020.1. Single base-pair stopgains are the third-largest class, trailing only missense variants and frameshift indels. (b) The prevalence of stopgains from Phase 3 of the 1000 Genomes Project (N = 2504) as a function of their allele frequencies within the same dataset. The average individual in the dataset harbors 12.5 stopgains with an allele frequency of less than 1%.

**Fig. 2**
X-CAP features show predictive power. Comparison of feature values for benign and pathogenic stopgains in the training set of $D_{original}$. (a) The Residual Variation Intolerance Score (RVIS) decile of genes, weighted by the number of variants they contain. Genes without RVIS values were excluded. Pathogenic variants are more prevalent in low-RVIS genes, namely those generally intolerant to variation. (b) Kernel Density Estimation (KDE) plot of the relative variant location, defined as the distance in the coding domain sequence (CDS) from the translation start site divided by the total CDS length. On average, benign stopgains are located later in transcripts than pathogenic stopgains. (c) KDE plot of the number of exons in the mutated gene. The maximum number of exons is clipped to 100 for clarity. Genes containing benign stopgains tend to have fewer exons than genes containing pathogenic stopgains. (d) Odds ratios (pathogenic/benign) comparing variants that introduce a given stop codon to those that do not. The TGA stop codon, molecularly shown to be the most amenable to read-through of the three [36], is depleted in pathogenic variants. (e) Odds ratios comparing 5’ proximal stopgains (those within the first 100 bp of the sequence) that have a potential alternative downstream start codon a given distance away against those that do not. Pathogenic variants tend to be located further from the next downstream start codon than benign variants. (f) KDE plot of the mean phyloP of the downstream region, the portion of the CDS truncated by the stopgain. Regions downstream of pathogenic variants are more conserved than regions downstream of benign variants. In (b), (c), and (f), Scott’s Rule [52] was used to calculate the bandwidth of the Gaussian kernel. In (d) and (e), error bars denote 95% confidence intervals for the odds ratio.
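
The odds ratios with 95% confidence intervals described for panels d and e can be sketched as follows. This is a minimal illustration, not the authors' code: the caption does not say how the intervals were computed, so the Wald interval on the log scale used here is an assumption, and the function name and the counts in the usage line are hypothetical.

```python
import math


def odds_ratio_ci(path_with, path_without, benign_with, benign_without, z=1.96):
    """Odds ratio (pathogenic/benign) for a binary feature, with a Wald interval.

    The four arguments are variant counts: pathogenic or benign variants that
    do or do not carry the feature (for example, "introduces a TGA stop codon").
    ASSUMPTION: a log-scale Wald interval; the source does not name its method.
    """
    counts = (path_with, path_without, benign_with, benign_without)
    if min(counts) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (path_with * benign_without) / (path_without * benign_with)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se_log = math.sqrt(sum(1.0 / c for c in counts))
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only.
print(odds_ratio_ci(30, 70, 50, 50))
```

An odds ratio below 1 whose interval excludes 1 would indicate a feature depleted in pathogenic variants, which is the pattern the caption reports for TGA.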

**Fig. 3**
X-CAP outperforms competitors. (a) For each model, we plot the ROC curve and associated AUROC metric on the test set of $D_{original}$. X-CAP has the highest AUROC, improving upon the previous state-of-the-art by 0.14 absolute points. The orange and green dotted lines display X-CAP’s performance when trained only on variants present in the databases used by MutPred-LoF and ALoFT, respectively. To ensure a fair comparison, we randomly subsampled these datasets to the size used in the original papers (n indicates the size of the training set). (b) We enlarge the portion of the plot above the dashed line in panel a to show performance in the clinically relevant, high-sensitivity region (TPR ≥ 0.95). We also display the hsr-AUROC, which is the normalized area under the curve in the high-sensitivity region. We optimized X-CAP to excel in this region, rather than over the full ROC. At 95% sensitivity, X-CAP correctly classifies 80.0% of benign stopgain variants, over four times more than any other classifier.
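
One way to compute a normalized area in the high-sensitivity region is sketched below. The caption does not spell out the normalization, so this sketch assumes the area of the ROC curve lying above the TPR = 0.95 line is divided by the largest possible such area (0.05), giving a value between 0 and 1. The function names are hypothetical, and labels are assumed to be 1 for pathogenic and 0 for benign.

```python
def roc_points(labels, scores):
    """ROC points (fpr, tpr), sweeping the score threshold from high to low."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        j = i
        # Tied scores are admitted together, producing one diagonal segment.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            tp += ranked[j][1]
            fp += 1 - ranked[j][1]
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j
    return points


def hsr_auroc(labels, scores, min_tpr=0.95):
    """Area between the ROC curve and the line TPR = min_tpr, scaled to [0, 1].

    ASSUMPTION: normalization by (1 - min_tpr); the source only says "normalized".
    """
    points = roc_points(labels, scores)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y1 <= min_tpr or x1 == x0:
            continue  # segment lies below the region or has zero width
        if y0 < min_tpr:
            # Segment crosses the line: start integrating at the crossing point.
            x0 = x0 + (min_tpr - y0) * (x1 - x0) / (y1 - y0)
            y0 = min_tpr
        area += (x1 - x0) * ((y0 - min_tpr) + (y1 - min_tpr)) / 2.0
    return area / (1.0 - min_tpr)
```

Under this assumed definition a perfect classifier scores 1, while a classifier that assigns every variant the same score scores 0.025, so the metric is far less forgiving than the full AUROC.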

**Fig. 4**
X-CAP eliminates the most benign stopgain variants of uncertain significance (VUS) in control exomes. We plot the fraction of rare benign stopgain variants that were assigned scores below the 95%-sensitivity threshold for each classifier. These variants were taken from exomes from a control population (N = 480) in an Inflammatory Bowel Disease (IBD) study. The performance of all classifiers on exomes closely matches their performance on aggregated variant sets in Fig. 3b and Additional file 1: Fig. S3b. X-CAP increases the percentage of benign VUS eliminated by 4.4-fold.
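
The measurement described here can be sketched in two steps: choose the score cutoff that keeps 95% of known pathogenic variants, then count the benign variants that fall below it. This is an illustrative sketch only; it assumes higher scores mean "more likely pathogenic", and the function names and toy scores are hypothetical rather than taken from X-CAP.

```python
import math


def sensitivity_threshold(pathogenic_scores, sensitivity=0.95):
    """Highest cutoff at which at least `sensitivity` of pathogenic scores are >= cutoff."""
    if not pathogenic_scores:
        raise ValueError("need at least one pathogenic score")
    ranked = sorted(pathogenic_scores, reverse=True)
    needed = math.ceil(sensitivity * len(ranked))  # variants that must be retained
    return ranked[needed - 1]


def fraction_eliminated(benign_scores, threshold):
    """Fraction of benign variants scoring strictly below the threshold."""
    if not benign_scores:
        raise ValueError("need at least one benign score")
    return sum(score < threshold for score in benign_scores) / len(benign_scores)


# Toy scores, for illustration only.
pathogenic = [i / 20 for i in range(1, 21)]   # 0.05, 0.10, ..., 1.00
benign = [0.01, 0.05, 0.09, 0.20]
cutoff = sensitivity_threshold(pathogenic)
print(cutoff, fraction_eliminated(benign, cutoff))
```

In the setting the caption describes, the cutoff would come from labeled variants and then be applied to the rare stopgains found in the control exomes.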

References

1. Bamshad MJ, Ng SB, Bigham AW, Tabor HK, Emond MJ, Nickerson DA, Shendure J. Exome sequencing as a tool for Mendelian disease gene discovery. Nat Rev Genet. 2011;12(11):745–55. doi: 10.1038/nrg3031.
2. Landrum MJ, Lee JM, Benson M, Brown GR, Chao C, Chitipiralla S, Gu B, Hart J, Hoffman D, Jang W, et al. ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res. 2018;46(D1):1062–7. doi: 10.1093/nar/gkx1153.
3. Stenson PD, Mort M, Ball EV, Chapman M, Evans K, Azevedo L, Hayden M, Heywood S, Millar DS, Phillips AD, et al. The Human Gene Mutation Database (HGMD®): optimizing its use in a clinical diagnostic or research setting. Hum Genet. 2020;139(10):1197–207. doi: 10.1007/s00439-020-02199-3.
4. Won D-G, Kim D-W, Woo J, Lee K. 3Cnet: pathogenicity prediction of human variants using multitask learning with evolutionary constraints. Bioinformatics. 2021;37(24):4626–34. doi: 10.1093/bioinformatics/btab529.
5. Wenger AM, Guturu H, Bernstein JA, Bejerano G. Systematic reanalysis of clinical exome data yields additional diagnoses: implications for providers. Genet Med. 2017;19(2):209–14. doi: 10.1038/gim.2016.88.
