Nat Commun. 2020 Nov 20;11(1):5918.
doi: 10.1038/s41467-020-19669-x.

Inferring the molecular and phenotypic impact of amino acid variants with MutPred2

Vikas Pejaver et al. Nat Commun. 2020.

Abstract

Identifying pathogenic variants and underlying functional alterations is challenging. To this end, we introduce MutPred2, a tool that improves the prioritization of pathogenic amino acid substitutions over existing methods, generates molecular mechanisms potentially causative of disease, and returns interpretable pathogenicity score distributions on individual genomes. Whilst its prioritization performance is state-of-the-art, a distinguishing feature of MutPred2 is the probabilistic modeling of variant impact on specific aspects of protein structure and function that can serve to guide experimental studies of phenotype-altering variants. We demonstrate the utility of MutPred2 in the identification of the structural and functional mutational signatures relevant to Mendelian disorders and the prioritization of de novo mutations associated with complex neurodevelopmental disorders. We then experimentally validate the functional impact of several variants identified in patients with such disorders. We argue that mechanism-driven studies of human inherited disease have the potential to significantly accelerate the discovery of clinically actionable variants.

Conflict of interest statement

The authors declare the following competing interests. D.N.C. and M.M. acknowledge Qiagen Inc. for their financial support through a License Agreement with Cardiff University. The remaining authors declare no competing interests.

Figures

Fig. 1. MutPred2 and the molecular consequences of amino acid substitutions.
a The human tumor suppressor p53 as an illustration of the numerous possible effects of amino acid substitutions on protein structure and function. Protein Data Bank IDs for the structures shown are 1TUP, 1YCS, 2J1W, and 2YBG.
b The ontology constructed in this study to organize the possible structural and functional effects of amino acid substitutions. It is confined to the 53 properties included in MutPred2.
c The MutPred2 workflow. For a given amino acid sequence and substitution, MutPred2 first extracts six categories of features. Changes in structure and function due to the substitution are also modeled by running the original and mutated sequences through different sequence-based protein property predictors. Two scores are obtained for each property, one from the original and one from the mutated sequence, and these are combined to generate two additional scores quantifying the loss and gain of the property in question. All four scores are included as features. Next, all categories of features are presented to an ensemble of 30 neural networks trained to distinguish between pathogenic and benign variants. MutPred2 returns two outputs, the general score and the property score. The general score is obtained from the neural network ensemble and indicates the pathogenicity of the given variant. It ranges between 0 and 1, with a higher score indicating a greater propensity to be pathogenic. The property score is assigned to each of the 53 properties for the given variant and also ranges between 0 and 1. The property score is the posterior probability of loss or gain (whichever is greater) of the given property due to the substitution. The higher the property score, the more likely that the molecular mechanism of the disease involves the alteration of the property.
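The scoring described in panel c can be sketched in a few lines. This is a minimal illustration, not the MutPred2 implementation: the function names are hypothetical, the loss and gain formulas are an assumption (chosen to be consistent with the 0.5 × (1 – 0.5) = 0.25 threshold quoted in the Fig. 4 caption), and averaging the ensemble outputs is likewise an assumption, since the caption only says the general score is "obtained from" the ensemble.

```python
from statistics import mean


def loss_gain(p_original: float, p_mutated: float) -> tuple[float, float]:
    """Combine the two per-sequence property scores into loss and gain scores.

    Assumption: loss = P(original has the property) * P(mutant lacks it),
    and gain is the mirror image. The caption does not state the formula.
    """
    loss = p_original * (1.0 - p_mutated)
    gain = (1.0 - p_original) * p_mutated
    return loss, gain


def property_score(p_original: float, p_mutated: float) -> float:
    """Property score: loss or gain, whichever is greater (as in the caption)."""
    return max(loss_gain(p_original, p_mutated))


def general_score(ensemble_outputs: list[float]) -> float:
    """General score from the ensemble outputs.

    Assumption: a plain average of the per-network outputs.
    """
    return mean(ensemble_outputs)
```

Under these assumptions, a property predicted with probability 0.5 in both the original and the mutated sequence yields a property score of 0.25, which matches the high-scoring cutoff mentioned for protein–protein interactions in Fig. 4c.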
Fig. 2. Performance and interpretability of MutPred2.
a ROC curves obtained through tenfold cross-validation on the MutPred2 training set. The main model represents MutPred2 in the default setting (with real conservation scores and homolog count profiles). All lines are paired, with the solid line representing the model with homolog count profiles and the dashed line representing the model without the profiles.
b ROC curves on an independent test set, obtained from ClinVar and SwissVar by letting the data accumulate in these databases for 3 years. MutationTaster2 only returns a value of zero or one and therefore its performance is plotted as a single point (X). Since some tools could not assign scores to all variants, results from the subset of the variants (285 pathogenic and 107 benign) that are covered by all methods are shown. Detailed performance measures on this subset and a less stringent set (filtered at 80% sequence identity) are shown in Supplementary Tables 10 and 11.
c Mean score distributions for MutPred2, PolyPhen-2 HumDiv, PolyPhen-2 HumVar, SIFT, FATHMM, and CADD applied to ten randomly selected exomes from the 1000 Genomes Project. Error bars represent the standard errors of the means, estimated by dividing the standard deviation in each bin by the square root of 10. All heterozygous and homozygous variants were plotted in separate panels. The mean in each panel represents the average number of variants found in an individual for the given category.
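The error bars in panel c follow the usual standard-error formula, which can be sketched as below. The function names and input layout (one list of per-bin counts per exome) are hypothetical, and the use of the sample standard deviation is an assumption; the caption only specifies division by the square root of the number of exomes (10).

```python
import math
from statistics import mean, stdev


def standard_error(values: list[float]) -> float:
    """Standard error of the mean: standard deviation / sqrt(n).

    Assumption: the sample (n - 1) standard deviation is used.
    """
    return stdev(values) / math.sqrt(len(values))


def per_bin_mean_and_sem(histograms: list[list[float]]) -> list[tuple[float, float]]:
    """Given one histogram (list of bin counts) per exome, return (mean, SEM) per bin."""
    per_bin = zip(*histograms)  # regroup counts so each item holds one bin across exomes
    return [(mean(counts), standard_error(list(counts))) for counts in per_bin]
```

With ten exomes, `len(values)` is 10, so each bin's standard deviation is divided by the square root of 10, as stated in the caption.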
Fig. 3. Significantly enriched and depleted pathogenic mechanisms predicted by MutPred2.
The data set consisted of 53,180 inherited disease mutations and an unlabeled set of 205,303 variants. Losses and gains are plotted together by considering the maximal effect for a given mutation position. An asterisk indicates significance at the 0.05 level with Benjamini–Hochberg correction, as computed by a one-sided Fisher’s exact test.
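The statistical procedure named in this caption, a one-sided Fisher's exact test followed by Benjamini–Hochberg correction, can be sketched in pure Python as follows. This is an illustrative sketch only: the function names are hypothetical, the 2×2 table layout is an assumption, only the enrichment direction is shown, and the exact-integer arithmetic is practical for small tables rather than for counts the size of the study's data sets, where a statistics library would be the more sensible choice.

```python
from math import comb


def fisher_enrichment_p(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for the table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null, i.e. the probability of
    seeing at least `a` hits in the first row given fixed margins.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    favorable = sum(
        comb(row1, k) * comb(total - row1, col1 - k) for k in range(a, upper + 1)
    )
    return favorable / comb(total, col1)


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A property would then be marked with an asterisk when its adjusted p-value falls below 0.05.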
Fig. 4. Summary of MutPred2 predictions on de novo missense mutations from four neurodevelopmental disorders.
After removing genes with mutations shared by both case and control sets, subsequent analyses were based on 2986 and 844 mutations from the case and control sets, respectively.
a Proportions of case and control mutations predicted to be pathogenic by MutPred2 at thresholds corresponding to false-positive rates (FPR) of 10% and 5%, respectively. P values were computed using a two-tailed Fisher’s exact test. Odds ratios and P values for other thresholds are shown in Supplementary Fig. 7.
b Enrichment of structural and functional signatures of case mutations versus the control group. Only those mutations considered to be pathogenic at the 5% FPR threshold were included in this analysis. Properties are grouped based upon their broader classes as described in the ontology (Fig. 1b). Statistical significance was assigned at α = 0.05 using a one-sided binomial test with Benjamini–Hochberg correction and represented by asterisks.
c Representative images of 3AT selection plates with the interaction profiles of STXBP1 against TRIM38, STX11, and STX5. Pathogenicity scores and probability of alteration of protein–protein interactions (PPIs) corresponding to each mutation are shown. The PPI alteration probability of >0.5 × (1 – 0.5) = 0.25 is considered to be high scoring.
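One way to read "thresholds corresponding to false-positive rates of 10% and 5%" is sketched below: choose the score cutoff above which at most the target fraction of benign variants falls, then report the fraction of case or control mutations exceeding it. The function names are hypothetical, and deriving the cutoff from a list of benign scores is an assumption; the caption does not say which benign set the thresholds were calibrated on.

```python
import math


def threshold_at_fpr(benign_scores: list[float], fpr: float) -> float:
    """Smallest observed cutoff such that at most `fpr` of benign scores exceed it."""
    ranked = sorted(benign_scores, reverse=True)
    # Number of benign variants allowed above the cutoff (small epsilon guards
    # against floating-point error in fpr * n).
    allowed = math.floor(fpr * len(ranked) + 1e-9)
    allowed = min(allowed, len(ranked) - 1)
    return ranked[allowed]


def fraction_called(scores: list[float], threshold: float) -> float:
    """Fraction of variants whose score is strictly above the threshold."""
    return sum(s > threshold for s in scores) / len(scores)
```

Applying `fraction_called` to the case and control score lists at the same threshold gives the two proportions compared in panel a.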
