Performance of computational tools in evaluating the functional impact of laboratory-induced amino acid mutations

Vanessa E Gray¹, Kimberly R Kukurba, Sudhir Kumar

PMID: 22685075
PMCID: PMC3413386
DOI: 10.1093/bioinformatics/bts336

Bioinformatics. 2012 Aug 15;28(16):2093-6. doi: 10.1093/bioinformatics/bts336. Epub 2012 Jun 8.

Affiliation

¹ Center for Evolutionary Medicine and Informatics, Biodesign Institute, Arizona State University, Tempe, AZ 85287, USA.

Abstract

Site-directed mutagenesis is frequently used by scientists to investigate the functional impact of amino acid mutations in the laboratory. Over 10,000 such laboratory-induced mutations have been reported in the UniProt database along with the outcomes of functional assays. Here, we explore the performance of state-of-the-art computational tools (Condel, PolyPhen-2 and SIFT) in correctly annotating the function-altering potential of 10,913 laboratory-induced mutations from 2372 proteins. We find that computational tools are very successful in diagnosing laboratory-induced mutations that elicit significant functional change in the laboratory (up to 92% accuracy). However, these tools consistently fail to correctly annotate laboratory-induced mutations that show no functional impact in laboratory assays. Therefore, the overall accuracy of computational tools for laboratory-induced mutations is much lower than that observed for naturally occurring human variants. We tested and rejected the possibilities that the preponderance of changes to alanine and the presence of multiple base-pair mutations in the laboratory were the reasons for the observed discordance between the performance of computational tools for natural and laboratory mutations. Instead, we discover that laboratory-induced mutations occur predominantly at highly conserved positions in proteins, where the computational tools have the lowest accuracy for variants that do not impact function (neutral variants). Therefore, comparisons of experimental-profiling results with those from computational predictions need to be sensitive to the evolutionary conservation of the positions harboring the amino acid change.
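
The contrast drawn above between accuracy on function-altering mutations and accuracy on neutral mutations can be made concrete by scoring the two classes separately. The following is a minimal sketch, not the authors' pipeline: the function names, the label strings `damaging` and `neutral`, and the toy records are all hypothetical and are not data from the study.

```python
from collections import Counter

def per_class_accuracy(records):
    """Fraction of correct predictions within each laboratory outcome class.

    records: list of (lab_outcome, predicted) pairs; both values are the
    hypothetical labels 'damaging' or 'neutral'.
    """
    correct = Counter()
    total = Counter()
    for lab, pred in records:
        total[lab] += 1
        if lab == pred:
            correct[lab] += 1
    return {label: correct[label] / total[label] for label in total}

def overall_accuracy(records):
    """Fraction of all records where the prediction matches the lab outcome."""
    return sum(lab == pred for lab, pred in records) / len(records)

# Made-up toy data (NOT from the paper): a tool that does well on
# lab-damaging mutations but over-calls lab-neutral ones as damaging.
toy_records = (
    [("damaging", "damaging")] * 9
    + [("damaging", "neutral")] * 1
    + [("neutral", "damaging")] * 6
    + [("neutral", "neutral")] * 4
)

by_class = per_class_accuracy(toy_records)
overall = overall_accuracy(toy_records)
print(by_class, overall)
```

Reporting the per-class values alongside the overall figure keeps a strong result on damaging mutations from masking a weak result on neutral ones, which is the pattern the abstract describes.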

Figures

**Fig. 1**
A frequency distribution showing the number of different mutants explored in the laboratory for 2372 human proteins, as reported in the UniProt database. The median (mean) number of mutations analyzed is 3 (4.6).

**Fig. 2**
The distribution of long-term evolutionary rates (r) of positions containing the 10,913 laboratory mutations analyzed in this study.

**Fig. 3**
Accuracy of computational tools in predicting the functional impact of laboratory-induced mutations that alter the protein function. (A) Proportion of mutations correctly diagnosed to be non-neutral by Condel, PolyPhen-2 and SIFT. The results are shown for all lab-damaging mutations (filled bars) and only those damaging mutations that abolish the protein function (open bars). (B) The cumulative frequency distribution of lab-damaging mutations that abolish (open squares) and do not abolish function (gray squares) at various levels of deleteriousness as measured by their Condel scores.

**Fig. 4**
The accuracies of computational tools and evolutionary properties of mutations. (A) Fraction of lab-neutral mutations predicted correctly by Condel, PolyPhen-2 and SIFT. (B) Observed-to-expected ratios of the lab-neutral mutations (closed circles, solid line) and human population variants (open circles, dashed line) in three evolutionary conservation categories. (C) Histogram of evolutionary rates of lab-neutral mutations that were diagnosed correctly (black bars) and incorrectly (white bars) by PolyPhen-2; similar results are observed for Condel and SIFT. For panel B, 456,426 protein variants were obtained from the 1000 Genomes Project (Consortium, 2010). The relative proportions of positions in each category were estimated by considering the evolutionary rates of all amino acid positions found in proteins containing at least one lab-induced mutation or population variant, as appropriate. These relative proportions were then used to generate the expected numbers of lab-neutral mutations in each category.
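
The observed-to-expected calculation described for panel B of Fig. 4 can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the function names, the rate cutoffs separating the three conservation categories, and the toy rates are hypothetical, since the caption does not give the actual thresholds.

```python
from bisect import bisect_right

def categorize(rate, cutoffs):
    """Index of the conservation category for an evolutionary rate.

    cutoffs must be sorted; a rate below cutoffs[0] falls in category 0.
    """
    return bisect_right(cutoffs, rate)

def observed_to_expected(mutation_rates, all_position_rates, cutoffs):
    """Observed/expected mutation counts per conservation category.

    Expected counts assume mutations are spread across categories in
    proportion to how many background positions each category contains.
    """
    n_categories = len(cutoffs) + 1
    observed = [0] * n_categories
    background = [0] * n_categories
    for rate in mutation_rates:
        observed[categorize(rate, cutoffs)] += 1
    for rate in all_position_rates:
        background[categorize(rate, cutoffs)] += 1

    n_mutations = len(mutation_rates)
    n_positions = len(all_position_rates)
    ratios = []
    for obs, bg in zip(observed, background):
        expected = n_mutations * bg / n_positions
        ratios.append(obs / expected if expected > 0 else float("nan"))
    return ratios

# Hypothetical cutoffs and made-up rates (NOT values from the paper).
cutoffs = [0.5, 1.5]  # slow (conserved) | intermediate | fast
all_position_rates = [0.1, 0.2, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0]
mutation_rates = [0.1, 0.2, 0.3, 1.0]

ratios = observed_to_expected(mutation_rates, all_position_rates, cutoffs)
print(ratios)
```

A ratio above 1 in the slowest-evolving category would indicate that mutations are over-represented at conserved positions relative to the background composition of the proteins.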

References

1. Adzhubei I., et al. A method and server for predicting damaging missense mutations. Nat. Methods. 2010;7:248–249.
2. Bromberg Y., Rost B. Comprehensive in silico mutagenesis highlights functionally important residues in proteins. Bioinformatics. 2008;24:i207–i212.
3. 1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature. 2010;467:1061–1073.
4. Di Y.M., et al. Prediction of deleterious non-synonymous single-nucleotide polymorphisms of human uridine diphosphate glucuronosyltransferase genes. AAPS J. 2009;11:469–480.
5. González-Pérez A., López-Bigas N. Improving the assessment of the outcome of nonsynonymous SNVs with a consensus deleteriousness score, Condel. Am. J. Hum. Genet. 2011;88:440–449.
