Hum Mutat. 2011 Jun;32(6):661-8.
doi: 10.1002/humu.21490. Epub 2011 Apr 7.

Prediction of missense mutation functionality depends on both the algorithm and sequence alignment employed

Stephanie Hicks et al. Hum Mutat. 2011 Jun.

Abstract

Multiple algorithms are used to predict the impact of missense mutations on protein structure and function using algorithm-generated sequence alignments or manually curated alignments. We compared the accuracy of SIFT, Align-GVGD, PolyPhen-2, and Xvar, each run with its native alignment, in predicting the functionality of well-characterized missense mutations (n = 267) within the BRCA1, MSH2, MLH1, and TP53 genes. We also evaluated the impact of the alignment employed on predictions from these algorithms (except Xvar) by supplying each with the same four alignments: those automatically generated by (1) SIFT, (2) PolyPhen-2, and (3) Uniprot, and (4) a manually curated alignment tuned for Align-GVGD. The alignments differ in sequence composition and evolutionary depth. Data-based receiver operating characteristic curves employing the native alignment for each algorithm result in an area under the curve of 78-79% for all four algorithms. Predictions from the PolyPhen-2 algorithm were least dependent on the alignment employed. In contrast, Align-GVGD predicts all variants neutral when provided alignments with a large number of sequences. Of note, algorithms make different predictions for the same variants even when provided the same alignment, and they do not necessarily perform best using their own alignment. Thus, researchers should consider optimizing both the algorithm and the sequence alignment employed in missense prediction.
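
As a rough illustration of the area-under-the-curve figure quoted in the abstract, the sketch below computes AUC as the probability that a deleterious variant receives a higher score than a neutral one. It is not the authors' procedure: the function name `roc_auc`, the scores, and the labels are hypothetical, and the scores are assumed to be oriented so that higher means more likely deleterious (a score that runs the other way would need to be negated first).

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: one number per variant, higher = more likely deleterious (assumed).
    labels: 1 for a known deleterious variant, 0 for a known neutral one.
    Equals the fraction of (deleterious, neutral) pairs ranked correctly,
    with ties counted as half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one variant of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores for six variants (three deleterious, three neutral).
example_scores = [0.9, 0.8, 0.4, 0.7, 0.2, 0.1]
example_labels = [1, 1, 1, 0, 0, 0]
example_auc = roc_auc(example_scores, example_labels)
```

The pairwise form is quadratic in the number of variants, which is acceptable for a set of a few hundred variants such as the one described here.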

Figures

Figure 1
A) Predictions of neutral (n = 16) BRCA1 missense mutations using three algorithms with four alignments each. The four alignments are represented by SIFT (SIFT), Align-GVGD (A-GVGD), PolyPhen-2 (PPH2), and Uniprot 50% (Uniprot 50%). The prediction categories for PolyPhen-2 Possibly Damaging and Probably Damaging have been abbreviated to ‘Possibly D’ and ‘Probably D’. The algorithm Xvar employs its own alignment. B) Predictions of deleterious (n = 17) BRCA1 missense mutations using three algorithms with four alignments each.
Figure 2
Boxplots of specificity (spec) and sensitivity (sens) for each algorithm as given in Table 1. Sensitivity values are reported using all four genes BRCA1, MSH2, MLH1 and TP53, but TP53 is excluded in specificity values to account for potential bias given that there are only 4 neutral variants. The three algorithms are represented by SIFT (SIFT), Align-GVGD (A-GVGD) and PolyPhen-2 (PPH2).
Figure 3
A) Receiver operating characteristic (ROC) curves using probabilities and scores associated with each prediction for each of the three algorithms SIFT, Align-GVGD, and PolyPhen-2. For each algorithm, four colored lines (black, red, green, blue) are drawn representing the four alignments used in each algorithm. The area under the curve (AUC) is reported in the legend. B) ROC curve using the four genes BRCA1, MSH2, MLH1, and TP53 from the Xvar algorithm. The pink line represents the Xvar alignment. C) ROC curves comparing the performance of the four algorithms using their own native alignments.
Figure 4
Predictions of neutral and deleterious mutations with the SIFT, Align-GVGD, and PolyPhen-2 algorithms using the Align-GVGD alignment and the Xvar algorithm using its own alignment. We also depict the exclusive overlap of the predictions among the four algorithms to show their agreement (dark yellow) and among the three algorithms SIFT, Align-GVGD, and PolyPhen-2 (light yellow).
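
The sensitivity and specificity summarized in Figure 2 and the agreement among algorithms depicted in Figure 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis: the function names, the algorithm labels `algo_a` and `algo_b`, and all calls are hypothetical, and every prediction is assumed to have been reduced to either "deleterious" or "neutral".

```python
def sensitivity_specificity(predictions, truth, positive="deleterious"):
    """Return (sensitivity, specificity) for one algorithm's calls.

    Sensitivity: fraction of truly deleterious variants called deleterious.
    Specificity: fraction of truly neutral variants called neutral.
    """
    tp = fn = tn = fp = 0
    for pred, real in zip(predictions, truth):
        if real == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


def unanimous(calls_by_algorithm):
    """Indices of variants on which every algorithm makes the same call."""
    columns = zip(*calls_by_algorithm.values())
    return [i for i, calls in enumerate(columns) if len(set(calls)) == 1]


# Hypothetical classifications for four variants.
truth = ["deleterious", "deleterious", "neutral", "neutral"]
calls = {
    "algo_a": ["deleterious", "neutral", "neutral", "neutral"],
    "algo_b": ["deleterious", "deleterious", "deleterious", "neutral"],
}
per_algorithm = {name: sensitivity_specificity(c, truth) for name, c in calls.items()}
agreed = unanimous(calls)
```

In this toy data the two algorithms trade sensitivity against specificity and agree on only two of the four variants, which mirrors the kind of disagreement the figure is meant to expose.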
