Mol Genet Genomic Med. 2015 Mar;3(2):99-110.
doi: 10.1002/mgg3.116. Epub 2014 Dec 3.

Variability in pathogenicity prediction programs: impact on clinical diagnostics

Lauren C Walters-Sen et al. Mol Genet Genomic Med. 2015 Mar.

Abstract

Current practice by clinical diagnostic laboratories is to utilize online prediction programs to help determine the significance of novel variants in a given gene sequence. However, these programs vary widely in their methods and ability to correctly predict the pathogenicity of a given sequence change. The performance of 17 publicly available pathogenicity prediction programs was assayed using a dataset consisting of 122 credibly pathogenic and benign variants in genes associated with the RASopathy family of disorders and limb-girdle muscular dystrophy. Performance metrics were compared between the programs to determine the most accurate program for loss-of-function and gain-of-function mechanisms. No one program correctly predicted the pathogenicity of all variants analyzed. A major hindrance to the analysis was the lack of output from a significant portion of the programs. The best performer was MutPred, which had a weighted accuracy of 82.6% in the full dataset. Surprisingly, combining the results of the top three programs did not increase the ability to predict pathogenicity over the top performer alone. As the increasing number of sequence changes in larger datasets will require interpretation, the current study demonstrates that extreme caution must be taken when reporting pathogenicity based on statistical online protein prediction programs in the absence of functional studies.

Keywords: Diagnostics; pathogenicity; prediction; sequencing; variants.

Figures

Figure 1
Scheme for the selection of variants used in this study. Functional studies are the gold standard by which to establish the disease association (pathogenic) or normal variation (benign) status of any sequence variant. Unfortunately, functional studies have only been performed for a small portion of identified variants. To ensure the greatest likelihood that we were using pathogenic and benign variants to examine this series of prediction programs, we used these selection criteria to establish our dataset. Thirty-six additional benign variants were extracted from Ensembl as few had been identified by clinical sequencing analysis. For detailed descriptions of criteria, please see the Materials and Methods section.
Figure 2
Percentage of correct predictions. The ability of the prediction programs to correctly assign either pathogenic (black) or benign (white) status to variants in the RASopathy dataset (A) and the LGMD dataset (B) is shown. The program used and the number of variants with prediction outputs (pathogenic, benign) are listed below the graph. Percentages were generated by dividing the number of variants predicted correctly by the number of variants with prediction outputs for each class (pathogenic or benign). The RASopathy dataset contained 35 credibly pathogenic variants and 19 credibly benign variants. The LGMD dataset contained 36 credibly pathogenic variants and 32 credibly benign variants.
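The per-class percentage described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function name and the example counts are hypothetical. Only the division rule (correct predictions over variants with prediction outputs, computed separately for each class) comes from the caption.

```python
def percent_correct(n_correct, n_with_output):
    """Percentage of correct predictions among variants that produced an output.

    Variants for which a program returned no prediction are excluded from the
    denominator, as described in the caption.
    """
    if n_with_output == 0:
        return None  # the program produced no usable output for this class
    return 100.0 * n_correct / n_with_output


# Hypothetical counts for one program (not taken from the paper).
pathogenic_pct = percent_correct(n_correct=28, n_with_output=35)
benign_pct = percent_correct(n_correct=12, n_with_output=16)
no_output_pct = percent_correct(n_correct=0, n_with_output=0)
```

Because the denominator counts only variants with an output, a program that returns few predictions can still show a high percentage, so the percentages are best read alongside the output counts listed below the graph.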
Figure 3
Accuracy of prediction programs in the RASopathy and LGMD datasets. Both accuracy (white) and weighted accuracy (black) are shown for the prediction programs analyzed. The number of variants with usable prediction calls is listed for each individual program. (A) Gain-of-function RASopathy variants (n = 54); (B) Loss-of-function LGMD variants (n = 68).
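This caption does not say how weighted accuracy is computed. The sketch below assumes one common reading, balanced accuracy (the mean of the per-class correct rates), purely to illustrate why a class-weighted figure can differ from plain accuracy when the pathogenic and benign classes differ in size. The formula, function names, and counts are assumptions for illustration and may not match the definition used in the paper.

```python
def accuracy(tp, tn, fp, fn):
    """Plain accuracy: all correct calls over all calls."""
    return (tp + tn) / (tp + tn + fp + fn)


def balanced_accuracy(tp, tn, fp, fn):
    """ASSUMED stand-in for 'weighted accuracy': mean of the two class rates."""
    sensitivity = tp / (tp + fn)  # fraction of pathogenic variants called pathogenic
    specificity = tn / (tn + fp)  # fraction of benign variants called benign
    return (sensitivity + specificity) / 2


# Hypothetical calls on a 35 pathogenic / 19 benign split
# (class sizes match the RASopathy dataset; the calls themselves are invented).
counts = dict(tp=30, fn=5, tn=10, fp=9)
plain = accuracy(**counts)
weighted = balanced_accuracy(**counts)
```

With these invented counts, plain accuracy is pulled upward by the larger pathogenic class, while the class-averaged figure gives the benign calls equal weight.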
Figure 4
Performance of the combined program algorithm. The number of variants in each category (bars) and the percentage of correct predictions (line) are shown for each dataset when using the combined MutPred/Condel/FATHMM-Weighted method. RASopathy, n = 53; LGMD, n = 68; Combined, n = 121.
