PLoS Comput Biol. 2016 May 25;12(5):e1004962.

doi: 10.1371/journal.pcbi.1004962. eCollection 2016 May.

PredictSNP2: A Unified Platform for Accurately Evaluating SNP Effects by Exploiting the Different Characteristics of Variants in Distinct Genomic Regions

Jaroslav Bendl¹,²,³, Miloš Musil¹,², Jan Štourač¹,³, Jaroslav Zendulka², Jiří Damborský¹,³, Jan Brezovský¹,³

Affiliations

¹ Loschmidt Laboratories, Department of Experimental Biology and Research Centre for Toxic Compounds in the Environment RECETOX, Masaryk University, Brno, Czech Republic.
² Department of Information Systems, Faculty of Information Technology, Brno University of Technology, Brno, Czech Republic.
³ International Clinical Research Center, St. Anne's University Hospital Brno, Brno, Czech Republic.

PMID: 27224906
PMCID: PMC4880439
DOI: 10.1371/journal.pcbi.1004962

Jaroslav Bendl et al. PLoS Comput Biol. 2016.

Abstract

An important message taken from human genome sequencing projects is that the human population exhibits approximately 99.9% genetic similarity. Variations in the remaining parts of the genome determine our identity, trace our history and reveal our heritage. The precise delineation of phenotypically causal variants plays a key role in providing accurate personalized diagnosis, prognosis, and treatment of inherited diseases. Several computational methods for achieving such delineation have been reported recently. However, their ability to pinpoint potentially deleterious variants is limited by the fact that their mechanisms of prediction do not account for the existence of different categories of variants. Consequently, their output is biased towards the variant categories that are most strongly represented in the variant databases. Moreover, most such methods provide numeric scores but not binary predictions of the deleteriousness of variants or confidence scores that would be more easily understood by users. We have constructed three datasets covering different types of disease-related variants, which were divided across five categories: (i) regulatory, (ii) splicing, (iii) missense, (iv) synonymous, and (v) nonsense variants. These datasets were used to develop category-optimal decision thresholds and to evaluate six tools for variant prioritization: CADD, DANN, FATHMM, FitCons, FunSeq2 and GWAVA. This evaluation revealed some important advantages of the category-based approach. The results obtained with the five best-performing tools were then combined into a consensus score. Additional comparative analyses showed that in the case of missense variations, protein-based predictors perform better than DNA sequence-based predictors. A user-friendly web interface was developed that provides easy access to the five tools' predictions, and their consensus scores, in a user-understandable format tailored to the specific features of different categories of variations. 
To enable comprehensive evaluation of variants, the predictions are complemented with annotations from eight databases. The web server is freely available to the community at http://loschmidt.chemi.muni.cz/predictsnp2.
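The idea of category-optimal decision thresholds can be illustrated with a minimal sketch. It assumes that "normalized accuracy" means the mean of sensitivity and specificity, that a higher score indicates a more deleterious variant, and that the threshold is chosen by exhaustive search over the observed scores. The function names and the toy data are hypothetical; the abstract does not describe the authors' actual optimization procedure.

```python
from collections import defaultdict


def normalized_accuracy(labels, predictions):
    """Mean of sensitivity and specificity (assumed definition)."""
    tp = sum(1 for y, p in zip(labels, predictions) if y and p)
    tn = sum(1 for y, p in zip(labels, predictions) if not y and not p)
    pos = sum(1 for y in labels if y)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both deleterious and neutral variants")
    return 0.5 * (tp / pos + tn / neg)


def best_threshold(scores, labels):
    """Return (cutoff, accuracy); a score >= cutoff is called deleterious."""
    best = None
    for cutoff in sorted(set(scores)):
        acc = normalized_accuracy(labels, [s >= cutoff for s in scores])
        if best is None or acc > best[1]:
            best = (cutoff, acc)
    return best


def category_thresholds(records):
    """records: iterable of (category, score, is_deleterious) tuples."""
    grouped = defaultdict(lambda: ([], []))
    for category, score, label in records:
        grouped[category][0].append(score)
        grouped[category][1].append(label)
    return {c: best_threshold(s, l)[0] for c, (s, l) in grouped.items()}


# Toy training data (invented): the two categories have shifted score scales.
records = [
    ("missense", 25.0, True), ("missense", 22.0, True),
    ("missense", 18.0, False), ("missense", 12.0, False),
    ("synonymous", 9.0, True), ("synonymous", 7.0, True),
    ("synonymous", 4.0, False), ("synonymous", 2.0, False),
]

thresholds = category_thresholds(records)

# A single general threshold fitted on the pooled data, for comparison.
general = best_threshold([r[1] for r in records], [r[2] for r in records])
```

On this toy data each category is separated perfectly by its own cut-off, whereas no single pooled cut-off can separate both categories at once, which is the kind of bias the category-based approach is meant to avoid.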

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

**Fig 1. Categorization of variants based on their location within the genome and their type.**

**Fig 2. Workflow diagram describing the construction of the dataset of variants related to Mendelian diseases.**
The dataset was prepared by combining deleterious variants from the ClinVar database with neutral variants from the VariSNP database. The resulting dataset was then divided into independent training and testing subsets for each individual category of variants.

**Fig 3. The use of category-optimal thresholds improves the predictive performance of individual tools by increasing their ability to capture differences in the distribution of prediction scores for the different categories of variants.**
(A) Distribution of scores for deleterious and neutral variants provided by each evaluated tool for individual categories of variants from the training subsets of the Mendelian diseases dataset. The locations of the general and category-optimal thresholds used to obtain predictions are shown for each tool. (B) Normalized accuracies achieved by individual tools when using category-optimal (blue bars) and general (red bars) thresholds, evaluated using testing subsets of the Mendelian diseases dataset.

**Fig 4. Performance of nucleotide-based and protein-based prediction tools and their consensuses, evaluated using the dataset of variants associated with Mendelian diseases.**
(A) Observed normalized accuracy and (B) area under the receiver operating characteristic curve (AUC) values are shown as blue and red bars for nucleotide- and protein-based tools and their consensuses, respectively. The horizontal dashed lines represent average performance values for each tool type.
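For readers unfamiliar with the AUC metric reported in panel B, the following sketch computes it directly from its rank-based definition: the probability that a randomly chosen deleterious variant scores higher than a randomly chosen neutral one, with ties counted as one half. The function name and the example scores are hypothetical and are not taken from the paper.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need both deleterious and neutral variants")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # deleterious variant ranked above neutral one
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Invented scores: three deleterious variants followed by three neutral ones.
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
labels = [True, True, True, False, False, False]
auc = roc_auc(scores, labels)
```

Unlike normalized accuracy, this value does not depend on any decision threshold, which is why the two metrics are reported side by side.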

**Fig 5. Workflow diagram of the PredictSNP2 webserver.**
Upon submission of input variants, evaluation is performed with the integrated prediction tools. The raw scores produced by individual tools are transformed into overall decisions about deleteriousness and interpretable confidence scores according to the category of variants detected by ANNOVAR. In addition, links to relevant databases and on-line tools are provided to allow the user to better understand the genomic context and potential function of the corresponding genome region. Optionally, evaluation of missense mutations by PredictSNP1 can be requested.
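The score-to-decision step of this workflow can be sketched as follows. This is only an illustration under stated assumptions: the tool names `tool_a`, `tool_b` and `tool_c`, every threshold value, and the plain majority vote used as the consensus are placeholders, and the confidence scores mentioned in the caption are not modelled. The variant category is assumed to have been assigned already by an annotation step.

```python
# Hypothetical per-tool, per-category cut-offs (score >= cut-off => deleterious).
THRESHOLDS = {
    ("tool_a", "missense"): 0.60,
    ("tool_b", "missense"): 0.50,
    ("tool_c", "missense"): 0.80,
    ("tool_a", "regulatory"): 0.90,
    ("tool_b", "regulatory"): 0.50,
    ("tool_c", "regulatory"): 0.95,
}


def classify(category, raw_scores):
    """Turn raw tool scores into per-tool calls and a consensus label.

    raw_scores maps a tool name to its raw score for one variant.
    Tools without a threshold for this category are skipped.
    """
    votes = {}
    for tool, score in raw_scores.items():
        cutoff = THRESHOLDS.get((tool, category))
        if cutoff is None:
            continue
        votes[tool] = score >= cutoff
    if not votes:
        raise ValueError(f"no thresholds defined for category {category!r}")
    # Placeholder consensus: strict majority of the tools that voted.
    deleterious = sum(votes.values()) * 2 > len(votes)
    return votes, "deleterious" if deleterious else "neutral"


raw = {"tool_a": 0.70, "tool_b": 0.20, "tool_c": 0.90}
missense_votes, missense_call = classify("missense", raw)
regulatory_votes, regulatory_call = classify("regulatory", raw)
```

The same raw scores lead to different calls depending on the category, which is the behaviour the category-specific transformation is intended to provide.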

**Fig 6. The graphical user interface of the PredictSNP2 webserver.**
(A) On the input page, variants to be analyzed can be provided in several established formats using one of two reference genome assemblies. (B) On the output page, the predictions of individual tools and their PredictSNP2 consensus score are reported together with links to the eight relevant databases.
