BMC Genomics. 2015;16 Suppl 8(Suppl 8):S1.

doi: 10.1186/1471-2164-16-S8-S1. Epub 2015 Jun 18.

Better prediction of functional effects for sequence variants

Maximilian Hecht, Yana Bromberg, Burkhard Rost

PMID: 26110438
PMCID: PMC4480835
DOI: 10.1186/1471-2164-16-S8-S1

Maximilian Hecht et al. BMC Genomics. 2015.

Abstract

Elucidating the effects of naturally occurring genetic variation is one of the major challenges for personalized health and personalized medicine. Here, we introduce SNAP2, a novel neural-network-based classifier that improves over the state of the art in distinguishing between effect and neutral variants. Our method's improved performance results from screening many potentially relevant protein features and from refining our development data sets. Cross-validated on >100k experimentally annotated variants, SNAP2 significantly outperformed other methods, attaining a two-state accuracy (effect/neutral) of 83%. SNAP2 also outperformed combinations of other methods. Performance increased for human variants but much more so for other organisms. Our method's carefully calibrated reliability index informs selection of variants for experimental follow-up, with the most strongly predicted half of all effect variants predicted at over 96% accuracy. As expected, the evolutionary information from automatically generated multiple sequence alignments gave the strongest signal for the prediction. However, we also optimized our new method to perform surprisingly well even without alignments. This feature reduces prediction runtime by over two orders of magnitude, enables cross-genome comparisons, and makes our new method the best solution for the 10-20% of sequence orphans. SNAP2 is available at: https://rostlab.org/services/snap2web.

Figures

**Figure 3**
**SNAP2 and PolyPhen-2 are best for difficult human variants**. Bars mark the two-state accuracy (Q2; Eqn. 4) at the default thresholds for SNAP2 (dark blue), SNAP (light blue), SIFT (green), and PolyPhen-2 (orange). Random prediction performance assuming a 60:40 effect:neutral background is given in pink. The analysis is based on 3,963 'difficult' cases (2,589 effect; 1,374 neutral) from the *PMD_HUMAN* set. Difficult cases were defined as variants on which the above methods' predictions disagreed, *i.e.* cases where not all methods, excluding random, gave the same prediction.
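
The selection of 'difficult' cases and the Q2 comparison described in this caption can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names, the method labels, and the toy predictions are all hypothetical, and the random baseline assumes a predictor that calls "effect" with probability 0.6 independently of the true label, which is only one possible reading of "60:40 background".

```python
from typing import Dict, List, Sequence


def two_state_accuracy(predicted: Sequence[bool], observed: Sequence[bool]) -> float:
    """Q2: fraction of variants whose effect/neutral call matches the annotation."""
    if len(predicted) != len(observed) or len(observed) == 0:
        raise ValueError("predicted and observed must be non-empty and equally long")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(observed)


def difficult_indices(predictions: Dict[str, Sequence[bool]]) -> List[int]:
    """Indices of variants for which the methods do not all give the same call."""
    calls = list(predictions.values())
    n_variants = len(calls[0])
    return [i for i in range(n_variants) if len({c[i] for c in calls}) > 1]


def expected_random_q2(n_effect: int, n_neutral: int, p_effect: float = 0.6) -> float:
    """Expected Q2 of a random predictor (assumption: calls are independent of truth)."""
    total = n_effect + n_neutral
    return (n_effect * p_effect + n_neutral * (1.0 - p_effect)) / total


# Toy data (hypothetical): True = effect, False = neutral.
observed = [True, True, False, False, True]
predictions = {
    "method_a": [True, True, False, True, True],
    "method_b": [True, False, False, True, True],
    "method_c": [True, True, False, True, False],
}

hard = difficult_indices(predictions)  # variants where at least one method disagrees
for name, calls in predictions.items():
    q2 = two_state_accuracy([calls[i] for i in hard], [observed[i] for i in hard])
    print(f"{name}: Q2 on difficult cases = {q2:.2f}")

print(f"random baseline: {expected_random_q2(2589, 1374):.3f}")
```

Under the stated independence assumption, the class counts from the caption (2,589 effect; 1,374 neutral) give an expected random Q2 of about 0.53; this is a back-of-the-envelope value, not a figure reported in the source.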

**Figure 4**
**SNAP2 threshold and reliability**. The reliability index provides a means of focusing on the most accurate predictions. Panel **(a)** shows SNAP2 performance on the balanced PMD/EC data set over the entire spectrum of accuracy (solid lines) and coverage (dotted lines) for both effect (red) and neutral (green) variants depending on the chosen threshold (x-axis). The default threshold was set to -0.05, where neutral and effect predictions performed alike (black arrow). By moving the decision threshold, users can optimize predictive behavior towards their research needs: predictions at higher absolute scores (*e.g.* TP>0.5 or TN<-0.5) are much more likely to be correct, but they are not available for all variants. Panel **(b)** directly relates the reliability index (RI) to the performance on our data. Shown is the cumulative percentage of predictions (x-axis) against accuracy (solid lines) and coverage (dotted lines) above a given reliability index (RI; Methods). Accuracy and coverage are shown separately for neutral (green) and effect (red) predictions. Each marker depicts a reliability threshold ranging from 0 (rightmost marker, low reliability) to 9 (leftmost marker, high reliability). Labels for RI >= 2, 4, and 6 are skipped for simplicity. For instance, 58% of all predictions in our cross-validation were made at reliability levels of 7 or higher (gray arrows). At this reliability, 95% of all effect predictions and 90% of all neutral predictions were correct.
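
The trade-off described in this caption can be illustrated with a small sketch. Several things here are assumptions rather than facts from the source: raw scores are taken to lie in [-1, 1] with scores above the threshold called "effect"; per-class "accuracy" is computed as precision and "coverage" as recall; and the reliability index values are supplied as given integers from 0 to 9, since the formula that derives RI from the score is defined in the paper's Methods and not reproduced here. All names and toy data are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def classify(scores: Sequence[float], threshold: float = -0.05) -> List[bool]:
    """Call a variant 'effect' (True) when its score exceeds the decision threshold."""
    return [s > threshold for s in scores]


def accuracy_and_coverage(
    predicted: Sequence[bool], observed: Sequence[bool], positive: bool
) -> Dict[str, float]:
    """Per-class accuracy (assumed: precision) and coverage (assumed: recall)."""
    hits = sum(p == positive and o == positive for p, o in zip(predicted, observed))
    n_predicted = sum(p == positive for p in predicted)
    n_observed = sum(o == positive for o in observed)
    return {
        "accuracy": hits / n_predicted if n_predicted else float("nan"),
        "coverage": hits / n_observed if n_observed else float("nan"),
    }


def cumulative_by_reliability(
    ri: Sequence[int], correct: Sequence[bool], min_ri: int
) -> Tuple[float, float]:
    """Fraction of predictions with RI >= min_ri, and the accuracy among them."""
    kept = [c for r, c in zip(ri, correct) if r >= min_ri]
    if not kept:
        return 0.0, float("nan")
    return len(kept) / len(ri), sum(kept) / len(kept)


# Toy data (hypothetical): True = effect, False = neutral.
scores = [0.8, 0.6, 0.1, -0.02, -0.3, -0.7, -0.9, 0.2]
observed = [True, True, False, True, False, False, False, True]

effect_default = accuracy_and_coverage(classify(scores), observed, positive=True)
effect_strict = accuracy_and_coverage(classify(scores, 0.5), observed, positive=True)
print("default threshold:", effect_default)
print("threshold 0.5:   ", effect_strict)

# Toy reliability indices with a flag for whether each prediction was correct.
ri = [9, 8, 7, 5, 3, 0, 7, 2]
correct = [True, True, True, False, True, False, True, False]
print("RI >= 7:", cumulative_by_reliability(ri, correct, 7))
```

On the toy data, raising the threshold from -0.05 to 0.5 increases the accuracy of effect calls while halving their coverage, which mirrors the qualitative behavior the caption describes for Panel (a).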

Cited by

Variant predictions in congenital adrenal hyperplasia caused by mutations in CYP21A2.
Prado MJ, Ligabue-Braun R, Zaha A, Rossetti MLR, Pandey AV. Front Pharmacol. 2022 Oct 5;13:931089. doi: 10.3389/fphar.2022.931089. eCollection 2022. PMID: 36278220.
HGDiscovery: An online tool providing functional and phenotypic information on novel variants of homogentisate 1,2- dioxigenase.
Karmakar M, Cicaloni V, Rodrigues CHM, Spiga O, Santucci A, Ascher DB. Curr Res Struct Biol. 2022 Aug 30;4:271-277. doi: 10.1016/j.crstbi.2022.08.001. eCollection 2022. PMID: 36118553.
Tools for Predicting the Functional Impact of Nonsynonymous Genetic Variation.
Tang H, Thomas PD. Genetics. 2016 Jun;203(2):635-47. doi: 10.1534/genetics.116.190033. PMID: 27270698. Review.
Protein function in precision medicine: deep understanding with machine learning.
Rost B, Radivojac P, Bromberg Y. FEBS Lett. 2016 Aug;590(15):2327-41. doi: 10.1002/1873-3468.12307. Epub 2016 Aug 6. PMID: 27423136. Review.
A New Case of Autosomal-Dominant POLR3B-Related Disorder: Widening Genotypic and Phenotypic Spectrum.
Colona VL, Bertini E, Digilio MC, D'Amico A, Novelli A, Pro S, Pisaneschi E, Nicita F. Brain Sci. 2023 Nov 8;13(11):1567. doi: 10.3390/brainsci13111567. PMID: 38002527.
