BMC Bioinformatics. 2020 Mar 17;21(1):107.

doi: 10.1186/s12859-020-3439-4.

Variant effect predictions capture some aspects of deep mutational scanning experiments

Jonas Reeb¹, Theresa Wirth², Burkhard Rost²,³,⁴,⁵

Affiliations

¹ Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr 3, 85748, Garching/Munich, Germany. reeb@rostlab.org.
² Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr 3, 85748, Garching/Munich, Germany.
³ Institute for Advanced Study (TUM-IAS), Lichtenbergstr 2a, 85748, Garching/Munich, Germany.
⁴ TUM School of Life Sciences Weihenstephan (WZW), Alte Akademie 8, Freising, Germany.
⁵ Department of Biochemistry and Molecular Biophysics, Columbia University, 701 West, 168th Street, New York, NY, 10032, USA.

PMID: 32183714
PMCID: PMC7077003
DOI: 10.1186/s12859-020-3439-4

Jonas Reeb et al. BMC Bioinformatics. 2020.

Abstract

Background: Deep mutational scanning (DMS) studies exploit the mutational landscape of sequence variation by systematically and comprehensively assaying the effect of single amino acid variants (SAVs; also referred to as missense mutations, or non-synonymous Single Nucleotide Variants - missense SNVs or nsSNVs) for particular proteins. We assembled SAV annotations from 22 different DMS experiments and normalized the effect scores to evaluate five variant effect prediction methods: three trained on traditional variant effect data (PolyPhen-2, SIFT, SNAP2), a regression method optimized on DMS data (Envision), and a naïve prediction using conservation information from homologs.

Results: On a set of 32,981 SAVs, all methods captured some aspects of the experimental effect scores, albeit not the same aspects. Traditional methods such as SNAP2 correlated slightly more with measurements and better classified binary states (effect or neutral). Envision appeared to better estimate the precise degree of effect. Most surprising was that the simple naïve conservation approach using PSI-BLAST outperformed other methods in many cases. All methods captured beneficial effects (gain-of-function) significantly worse than deleterious effects (loss-of-function). For the few proteins with multiple independent experimental measurements, experiments differed substantially, but agreed more with each other than with predictions.

Conclusions: DMS provides a new powerful experimental means of understanding the dynamics of the protein sequence space. As always, promising new beginnings have to overcome challenges. While our results demonstrated that DMS will be crucial to improve variant effect prediction methods, data diversity hindered simplification and generalization.

Keywords: Deep mutational scanning; Missense variant; Non-synonymous sequence variant; Sequence variation; Single nucleotide variant; Variant effect prediction.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

**Fig. 1**
DMS experiments vs. variant effect predictions. In a hexbin plot, 17,781 deleterious effect SAVs in *SetCommon* were compared to normalized scores for three prediction methods (SNAP2 [38], Envision [49], and Naïve Conservation). Values on both axes range from 0 (neutral) to 1 (maximal effect) as denoted by the gradient from white (neutral) to red (effect). Dashed red lines give linear least-squares regressions. Marginals denote distributions of experimental and predicted scores with a kernel density estimation overlaid in blue. The footer denotes Spearman ρ, Pearson R and the mean squared error together with the respective 95% confidence intervals. The method scores are given on the y-axes and reveal the method: a SNAP2, b Envision – the only method trained on DMS data, c Naïve Conservation read off PSI-BLAST profiles

**Fig. 2**
Recall proportional to deleterious DMS effect scores. The continuous normalized DMS scores with deleterious effect in *SetCommon* were split into 20 bins of equal size. a In each bin the fraction of SAVs predicted as having an effect by the binary classification methods (PolyPhen-2 [37], SIFT [39] and SNAP2 [38]) was shown. Naïve Conservation read off PSI-BLAST profiles was treated as an effect prediction when scores were above 0. For all other methods the default score thresholds were applied. b shows the values adjusted for the amount of effect predicted in the first bin
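The binning described for panel a of Fig. 2 can be sketched roughly as follows. This is only an illustration under stated assumptions: "bins of equal size" is read here as equal-width bins over the normalized range 0 to 1 (it could also mean equal counts), and the function name `binned_effect_fraction` and its arguments are hypothetical, not taken from the paper. The adjustment used in panel b is not reproduced.

```python
def binned_effect_fraction(dms_scores, predicted_effect, n_bins=20):
    """Fraction of SAVs predicted as 'effect' per bin of normalized DMS score.

    Assumes dms_scores lie in [0, 1] and predicted_effect holds one boolean
    per SAV (True = the method predicts an effect at its own threshold).
    Bins are equal-width, which is an assumption of this sketch.
    """
    counts = [0] * n_bins
    hits = [0] * n_bins
    for score, is_effect in zip(dms_scores, predicted_effect):
        # A score of exactly 1.0 is placed in the last bin.
        idx = min(int(score * n_bins), n_bins - 1)
        counts[idx] += 1
        hits[idx] += bool(is_effect)
    # Empty bins have no defined fraction and are reported as None.
    return [h / c if c else None for h, c in zip(hits, counts)]
```

A rising fraction from the first to the last bin would correspond to recall growing with the measured strength of the deleterious effect.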

**Fig. 3**
Experimental agreement vs. predictions. For every pair of experimental measurements on the same protein (Table S1), the agreement between two experiments and that between each experiment and the predictions of SNAP2 and Naïve Conservation are compared. a $\Delta\rho = 0.5\,(\rho(x_1,p_1) + \rho(x_2,p_2)) - \rho(x_1,x_2)$, b $\Delta\mathrm{MSE} = \mathrm{MSE}(x_1,x_2) - 0.5\,(\mathrm{MSE}(x_1,p_1) + \mathrm{MSE}(x_2,p_2))$, where $x_1$/$x_2$ are the experiments and $p_1$/$p_2$ the predictions on the two experiments, all of which are calculated based on the largest possible set of SAVs. Negative values on the y-axes thus imply that the agreement between experiments is higher than that between experiment and prediction, positive values that predictions agree more
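The two differences defined in the Fig. 3 caption can be computed with a few lines of standard-library Python. The sketch below assumes, for simplicity, that both experiments and both prediction vectors are aligned on one shared list of SAVs; the caption instead uses the largest possible set for each term. All function names are hypothetical.

```python
from statistics import mean

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def pearson(a, b):
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

def spearman(a, b):
    # Spearman's rho is the Pearson correlation of the ranks.
    return pearson(ranks(a), ranks(b))

def mse(a, b):
    return mean((x - y) ** 2 for x, y in zip(a, b))

def delta_rho(x1, x2, p1, p2):
    # Negative: the two experiments correlate better with each other
    # than with the predictions.
    return 0.5 * (spearman(x1, p1) + spearman(x2, p2)) - spearman(x1, x2)

def delta_mse(x1, x2, p1, p2):
    # Negative: the error between experiments is smaller than the
    # average error between experiment and prediction.
    return mse(x1, x2) - 0.5 * (mse(x1, p1) + mse(x2, p2))
```

The signs are arranged so that, for both measures, a negative value means the experiments agree with each other more than with the predictions.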

**Fig. 4**
Classification performance of all prediction methods. Shown are ROC curves for 13,796 deleterious effect SAVs which were classified into either neutral, defined by the middle 95% of the scores from synonymous variants, or effect (*SetCommonSyn95*). Shaded areas around lines denote 95% confidence intervals. The legend denotes the AUC for each of the five prediction methods, along with the 95% confidence intervals. Horizontal dashed lines denote the default score threshold used by SNAP2 (blue) and SIFT (green)
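One way to read the neutral definition in the Fig. 4 caption is as a band between the 2.5th and 97.5th percentiles of the synonymous-variant scores. The sketch below illustrates that reading only; whether the band was derived per experiment, and on raw or normalized scores, is not stated here, and the names `neutral_band` and `classify` are hypothetical.

```python
from statistics import quantiles

def neutral_band(synonymous_scores):
    """Return (low, high) bounds enclosing the middle 95% of the scores."""
    # n=40 yields cut points every 2.5%; the first and last are the
    # 2.5th and 97.5th percentiles.
    cuts = quantiles(synonymous_scores, n=40, method="inclusive")
    return cuts[0], cuts[-1]

def classify(score, band):
    """Label a SAV score as 'neutral' inside the band, otherwise 'effect'."""
    low, high = band
    return "neutral" if low <= score <= high else "effect"
```

The resulting binary labels are what a ROC analysis of the predictors would then be scored against.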

**Fig. 5**
Concept of analysis. Experimental scores of variant effects (missense mutations, or single amino acid variants, labelled SAVs) from Deep Mutational Scanning (DMS) experiments were compared to in silico prediction methods. *Envision* was the only method developed on DMS data; it provides continuous scores mirroring the DMS data. SIFT and PolyPhen-2 can be evaluated as binary classification methods. SNAP2 is a classification method but provides continuous scores that can also be used. Naïve Conservation is provided as a baseline for both cases

Cited by

VariBench, new variation benchmark categories and data sets.
Shirvanizadeh N, Vihinen M. Front Bioinform. 2023 Sep 19;3:1248732. doi: 10.3389/fbinf.2023.1248732. eCollection 2023. PMID: 37795169.
Variant effect predictor correlation with functional assays is reflective of clinical classification performance.
Livesey BJ, Marsh JA. Genome Biol. 2025 Apr 22;26(1):104. doi: 10.1186/s13059-025-03575-w. PMID: 40264194.
Globally defining the effects of mutations in a picornavirus capsid.
Mattenberger F, Latorre V, Tirosh O, Stern A, Geller R. Elife. 2021 Jan 12;10:e64256. doi: 10.7554/eLife.64256. PMID: 33432927.
Machine Learning-Guided Protein Engineering.
Kouba P, Kohout P, Haddadi F, Bushuiev A, Samusevich R, Sedlar J, Damborsky J, Pluskal T, Sivic J, Mazurenko S. ACS Catal. 2023 Oct 13;13(21):13863-13895. doi: 10.1021/acscatal.3c02743. eCollection 2023 Nov 3. PMID: 37942269. Review.
Embeddings from protein language models predict conservation and variant effects.
Marquet C, Heinzinger M, Olenyi T, Dallago C, Erckert K, Bernhofer M, Nechaev D, Rost B. Hum Genet. 2022 Oct;141(10):1629-1647. doi: 10.1007/s00439-021-02411-y. Epub 2021 Dec 30. PMID: 34967936.

References

1. Tennessen JA, Bigham AW, O’Connor TD, Fu W, Kenny EE, Gravel S, McGee S, Do R, Liu X, Jun G, Kang HM, Jordan D, Leal SM, Gabriel S, Rieder MJ, Abecasis G, Altshuler D, Nickerson DA, Boerwinkle E, Sunyaev S, Bustamante CD, Bamshad MJ, Akey JM. Evolution and Functional Impact of Rare Coding Variation from Deep Sequencing of Human Exomes. Science. 2012;337(6090):64–69. doi: 10.1126/science.1219240.
2. The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature. 2015;526:68–74. doi: 10.1038/nature15393.
3. Manolio TA, Fowler DM, Starita LM, Haendel MA, MacArthur DG, Biesecker LG, Worthey E, Chisholm RL, Green ED, Jacob HJ, et al. Bedside Back to bench: building bridges between basic and clinical genomic research. Cell. 2017;169:6–12. doi: 10.1016/j.cell.2017.03.005.
4. de Beer TAP, Laskowski RA, Parks SL, Sipos B, Goldman N, Thornton JM. Amino acid changes in disease-associated variants differ radically from variants observed in the 1000 genomes project dataset. PLoS Comput Biol. 2013;9.
5. Mahlich Y, Reeb J, Hecht M, Schelling M, De Beer TAP, Bromberg Y, Rost B. Common sequence variants affect molecular function more than rare variants? Sci Rep. 2017;7:1608. doi: 10.1038/s41598-017-01054-2.

Grants and funding

640508/Deutsche Forschungsgemeinschaft
