Inferring fitness landscapes by regression produces biased estimates of epistasis
- PMID: 24843135
- PMCID: PMC4050575
- DOI: 10.1073/pnas.1400849111
Abstract
The genotype-fitness map plays a fundamental role in shaping the dynamics of evolution. However, it is difficult to directly measure a fitness landscape in practice, because the number of possible genotypes is astronomical. One approach is to sample as many genotypes as possible, measure their fitnesses, and fit a statistical model of the landscape that includes additive and pairwise interactive effects between loci. Here, we elucidate the pitfalls of using such regressions by studying artificial but mathematically convenient fitness landscapes. We identify two sources of bias inherent in these regression procedures, each of which tends to underestimate high fitnesses and overestimate low fitnesses. We characterize these biases for random sampling of genotypes as well as samples drawn from a population under selection in the Wright-Fisher model of evolutionary dynamics. We show that common measures of epistasis, such as the number of monotonically increasing paths between ancestral and derived genotypes, the prevalence of sign epistasis, and the number of local fitness maxima, are distorted in the inferred landscape. As a result, the inferred landscape will provide systematically biased predictions for the dynamics of adaptation. We identify the same biases in a computational RNA-folding landscape as well as regulatory sequence binding data treated with the same fitting procedure. Finally, we present a method to ameliorate these biases in some cases.
Keywords: experimental evolution; molecular evolution; penalized regression.
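The regression approach described in the abstract (additive plus pairwise interaction terms, fit to measured fitnesses) can be sketched as follows. This is a minimal illustration, not the authors' code. The function names `pairwise_design` and `fit_ridge`, the choice of a ridge (L2) penalty, the penalty strength, and the synthetic five-locus landscape are all assumptions made for the example. It shows only that a penalized fit compresses the range of predicted fitnesses relative to the true ones; it does not reproduce the specific analyses of the paper.

```python
import itertools
import numpy as np


def pairwise_design(genotypes):
    """Design matrix with an intercept, one column per locus,
    and one column per pair of loci (product of the two alleles)."""
    g = np.asarray(genotypes, dtype=float)
    n_loci = g.shape[1]
    pair_cols = [g[:, i] * g[:, j]
                 for i, j in itertools.combinations(range(n_loci), 2)]
    return np.column_stack([np.ones(len(g)), g] + pair_cols)


def fit_ridge(X, y, lam=0.0):
    """Ridge regression; the intercept (column 0) is left unpenalized.
    lam=0 reduces to ordinary least squares (X must have full column rank)."""
    penalty = lam * np.eye(X.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)


# Hypothetical landscape: 5 biallelic loci, fitness exactly additive + pairwise.
rng = np.random.default_rng(0)
n_loci = 5
genotypes = np.array(list(itertools.product([0, 1], repeat=n_loci)))
X = pairwise_design(genotypes)
true_coef = rng.normal(size=X.shape[1])
fitness = X @ true_coef

# Unpenalized fit on all genotypes recovers the landscape;
# the penalized fit pulls predictions toward the mean.
exact = X @ fit_ridge(X, fitness, lam=0.0)
shrunk = X @ fit_ridge(X, fitness, lam=5.0)

print("unpenalized fit matches:", np.allclose(exact, fitness))
print("true fitness spread:    ", fitness.std())
print("penalized fit spread:   ", shrunk.std())
```

With every genotype sampled and a landscape that truly is pairwise, the unpenalized fit is exact, so any compression seen here comes from the penalty alone. In practice only a small fraction of genotypes is sampled and the true landscape may contain higher-order interactions, which this sketch does not model.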
Conflict of interest statement
The authors declare no conflict of interest.