Inference for one-step beneficial mutations using next generation sequencing

Andrzej J Wojtowicz, Craig R Miller, Paul Joyce

PMID: 25720101
PMCID: PMC5563372
DOI: 10.1515/sagmb-2014-0030

Stat Appl Genet Mol Biol. 2015 Feb;14(1):65-81.

Abstract

Experimental evolution is an important research method that allows for the study of evolutionary processes occurring in microorganisms. Here we present a novel approach to experimental evolution that is based on the application of next generation sequencing. Under this approach, population-level sequencing is applied to an evolving population in which multiple first-step beneficial mutations occur concurrently. As a result, the frequencies of multiple beneficial mutations are observed in each replicate of an experiment. For this new type of data we develop methods of statistical inference. In particular, we propose a method for imputing selection coefficients of first-step beneficial mutations. The imputed selection coefficients are then used for testing the distribution of first-step beneficial mutations and for estimating the mean selection coefficient. In the case when selection coefficients are uniformly distributed, the collected data may also be used to estimate the total number of available first-step beneficial mutations.

Figures

**Figure 1**
Dynamics of the mean proportion of a beneficial mutation under different values of Nμ. Data obtained by simulation in which a single genotype with selection coefficient *s_i*=0.1 competed against the wild type. According to the results, the mean proportion of a mutation converges to the function g(*s_i*) when Nμ≥10.

**Figure 2**
Dynamics of the standard deviation of the proportion of a beneficial mutation under different values of Nμ. Data obtained by simulation in which a single genotype with selection coefficient *s_i*=0.1 competed against the wild type. According to the results, as Nμ increases, variance of the proportion decreases. It can be concluded that when Nμ→∞, the standard deviation goes to 0 and the observed proportion converges to its expected value which is given by the function g(*s_i*).
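The qualitative pattern in Figures 1 and 2 can be illustrated with a small simulation. The model the authors simulated is not described in this summary, so the sketch below is only an assumption: a Wright-Fisher population of size N, one-way mutation from wild type to a single beneficial genotype at rate μ per generation, and relative fitness 1 + s for the mutant. The function names and the parameter values (N, μ, number of generations and replicates) are hypothetical choices for illustration, not values from the paper.

```python
import random
import statistics


def simulate_final_proportion(N, mu, s, generations, rng):
    """Run one replicate and return the mutant proportion at the end.

    Assumed model (not taken from the paper): Wright-Fisher dynamics with
    selection, one-way mutation, and binomial drift each generation.
    """
    p = 0.0  # the population starts as pure wild type
    for _ in range(generations):
        # Selection: the mutant has relative fitness 1 + s.
        p_sel = p * (1 + s) / (1 + p * s)
        # Mutation: wild type converts to the mutant at rate mu.
        p_mut = p_sel + (1 - p_sel) * mu
        # Drift: resample N individuals (binomial draw via Bernoulli trials).
        mutants = sum(1 for _ in range(N) if rng.random() < p_mut)
        p = mutants / N
    return p


def summarize(N, mu, s=0.1, generations=20, replicates=50, seed=1):
    """Return (mean, standard deviation) of the final mutant proportion."""
    rng = random.Random(seed)
    finals = [
        simulate_final_proportion(N, mu, s, generations, rng)
        for _ in range(replicates)
    ]
    return statistics.mean(finals), statistics.stdev(finals)


if __name__ == "__main__":
    mu = 2e-3  # hypothetical mutation rate
    for N in (50, 5000):  # N * mu = 0.1 and N * mu = 10
        mean, sd = summarize(N, mu)
        print(f"N*mu = {N * mu:g}: mean = {mean:.4f}, sd = {sd:.4f}")
```

Under these assumptions, the replicate-to-replicate standard deviation should be noticeably smaller for the larger value of Nμ, which is the behavior the caption describes.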

**Figure 3**
Plots of imputed selection coefficients vs. their true values. Selection coefficients were generated from either a uniform or an exponential distribution. Each plot contains combined results from 25 (uniform case) or 50 (exponential case) simulations. Selection coefficients were imputed for mutations observed ≥2 times. The proposed method performs well: in most cases, imputed values are close to the actual values of the selection coefficients. Accuracy of imputation increases as the effects of mutations become larger.

**Figure 4**
Coefficient of variation $(\sqrt{MSE} / s)$ of the imputed selection coefficients. The results are based on 10,000 replicates of the simulation procedure. The presented curves are left-truncated at the point where mutations are observed in <10% (i.e., 1000) of the replicates. Accuracy of imputation improves as the effects of mutations become larger. Increasing the number of reads from k=100 to k=500 allows additional low- or medium-effect mutations to be observed and imputed, but it does not improve the accuracy of imputation for large-effect mutations, since these are already accurately estimated when k=100.
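As a minimal sketch of the accuracy measure used in this caption, the function below computes $\sqrt{MSE} / s$ from repeated imputations of a single true selection coefficient. The function name and its inputs are hypothetical; the imputation procedure itself is not reproduced here.

```python
import math


def coefficient_of_variation(imputed, true_s):
    """Return sqrt(MSE) / s for imputed values of one true coefficient s.

    `imputed` is a sequence of imputed selection coefficients, one per
    simulation replicate; `true_s` is the true value they estimate.
    """
    if true_s <= 0:
        raise ValueError("true_s must be positive")
    if not imputed:
        raise ValueError("imputed must contain at least one value")
    # Mean squared error of the imputed values around the true value.
    mse = sum((x - true_s) ** 2 for x in imputed) / len(imputed)
    return math.sqrt(mse) / true_s
```

For example, imputed values of 0.09 and 0.11 for a true coefficient of 0.1 each miss by 0.01, so the root mean squared error is 0.01 and the ratio is about 0.1.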

**Figure 5**
Type I error and power of the LRT when the test is applied to imputed selection coefficients. The power of the test was examined under the uniform distribution of selection coefficients. The test was conducted if the sample size was ≥5. According to the left panel, type I error of the test is moderately inflated. The right panel indicates that the power of the LRT is high when selection coefficients are uniformly distributed.

**Figure 6**
Coefficient of variation $(\sqrt{MSE} / r)$ of estimators of r when selection coefficients are a sample from a uniform distribution. The approach based on the joint likelihood function for r and δ provides more accurate estimates than the simplified method, but as the number of reads (k) increases the difference becomes smaller.
