PLoS Biol. 2020 Oct 22;18(10):e3000878.

doi: 10.1371/journal.pbio.3000878. eCollection 2020 Oct.

Frequency-dependent selection can forecast evolution in Streptococcus pneumoniae

Taj Azarian¹,², Pamela P Martinez², Brian J Arnold², Xueting Qiu², Lindsay R Grant³, Jukka Corander⁴,⁵,⁶, Christophe Fraser⁷, Nicholas J Croucher⁸, Laura L Hammitt³, Raymond Reid³, Mathuram Santosham³, Robert C Weatherholtz³, Stephen D Bentley⁶, Katherine L O'Brien⁹, Marc Lipsitch²,¹⁰, William P Hanage²

Affiliations

¹ Burnett School of Biomedical Sciences, University of Central Florida, Orlando, Florida, United States of America.
² Center for Communicable Disease Dynamics, Department of Epidemiology, T.H. Chan School of Public Health, Harvard University, Boston, Massachusetts, United States of America.
³ Center for American Indian Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, United States of America.
⁴ Helsinki Institute for Information Technology, Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland.
⁵ Department of Biostatistics, University of Oslo, Oslo, Norway.
⁶ Infection Genomics, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom.
⁷ Big Data Institute, Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom.
⁸ MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, Imperial College London, London, United Kingdom.
⁹ World Health Organization, Geneva, Switzerland.
¹⁰ Department of Immunology and Infectious Diseases, T.H. Chan School of Public Health, Harvard University, Boston, Massachusetts, United States of America.

PMID: 33091022
PMCID: PMC7580979
DOI: 10.1371/journal.pbio.3000878

Taj Azarian et al. PLoS Biol. 2020.

Abstract

Predicting how pathogen populations will change over time is challenging. Such has been the case with Streptococcus pneumoniae, an important human pathogen, and the pneumococcal conjugate vaccines (PCVs), which target only a fraction of the strains in the population. Here, we use the frequencies of accessory genes to predict changes in the pneumococcal population after vaccination, hypothesizing that these frequencies reflect negative frequency-dependent selection (NFDS) on the gene products. We find that the standardized predicted fitness of a strain, estimated by an NFDS-based model at the time the vaccine is introduced, enables us to predict whether the strain increases or decreases in prevalence following vaccination. Further, we are able to forecast the equilibrium post-vaccine population composition and assess the invasion capacity of emerging lineages. Overall, we provide a method for predicting the impact of an intervention on pneumococcal populations with potential application to other bacterial pathogens in which NFDS is a driving force.

Conflict of interest statement

I have read the journal’s policy and the authors of this manuscript have the following competing interests. ML has consulted for Pfizer, Affinivax, and Merck and has received grant support not related to this paper from Pfizer and PATH Vaccine Solutions. WPH, ML, and NJC have consulted for Antigen Discovery Inc. The authors have declared that no competing interests exist. KLOB has received grant support for pneumococcal work not related to this paper from Pfizer, GSK, and Gavi. KLOB has consulted for Merck and Sanofi Pasteur. LRG, LLH, and RCW have received grant support not related to this paper from Pfizer, Merck, and GSK.

Figures

**Fig 1. Strain’s prevalence.**
A. Pre-vaccine to post-vaccine SCs. Strains are ordered from highest to lowest pre-vaccine prevalence. Raw data used in this figure are included in S1 Table. Observed prevalence change was calculated as post-vaccine frequency minus pre-vaccine frequency. Changes in prevalence are compared with those expected under a pro rata null model (i.e., not using the predictive methods in this paper). Observed changes in prevalence are represented by points colored by the serotype composition of the strain: NVT only, PCV7 VT only, and VT-NVT. The point and whiskers show the prevalence change expected under the null model, in which only the VT strains were removed and all NVT strains increased equally in proportion to their pre-vaccine prevalence. The dot is the median, and the whiskers give the 2.5% and 97.5% quantiles of predicted changes under the null model using 10,000 bootstraps from pre-vaccine samples. Significant differences between the pro rata model and the observed data are denoted with plus and minus signs, marking strains that were significantly more (n = 9) or less (n = 4) common than expected, respectively. Among the most successful were strains that contained both VT and NVT isolates (SC-22 and SC-23), whose NVT component included serotypes 6C, 15C, and 35B, as well as SC-24 and SC-25, which were dominated by the NVT serotypes 23A and 15C, respectively. SC-27 is polyphyletic, composed of an aggregate of strains that are at low frequency in the overall population. Compared with strains composed solely of NVT isolates, those with mixed NVT-VT isolates had marginally higher risk differences, indicating greater success than expected under the null model (β = 0.03, SE = 0.015, F(1,29) = 3.67, p = 0.06). Two strains that emerged during the study period (SC-10 and SC-24) were not included in this analysis as they were not present at the first time point. See S1 Data and S1 Code for details. NVT, nonvaccine serotype; PCV7, 7-valent pneumococcal conjugate vaccine; SC, sequence cluster; SE, standard error; VT, vaccine serotype; VT-NVT, mixed VT and NVT.
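The pro rata null model described in this caption can be sketched in a few lines. This is a minimal illustration, not the authors' S1 Code: the isolate layout (a list of `(strain, is_vt)` pairs), the function names, and the toy strain labels are all assumptions made for the example, and quantiles are approximated by order statistics.

```python
import random
from collections import Counter

def pro_rata_change(isolates):
    """Expected prevalence change per strain when VT isolates are removed
    and the remaining NVT isolates are renormalized to sum to 1.
    `isolates` is a list of (strain, is_vt) pairs (hypothetical layout)."""
    n_all = len(isolates)
    pre = Counter(strain for strain, _ in isolates)
    nvt = Counter(strain for strain, is_vt in isolates if not is_vt)
    n_nvt = sum(nvt.values())
    return {
        s: (nvt[s] / n_nvt if n_nvt else 0.0) - pre[s] / n_all
        for s in pre
    }

def bootstrap_null(isolates, n_boot=10_000, seed=0):
    """Resample the pre-vaccine isolates with replacement and return, per
    strain, the (2.5%, median, 97.5%) order statistics of the null change."""
    rng = random.Random(seed)
    strains = sorted({s for s, _ in isolates})
    draws = {s: [] for s in strains}
    for _ in range(n_boot):
        sample = rng.choices(isolates, k=len(isolates))
        change = pro_rata_change(sample)
        for s in strains:
            # A strain absent from a resample has zero pre and post prevalence.
            draws[s].append(change.get(s, 0.0))
    summary = {}
    for s, values in draws.items():
        values.sort()
        summary[s] = (
            values[int(0.025 * (n_boot - 1))],
            values[n_boot // 2],
            values[int(0.975 * (n_boot - 1))],
        )
    return summary

# Toy pre-vaccine sample: one VT-only strain and two NVT-only strains.
toy_isolates = (
    [("SC-A", True)] * 30 + [("SC-B", False)] * 50 + [("SC-C", False)] * 20
)
for strain, (low, median, high) in bootstrap_null(toy_isolates, n_boot=1000).items():
    print(f"{strain}: {low:+.3f} / {median:+.3f} / {high:+.3f}")
```

An observed change that falls outside a strain's 2.5% to 97.5% interval would correspond to the plus and minus annotations in the figure.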

**Fig 2. Simulations.**
A. Conceptual diagram for simulations. Descriptive representation of the strain prevalence at different stages relative to vaccine introduction: pre-vaccine equilibrium, vaccine introduction, and post-vaccine equilibrium. We modeled a population of VT and NVT strains (represented as unique genotypes with alleles 1 or 0 at a locus, denoting the presence or absence of a single accessory locus) and simulated the removal of VT genotypes, following the post-vaccine population to equilibrium (details in methods). In this illustrative figure, 8 strains are shown, with their prevalence in the population evolving over time. The system is allowed to evolve until it reaches a steady state (“pre-vaccine equilibrium”). Three strains were then targeted to mimic a vaccine introduction, which removes them from the system. The predicted fitness was then estimated from the period just after the vaccine introduction, when the population has been depleted of VT but the relative prevalence of NVT has not changed; this quantity can be calculated from pre-vaccine data alone. Finally, the system reaches a second steady state (“post-vaccine equilibrium”). Different shades of blue represent the rank of the strain frequencies in the post-vaccine equilibrium. B. Predicted fitness. Comparison of the direction of prevalence change of strains from pre- to post-vaccine using simulated data and predicted fitness from these simulated data. For these 10 replicate simulations, 2,371 accessory loci and 35 randomly chosen strains were simulated, including 3 VT genotypes. For each replicate, the pre-vaccine equilibrium frequencies of the 2,371 accessory loci were varied. Final prevalence of strains was obtained by quadratic programming, and prevalence change for each NVT strain was calculated as post-vaccine prevalence minus pre-vaccine prevalence, in both cases with all NVT strains summing to 100%. Each column in the decreased and increased category represents the results from 1 simulation (i.e., the first column in the decreased category corresponds to the first column in the increased category, and the dots sum to 32). The predicted fitness of the strain accurately predicts the direction of the prevalence change in 92.8% of cases (teal dots). Gray dots represent instances in which the direction of the prevalence change was not predicted correctly in the simulation. See S1 Data and S1 Code for details. NVT, nonvaccine serotype; VT, vaccine serotype.
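To make the idea of predicted fitness concrete, here is a small sketch. The caption does not give the formula, so the scoring rule below is an assumption: pre-vaccine accessory-locus frequencies are treated as the equilibrium, and a strain's predicted fitness is the sum, over the loci it carries, of the equilibrium frequency minus the frequency just after VT removal. All names and the toy genotypes are hypothetical.

```python
def locus_frequencies(genotypes, prevalence):
    """Prevalence-weighted frequency of each accessory locus.
    `genotypes` maps strain -> tuple of 0/1 alleles; `prevalence` maps
    strain -> weight (renormalized here so the weights sum to 1)."""
    n_loci = len(next(iter(genotypes.values())))
    total = sum(prevalence.values())
    return [
        sum(prevalence[s] * genotypes[s][locus] for s in prevalence) / total
        for locus in range(n_loci)
    ]

def predicted_fitness(genotypes, pre_prevalence, vt_strains):
    """Assumed NFDS score: for each NVT strain, sum (equilibrium - perturbed)
    locus frequency over the loci the strain carries."""
    # Assumption: the pre-vaccine population sits at its NFDS equilibrium.
    equilibrium = locus_frequencies(genotypes, pre_prevalence)
    # Just after vaccination: VT removed, NVT relative prevalence unchanged.
    nvt_prevalence = {
        s: p for s, p in pre_prevalence.items() if s not in vt_strains
    }
    perturbed = locus_frequencies(genotypes, nvt_prevalence)
    return {
        s: sum(
            allele * (e - f)
            for allele, e, f in zip(genotypes[s], equilibrium, perturbed)
        )
        for s in nvt_prevalence
    }

# Toy population: three loci, one VT strain and two NVT strains.
toy_genotypes = {"VT1": (1, 1, 0), "NVT1": (1, 0, 0), "NVT2": (0, 0, 1)}
toy_prevalence = {"VT1": 0.5, "NVT1": 0.25, "NVT2": 0.25}
fitness = predicted_fitness(toy_genotypes, toy_prevalence, {"VT1"})
for strain, score in fitness.items():
    direction = "increase" if score > 0 else "decrease"
    print(f"{strain}: {score:+.2f} -> predicted to {direction}")
```

In this toy case NVT1 shares a locus with the removed VT strain, so that locus falls below its equilibrium frequency and NVT1 receives a positive score; NVT2 carries a locus that is now over-represented and receives a negative one.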

**Fig 3. Predicted fitness and predicted prevalence.**
A. Relationship between predicted fitness and observed prevalence change from pre- to post-vaccine among 31 strains, in each case summing to 100%. Prevalence change was calculated as post-vaccine frequencies minus pre-vaccine frequencies. Predicted fitness was calculated using data solely from the pre-vaccine sample (n = 27), with the exception of strains for which there were no NVT isolates present in the sample before the introduction of PCV7 (n = 4). For those strains, data were imputed from the time point during which they were first observed. Four strains were excluded either because they were polyphyletic (SC-27) or because they had no NVT isolates present pre- or post-vaccine and therefore could not be imputed (SC-04C, SC-12, and SC-17). The points are colored by serotype composition of strains: nonvaccine types in blue and mixed VT and NVT types in yellow. The shaded quadrants indicate regions of accurate prediction of the prevalence change direction (increased post-vaccine versus decreased) given the predicted fitness value. Three outlier strains are annotated for which the predicted direction of their prevalence change differed from that which was observed (i.e., they were predicted to increase based on their fitness when their prevalence from pre- to post-vaccine decreased, or vice versa). B. Scatterplot of observed versus predicted prevalence of 27 strains at post-vaccine equilibrium based on quadratic programming. These 27 strains contained at least 1 NVT isolate pre-vaccine. Points are colored based on serotype composition as described in panel A. Perfect predictions would lie on the dotted line of equality (1:1 line). The shaded gray region shows the CI from the linear regression model used to test for deviation of the observed versus predicted values compared with the 1:1 line. Two outliers are annotated for which the difference between their predicted and observed prevalence was >1.5 times the interquartile range of the distribution of predicted and observed prevalence differences. Note that the predictions remained significant if SC-09 (the extreme strain at 10% prevalence in B) was removed (slope, 95% CI 0.021–1.05; intercept, 95% CI −0.003 to 0.03; p = 0.19, chi-squared = 3.5). C-D. Comparison of the predicted prevalence change from quadratic programming analysis using accessory genes and the naive pro rata model as shown in Fig 1B but applied to just these 27 strains. The dotted line of equality (1:1 line) and CI (gray) are shown as in panel B. Goodness-of-fit statistics, including SSE, RMSE, and degrees-of-freedom-adjusted R² (Adj. R²), are given for each model. Lower SSE and RMSE indicate a better model fit. See S1 Data and S1 Code for details. Adj. R², adjusted R-squared; NFDS, negative frequency-dependent selection; NVT, nonvaccine serotype; PCV7, 7-valent pneumococcal conjugate vaccine; RMSE, root mean squared error; SC, sequence cluster; SSE, sum of squared errors; VT, vaccine serotype.
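The goodness-of-fit statistics named in panels C-D can be computed as follows. This is a sketch under stated assumptions rather than the authors' exact procedure: residuals are taken directly as observed minus predicted prevalence, RMSE divides by the number of strains, and `n_params` (the number of fitted parameters used for the adjustment) is a placeholder. The prevalence values are invented for illustration.

```python
import math

def fit_statistics(observed, predicted, n_params=1):
    """Return SSE, RMSE, and adjusted R-squared for paired observed and
    predicted prevalences. `n_params` is an assumed parameter count."""
    n = len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    rmse = math.sqrt(sse / n)
    mean_obs = sum(observed) / n
    sst = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_params - 1)
    return {"SSE": sse, "RMSE": rmse, "Adj. R2": adj_r2}

# Invented post-vaccine prevalences for four strains and two sets of
# predictions standing in for the two models being compared.
observed = [0.10, 0.05, 0.02, 0.08]
model_a = [0.09, 0.06, 0.02, 0.07]
model_b = [0.05, 0.05, 0.05, 0.05]

for name, predicted in (("model A", model_a), ("model B", model_b)):
    stats = fit_statistics(observed, predicted)
    print(name, {k: round(v, 4) for k, v in stats.items()})
```

The model with the smaller SSE and RMSE tracks the observed prevalences more closely, which is the comparison the caption draws between the accessory-gene predictions and the pro rata model.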
