Nat Commun. 2020 Jul 10;11(1):3442.
doi: 10.1038/s41467-020-17327-w.

Within-host microevolution of Streptococcus pneumoniae is rapid and adaptive during natural colonisation

Chrispin Chaguza et al. Nat Commun. 2020.

Abstract

Genomic evolution, transmission and pathogenesis of Streptococcus pneumoniae, an opportunistic human-adapted pathogen, are driven principally by nasopharyngeal carriage. However, little is known about genomic changes during natural colonisation. Here, we use whole-genome sequencing to investigate within-host microevolution of naturally carried pneumococci in ninety-eight infants intensively sampled sequentially from birth until twelve months in a high-carriage African setting. We show that neutral evolution and nucleotide substitution rates up to forty-fold faster than observed over longer timescales in S. pneumoniae and other bacteria drive high within-host pneumococcal genetic diversity. Highly divergent co-existing strain variants emerge during colonisation episodes through real-time intra-host homologous recombination while the rest are co-transmitted or acquired independently during multiple colonisation episodes. Genic and intergenic parallel evolution occur particularly in antibiotic resistance, immune evasion and epithelial adhesion genes. Our findings suggest that within-host microevolution is rapid and adaptive during natural colonisation.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Schematic of the study design and analysis workflow.
Newborn infants were recruited into the study at birth. Nasopharyngeal swabs were taken within the first week after birth, then every two weeks until six months, and then every month until the infants were one year old, at which point sampling was stopped. The analysis of these longitudinal data involved fitting multi-state and other models to determine colonisation dynamics in the infants during the first year of life, and whole-genome analysis to assess the within-host genetic diversity, recombination and mutation rate of the isolates. The map of The Gambia was generated by the authors in R software using the ggmap v3.0.0 package (https://cran.r-project.org/web/packages/ggmap/). The images of the infants and adults, and of the DNA sequencing machine, were created with BioRender (https://biorender.com/) with permission to publish.
Fig. 2. Characteristics and dynamics of the extended pneumococcal strains.
a Frequency of serotypes; each episode was counted once and serotypes with frequency >0.2% are shown. b An example of a colonisation profile for infant ID: 65 showing different colonisation episodes. The sampling point marked with the cross (×) represents culture-negative pneumococcal samples (uncolonised). Different types of episodes are shown in (b), namely transient colonisation, whereby an episode consisted of a serotype detected at a single time point; extended colonisation, which refers to an episode where the serotype was detected at multiple time points; and multiple colonisation, where overlapping episodes of different serotypes co-occurred at certain time points. c Schematic representation of the three-state multistate model showing colonised and uncolonised carriage states and the estimated transition intensities (rates) between the states. d, e Observed and expected prevalence of each colonisation state. f The inferred sojourn time (duration) in each colonisation state. The error bars represent the 95% confidence interval for the estimated mean values.
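The link between the transition intensities in panel (c) and the sojourn times in panel (f) can be sketched in Python. This is only an illustration, not the authors' fitted model: it assumes a time-homogeneous continuous-time Markov model, in which the mean sojourn time in a state is the reciprocal of the total rate of leaving it. The state labels and rate values below are hypothetical.

```python
def mean_sojourn_times(intensities):
    """Mean time spent in each state before leaving it.

    intensities[i][j] is the transition rate from state i to state j (i != j).
    Assumes a time-homogeneous continuous-time Markov model.
    """
    times = {}
    for state, exits in intensities.items():
        total_exit_rate = sum(exits.values())
        # A state with no exits is absorbing: the sojourn time is unbounded.
        times[state] = float("inf") if total_exit_rate == 0 else 1.0 / total_exit_rate
    return times


# Hypothetical rates per week for three hypothetical states A, B and C.
example_rates = {
    "A": {"B": 0.2, "C": 0.05},
    "B": {"A": 0.1},
    "C": {"A": 0.5},
}
print(mean_sojourn_times(example_rates))
```

With rates expressed per week, the returned sojourn times are in weeks.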
Fig. 3. Within-host pneumococcal genetic diversity during colonisation.
Strip charts, box plots and violin plots showing the number of SNPs calculated between isolates of the same serotype and ST within the same episode. Isolates sampled five or fewer weeks apart are coloured light blue, while those sampled more than six weeks apart are shown in darker blue. For some serotypes, for example 11A, 16F, 19A, 23F, 6A, 6B and NT, the genetic diversity of some strains was much higher than that of the rest of the strains in the episode, which suggested the occurrence of evolutionary processes other than random substitution, particularly genomic recombination. The Y-axis of each plot is shown on a log10 scale for clarity. The number of data points for each group is presented in the format serotype (n = n1; n2), where serotype is the capsular type and n1 and n2 are the numbers of points for isolates not sampled within six weeks of each other and isolates sampled within six weeks of each other, respectively: 10A (n = 19;39), 11A (n = 17;25), 11B (n = 1;0), 12F (n = 7;2), 13 (n = 17;29), 14 (n = 40;31), 15A (n = 17;17), 15B/C (n = 25;35), 16F (n = 10;12), 17F (n = 4;1), 18A (n = 15;21), 18C (n = 7;3), 19A (n = 78;112), 19F (n = 14;7), 20 (n = 17;38), 21 (n = 26;21), 22A (n = 5;11), 23A (n = 10;2), 23B (n = 43;32), 23F (n = 15;43), 28F (n = 3;0), 34 (n = 26;60), 35B (n = 25;49), 38 (n = 9;0), 39 (n = 12;12), 4 (n = 6;5), 40 (n = 6;6), 48 (n = 6;9), 6A (n = 76;102), 6B/E (n = 63;108), 7F (n = 3;3), 8 (n = 1;0), 9L (n = 14;25), 9V (n = 10;12) and NT (n = 5;0).
Fig. 4. Within-host homologous recombination during colonisation.
a, b Two examples of colonisation episodes, namely INF57:11A:1 and INF26:23F:1 respectively, where recombination blocks were detected. The episode name is shown in the format A:B:C, where A, B and C represent the infant ID, serotype and number of episodes with the serotype respectively. (I) Colonisation episode showing the time points at which the serotype in the episode was detected. Some or all of the detected samples were sequenced. In episode INF57:11A:1, serotype 11A was detected from week 3 to week 17. A recombination block was detected at week 13 but the recombinant strain did not persist until the next sampling time at week 17. In episode INF26:23F:1, serotype 23F was detected from week 7 to week 35. A recombination block was first detected at week 11 and it persisted; the recombinant strain was sampled again at week 17. (II) Distribution of SNPs across the genome of serotypes 11A and 23F in episodes INF57:11A:1 and INF26:23F:1 respectively. The coloured line (red) shows the occurrence of a SNP in the strain, using the first sequenced genome in the episode as the reference or ancestral strain. The SNPs are enhanced for clarity. (III) A multiple sequence alignment showing the location of the SNPs and visual evidence of the emergence of a recombinant strain within the episode. The value for r/m represents the number of SNPs within recombination blocks relative to SNPs outside the blocks. (IV) The distribution of the SNPs is highlighted by the frequency polygon, generated using a window size of 1000 bp, which shows spikes in the SNP density across the recombinogenic regions.
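The r/m value and the 1000 bp windowed SNP density described in this caption can be sketched in Python. This is a minimal illustration rather than the authors' pipeline: the function names, SNP positions and block coordinates are hypothetical, and non-overlapping windows are assumed because the caption does not say whether the windows slide.

```python
def snp_density(snp_positions, genome_length, window=1000):
    """Count SNPs in consecutive non-overlapping windows of `window` bp.

    Positions are 1-based genome coordinates.
    """
    n_windows = (genome_length + window - 1) // window  # ceiling division
    counts = [0] * n_windows
    for pos in snp_positions:
        counts[(pos - 1) // window] += 1
    return counts


def r_over_m(snp_positions, recombination_blocks):
    """SNPs inside recombination blocks relative to SNPs outside them.

    recombination_blocks is a list of inclusive (start, end) coordinates.
    """
    inside = sum(
        any(start <= pos <= end for start, end in recombination_blocks)
        for pos in snp_positions
    )
    outside = len(snp_positions) - inside
    # The ratio is unbounded when every SNP falls inside a block.
    return inside / outside if outside else float("inf")


# Hypothetical SNP positions on a hypothetical 10 kb sequence.
positions = [150, 5200, 5300, 5450, 5900, 9100]
blocks = [(5000, 6000)]
print(snp_density(positions, genome_length=10_000))
print(r_over_m(positions, blocks))
```

A spike in one window of the density list corresponds to the kind of peak the frequency polygon shows over a recombinogenic region.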
Fig. 5. Within-host mutation rates during natural colonisation.
Episodes where molecular-clock signal was evident were analysed. Serotypes with >4 sequenced genomes per individual were included in the analysis. The episode name is shown in the format A:B:C, where A, B and C represent the infant ID, serotype and number of episodes with the serotype respectively. The linear relationship between time and the number of SNPs accrued relative to the reference genome sequenced at the onset of the episode was assessed using linear regression. The nucleotide substitution rate (μ) corresponded to the estimated number of SNPs site⁻¹ year⁻¹ based on the regression coefficient (β). The units of β, i.e., the mutation rate, are the number of SNPs per week. The shaded area surrounding the fitted linear regression line represents the 95% confidence interval based on the standard error of the mean slope of the regression line. The values of the substitution rates expressed as SNPs site⁻¹ year⁻¹ are shown in Table 2.
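The step from the regression coefficient β (SNPs per week) to a substitution rate μ (SNPs per site per year) can be sketched in Python. The caption does not give the exact conversion, so this sketch assumes μ = β × 52 / genome length. The sampling weeks, SNP counts and the 2,000,000 bp genome length are hypothetical values chosen only for illustration.

```python
def fit_slope(weeks, snps):
    """Ordinary least-squares fit of SNP count against time in weeks.

    Returns (beta, intercept), where beta is in SNPs per week.
    """
    n = len(weeks)
    mean_x = sum(weeks) / n
    mean_y = sum(snps) / n
    sxx = sum((x - mean_x) ** 2 for x in weeks)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(weeks, snps))
    beta = sxy / sxx
    return beta, mean_y - beta * mean_x


def substitution_rate(beta_per_week, genome_length_bp, weeks_per_year=52.0):
    """Convert SNPs per week into SNPs per site per year (assumed conversion)."""
    return beta_per_week * weeks_per_year / genome_length_bp


# Hypothetical episode: SNPs accrued relative to the first sequenced genome.
weeks = [0, 2, 4, 6, 8]
snps = [0, 1, 2, 3, 4]
beta, intercept = fit_slope(weeks, snps)
mu = substitution_rate(beta, genome_length_bp=2_000_000)  # hypothetical length
print(beta, mu)
```

Dividing by the genome length is what turns a per-genome count into a per-site rate, so the result depends directly on the length assumed.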
Fig. 6. Parallel genic and intergenic SNPs identified during colonisation.
a Bar plot showing coding or genic regions containing synonymous (red) and non-synonymous (blue) SNPs in the genome. b Bar plot similar to (a) but showing genomic regions with intergenic SNPs. c The number of episodes containing a genic or intergenic SNP. d Bar plot showing the number of episodes containing a genic and intergenic SNP. e Proportion of episodes with parallel SNPs (dark blue) in genic and intergenic SNPs. f Number of episodes with synonymous and non-synonymous changes in coding regions. g Number of colonisation episodes with a change at each codon position. h Carriage duration of episodes with parallel and non-parallel SNPs. The letters N, S and I stand for non-synonymous, synonymous and intergenic SNPs respectively. The numbers of data points for each group were as follows: N and non-parallel (n = 927), S and non-parallel (n = 1088), I and non-parallel (n = 311), N and parallel (n = 297), S and parallel (n = 228), and I and parallel (n = 790). i Functional classification of genes with parallel SNPs. Only episodes with >3 sequenced genomes were included in the analysis. The statistical significance is shown by the number of asterisks as follows: **P < 0.01, ***P < 0.001.
Fig. 7. Timing and duration of parallel mutation during natural colonisation.
The type of parallel SNP is shown in different panels of the figure as follows: a non-synonymous, b synonymous, and c intergenic. The estimates were calculated for each extended colonisation episode with >3 sequenced isolates. The parallel SNPs coloured in orange were propagated throughout the episode after occurrence, while those coloured in dark blue did not persist over the entire episode.
Fig. 8. Highly mutated genes during natural colonisation.
a Normalised and unnormalised number of SNPs detected in each gene during colonisation episodes. Normalisation was done by estimating the number of SNPs per kilobase pair (Kb). b Normalised number of synonymous and non-synonymous SNPs per Kb in each gene.
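The per-kilobase normalisation in panel (a) can be sketched in Python. The gene names, SNP counts and gene lengths below are hypothetical and serve only to show why normalised and unnormalised counts can rank genes differently.

```python
def snps_per_kb(snp_count, gene_length_bp):
    """Normalise a SNP count by gene length, giving SNPs per kilobase pair."""
    if gene_length_bp <= 0:
        raise ValueError("gene length must be positive")
    return snp_count / (gene_length_bp / 1000)


# Hypothetical genes: (SNP count, gene length in bp).
genes = {"geneA": (6, 3000), "geneB": (3, 600)}
normalised = {name: snps_per_kb(count, length) for name, (count, length) in genes.items()}
print(normalised)
```

Here the hypothetical geneA carries more SNPs in total, but the shorter geneB has the higher density once gene length is taken into account.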
