Mol Biol Evol. 2017 Oct 1;34(10):2537-2554.
doi: 10.1093/molbev/msx173.

Pneumococcal Capsule Synthesis Locus cps as Evolutionary Hotspot with Potential to Generate Novel Serotypes by Recombination

Rafal J Mostowy et al. Mol Biol Evol. 2017.

Abstract

Diversity of the polysaccharide capsule in Streptococcus pneumoniae (the bacterium's main surface antigen and the target of currently used pneumococcal vaccines) constitutes a major obstacle to eliminating pneumococcal disease. This diversity is genetically encoded by almost 100 variants of the capsule biosynthesis locus, cps. However, the evolutionary dynamics of the capsule remain incompletely understood. Here, using genetic data from 4,519 bacterial isolates, we found cps to be an evolutionary hotspot with elevated substitution and recombination rates. These rates were a consequence of relaxed purifying selection and positive, diversifying selection acting at this locus, supporting the hypothesis that the capsule has a greater potential to generate novel diversity than the rest of the genome. Diversifying selection was particularly evident in the region of the wzd/wze genes, which are known to regulate capsule expression and hence the bacterium's ability to cause disease. Using a novel, capsule-centered approach, we analyzed the evolutionary history of 12 major serogroups. This analysis revealed complex diversification scenarios, driven principally by recombination with other serogroups and with other streptococci. Patterns of recombinational exchange between serogroups could not be explained by serotype frequency alone, pointing to nonrandom associations between co-colonizing serotypes. Finally, we discovered a previously unobserved mosaic serotype, 39X, which was confirmed to carry a viable and structurally novel capsule. Together with previous discoveries of other mosaic capsules in densely sampled collections, these results emphasize the bacterium's strong adaptive potential, reflected in its ability to generate novel antigenic diversity by recombination.

Keywords: conjugate vaccine; epidemiology; evolutionary dynamics; next-generation sequencing; pneumococcal disease; polysaccharide diversity.

Figures

Fig. 1.
Properties of the dataset. (A) The distribution of the number of isolates stratified by geographical location and source of isolation (carriage, disease or unknown). (B) The distribution of the number of isolates stratified by serotype (top 20 shown). (C) A diversity network, where each node represents a cps reference sequence and an edge links two nodes if they are similar, i.e., share at least a minimum proportion s of homology groups. Red, bold edges show the conservative network, in which the minimum similarity was defined as sharing at least s = 0.58 of homology groups (see Materials and Methods). Black edges show the additional connections obtained in a liberal network, in which the minimum similarity was defined as sharing at least s = 0.36 of homology groups. The size of each node reflects the full sample size, and the color shows the within-serotype cps genetic diversity, measured as the mean pairwise Kimura K80 distance over all nonidentical isolates (the full diversity distribution is shown in supplementary fig. S4, Supplementary Material online). Red labels of serotype nodes denote the genetic serogroups that are analyzed in detail below.
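The node colors above summarize within-serotype diversity as a mean pairwise Kimura K80 distance. As an illustration of that measure (not the authors' pipeline), the sketch below computes the standard K80 distance, d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of transitions and transversions. It assumes pre-aligned sequences, skips sites with gaps or ambiguous bases, and treats "nonidentical isolates" as distinct sequences after dropping exact duplicates; the function names are hypothetical.

```python
import math
from itertools import combinations

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")
VALID = PURINES | PYRIMIDINES


def k80_distance(seq1: str, seq2: str) -> float:
    """Kimura (1980) two-parameter distance between two aligned DNA sequences.

    Sites with a gap or ambiguous base in either sequence are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in VALID or b not in VALID:
            continue
        compared += 1
        if a == b:
            continue
        # Purine<->purine or pyrimidine<->pyrimidine changes are transitions
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # Undefined (math domain error) when the sequences are saturated
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def mean_pairwise_k80(sequences) -> float:
    """Mean K80 distance over all pairs of distinct sequences (duplicates dropped)."""
    unique = sorted(set(sequences))
    pairs = list(combinations(unique, 2))
    if not pairs:
        return 0.0  # fewer than two distinct sequences: no measurable diversity
    return sum(k80_distance(a, b) for a, b in pairs) / len(pairs)
```

Dropping duplicates before averaging keeps heavily sampled clones from pulling the mean toward zero, which is one plausible reading of "nonidentical isolates"; the paper's Materials and Methods would be the authority on the exact choice.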
Fig. 2.
Evolution of the four most common serogroups. Schematic dendrograms show the evolutionary history of the four major serogroups, which correspond to the four largest clusters marked in red in figure 1: serogroup 6 (panel A), serogroup 14/15 (panel B), serogroup 19 (panel C), and serogroup 23 (panel D). The dendrograms are based on clonal trees inferred from cps-based alignments, one for each serogroup. Full-resolution figures can be found in Supplementary Material online. Recombinations that occurred on branches leading to a new serotype or a mosaic are colored in red. Stars mark branches where there was statistical support for the recombination (using STRUCTURE or Gubbins); the remaining recombinations were hypothesized based on gene content comparison (see detailed discussion in Supplementary Material online). Dashed branches reflect clonal uncertainty under the suggested model. The origin of the detected recombinations is analyzed in figure 5.
Fig. 3.
The molecular clock rate and selection in the cps locus in three different lineages: PMEN1, PMEN2 and PMEN14. (A) Molecular clock rates inferred by BEAST2 for the whole-genome alignment with the capsule removed (background) and for the capsule-only alignment (cps), with error bars showing 95% highest posterior density intervals. (B) The null distribution of the molecular clock rate in the genome, measured in 20 random regions of the genome (repeated 1,000 times), versus the clock rate of the cps locus. (C) Comparison of the mean ω = dN/dS value in the genome versus the cps locus. (D) Distribution of ω values estimated for each coding sequence versus underlying gene diversity measured using the K80 model. Values of ω lying outside the 95% quantile range are not shown. (E) Null distribution of ω values in the background compared with the estimated ω in the cps locus and in the wzd/wze region. Shaded regions show the 95% confidence intervals.
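Panels B and E compare a cps statistic against a null built by repeatedly sampling random genomic regions. The sketch below shows one generic way to build such a resampling null; it is an assumption-laden illustration, not the authors' procedure. It assumes each background "region" has already been reduced to a single value (for example a per-gene ω or clock rate), summarizes each replicate by the mean of 20 randomly chosen regions, and reports an empirical upper-tail p-value and a central 95% interval. All names and defaults are hypothetical.

```python
import random
import statistics


def null_distribution(background_values, region_count=20, replicates=1000, seed=1):
    """Hypothetical null: mean of `region_count` random background regions,
    drawn without replacement, repeated `replicates` times."""
    values = list(background_values)
    rng = random.Random(seed)  # fixed seed for reproducibility
    return [statistics.fmean(rng.sample(values, region_count)) for _ in range(replicates)]


def empirical_upper_p(null, observed):
    """Fraction of null replicates at least as large as `observed`,
    with a +1 correction so the p-value is never exactly zero."""
    exceed = sum(1 for x in null if x >= observed)
    return (exceed + 1) / (len(null) + 1)


def central_interval(null):
    """Approximate 2.5% and 97.5% quantiles of the null distribution."""
    cuts = statistics.quantiles(null, n=40)  # 39 cut points in steps of 2.5%
    return cuts[0], cuts[-1]
```

A cps value lying above the upper end of `central_interval(null)` would correspond to the kind of elevation the figure depicts; whether regions should be genes, fixed-length windows, or length-matched to cps is a design choice this sketch leaves open.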
Fig. 4.
Recombination rates within the cps locus. (A) Recombination rates estimated for the 12 serogroups used in this study, with the corresponding 95% confidence intervals. By definition, these rates exclude long, cps-spanning recombinations, which are invisible in the cps alignment. (B) Frequency of recombinations observed at the cps locus, with colors showing the number of recombinations for each serogroup, for serogroups with rhamnose genes (top) and without rhamnose genes (bottom). Recombination positions were normalized such that the total alignment length was 1 in all serogroups. Additionally, for the sake of comparison, the upstream wg-region (blue) and the rhamnose region (red) were each normalized to 10% of the length. (C) Recombination frequency measured using a whole-genome approach, with within-cps events (left) versus full-cps events (right). (D) Recombination rate in the genomic background (excluding events at the cps) versus the recombination rate of events affecting the cps, estimated from whole-genome alignments. Only lineages in which at least 80% of isolates belonged to the same serogroup were chosen. The rates were normalized per base using the mean alignment lengths of the whole genome and of cps, respectively. The y = x line is shown in red, and marginal distributions are shown in blue.
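The position normalization described for panel B can be pictured as a piecewise-linear map onto [0, 1]. The sketch below is a hypothetical reconstruction, assuming 0-based alignment coordinates, a wg-region occupying the start of the alignment up to `wg_end`, and (where present) a rhamnose region running from `rham_start` to the end of the alignment; the wg-region and rhamnose region are each squeezed into 10% of the axis and the serotype-specific middle fills the rest. Serogroups without rhamnose genes pass `rham_start=None`.

```python
def normalize_position(pos, aln_len, wg_end, rham_start=None, edge=0.10):
    """Map an alignment coordinate onto [0, 1] (hypothetical layout).

    [0, wg_end)            -> [0, edge]            upstream wg-region
    [wg_end, rham_start)   -> [edge, 1 - edge]     serotype-specific middle
    [rham_start, aln_len]  -> [1 - edge, 1]        rhamnose region
    Without a rhamnose region, the middle stretches to 1.
    """
    if not 0 <= pos <= aln_len:
        raise ValueError("position outside alignment")
    if not 0 < wg_end < aln_len:
        raise ValueError("wg_end must lie inside the alignment")
    if rham_start is not None and not wg_end < rham_start < aln_len:
        raise ValueError("rham_start must lie between wg_end and aln_len")
    if pos < wg_end:
        return edge * pos / wg_end
    # End of the middle block in alignment and normalized coordinates
    mid_end = aln_len if rham_start is None else rham_start
    mid_target_end = 1.0 if rham_start is None else 1.0 - edge
    if rham_start is None or pos < rham_start:
        return edge + (mid_target_end - edge) * (pos - wg_end) / (mid_end - wg_end)
    return (1.0 - edge) + edge * (pos - rham_start) / (aln_len - rham_start)
```

Fixing the two flanking regions to 10% each puts homologous regions at the same place on the axis across serogroups whose cps loci differ greatly in length, which is what makes a pooled histogram of breakpoints readable.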
Fig. 5.
Origin of cps recombinations. (A) The network shows the recombination flow among the serogroups defined in figure 1A: nodes correspond to serogroups, and arrows show the direction of cps recombination flow based on the most likely origin of the putative recombination events. The width of the arrows reflects the number of recombination events (between 0 and 51), and the size of the nodes reflects the number of isolates within the serogroup (except for "unknown" and other streptococci). (B) Proportion of cps recombinations originating in the same serogroup (self) versus another serogroup (nonself) for each serogroup. Stars show the significance of departure from a random distribution of recombination exchanges. Significance was calculated assuming a binomial distribution of self-/nonself-recombinations, with the probability corresponding to the frequency of self-/nonself-serogroups. (C) Excess of self- over nonself-recombinations in the cps-specific and cps-nonspecific regions, as defined in the main text.
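Panel B tests self-/nonself-recombination counts against a binomial expectation. A minimal sketch of an exact two-sided binomial test is shown below, using only the standard library; the two-sided convention (summing the probabilities of all outcomes no more likely than the observed count) is a common choice and is an assumption here, as are the function names. In this framing, `k` would be the number of self-recombinations, `n` the total recombinations for a serogroup, and `p` the expected self-frequency.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_test_two_sided(k: int, n: int, p: float) -> float:
    """Exact two-sided binomial p-value: total probability of all outcomes
    whose probability does not exceed that of the observed count k."""
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    observed = binom_pmf(k, n, p)
    threshold = observed * (1 + 1e-7)  # tolerance for floating-point ties
    total = sum(binom_pmf(i, n, p) for i in range(n + 1) if binom_pmf(i, n, p) <= threshold)
    return min(1.0, total)
```

For example, 10 self-recombinations out of 10 with an expected self-frequency of 0.5 gives p = 2/1024, about 0.002; these numbers are illustrative only and are not taken from the study.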
Fig. 6.
Lineage-jumping dynamics. (A) Dendrogram based on the phylogeny of serogroup 6. Tips are grouped according to the three major populations identified: the class-I 6A clade (green), the 6C/6D clade (blue) and the class-II 6B clade (red). Geometric shapes aligned with the tips denote the clonal complexes of the strains in which the serotype sequence was found. Clades of the tree with branch lengths shorter than 9 × 10⁻⁵ were collapsed, and the most frequent lineage within each collapsed clade was plotted. (B) Lineage-jumping rates inferred for the four most frequent serogroups (6, 19, 23 and 14/15), with lineages defined as clonal complexes (CC; left) and as clonal complex groups (CCG; right). The red line shows the CCG jumping rate expected from the observed CC jumping rate under the assumption that changes between all pairs of CCs are equally likely. Mean estimates and 95% highest posterior density intervals are shown.
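The red reference line in panel B follows from a simple argument: if every change between a pair of CCs is equally likely, only the fraction of CC pairs that span two different CCGs registers as a CCG jump. The sketch below encodes that reasoning as an assumption (it is not the authors' code); the mapping from CC to CCG and the function name are hypothetical.

```python
from itertools import combinations


def expected_ccg_rate(cc_rate: float, cc_to_ccg: dict) -> float:
    """Expected CCG jumping rate if all CC-to-CC changes are equally likely.

    Only changes between CCs that belong to different CCGs count as CCG jumps,
    so the CC rate is scaled by the fraction of such between-group pairs.
    """
    pairs = list(combinations(cc_to_ccg, 2))  # unordered pairs of distinct CCs
    if not pairs:
        raise ValueError("need at least two clonal complexes")
    between = sum(1 for a, b in pairs if cc_to_ccg[a] != cc_to_ccg[b])
    return cc_rate * between / len(pairs)
```

With four hypothetical CCs split evenly into two CCGs, four of the six possible pairs cross groups, so the expected CCG rate is two-thirds of the CC rate. An observed CCG rate well below this line would suggest that jumps preferentially occur between closely related CCs.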
