Sci Data. 2015 Oct 27;2:150058.
doi: 10.1038/sdata.2015.58. eCollection 2015.

Population genomic datasets describing the post-vaccine evolutionary epidemiology of Streptococcus pneumoniae


Nicholas J Croucher et al. Sci Data. 2015.

Abstract

Streptococcus pneumoniae is a common nasopharyngeal commensal bacterium and an important human pathogen. Vaccines against a subset of pneumococcal antigenic diversity have reduced rates of disease, without changing the frequency of asymptomatic carriage, by altering the bacterial population structure. These changes can be studied in detail by using genome sequencing to characterise systematically sampled collections of carried S. pneumoniae. This dataset consists of 616 annotated draft genomes of isolates collected from children during routine visits to primary care physicians in Massachusetts between 2001, shortly after the seven-valent polysaccharide conjugate vaccine was introduced, and 2007. Also made available are a core genome alignment and phylogeny describing the overall population structure, clusters of orthologous protein sequences, software for inferring serotype from Illumina reads, and whole genome alignments for the analysis of closely related sets of pneumococci. These data can be used to study both bacterial evolution and the epidemiology of a pathogen population under selection from vaccine-induced immunity.

Keywords: Bacterial genetics; Genetic variation; Molecular evolution; Respiratory tract diseases.


Conflict of interest statement

S.I.P. has investigator-initiated grants from Merck and Pfizer and has consulted for GlaxoSmithKline, Merck, Pfizer and Novartis. W.P.H. has consulted for GlaxoSmithKline. M.L. has consulted for Pfizer and Novartis.

Figures

Figure 1. Workflow for the generation of draft genome assemblies.
The flow chart shows the steps taken to generate the draft genome dataset. Boxes in orange indicate steps at which typing was performed, allowing the integrity of sample handling to be checked; boxes in blue indicate steps at which checks were performed to identify and eliminate low-quality samples.
Figure 2. Validation of the draft genome assemblies.
(a) Venn diagram showing the overlap between the sequence types from the original genotyping of the collection, those inferred from Illumina sequence read mapping, and those inferred from the genome assemblies. Only data for the 594 isolates for which all three data types were available are represented here. (b) Venn diagram showing the overlap between sequence types inferred by the different methodologies, here treating results as consistent if only one of the seven loci differed between them. Under this criterion, the sequence types inferred from read mapping and de novo assembly are identical, and differ from the original genotyping in only twelve cases. (c) Histogram showing the number of CDSs in publicly available annotated complete, or high-quality draft, S. pneumoniae genomes. (d) Histogram showing the number of CDSs in the 616 draft genomes from Massachusetts. This distribution shows that the count of putative CDSs in each draft genome is within the range of CDSs identified in manually annotated genomes, consistent with the draft genomes being near-complete and the CDS predictions being accurate.
Figure 3. Overall population structure of the 616 S. pneumoniae isolates.
The maximum likelihood phylogeny generated from the core genome alignment is presented, as displayed in the Microreact system (Data Citation 4), with each leaf node coloured according to its sequence cluster (SC).
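
The consistency criterion described for Figure 2b (two sequence-type calls count as consistent if at most one of the seven loci differs) can be illustrated with a minimal sketch. This is not the authors' validation code: the function names and the allelic profiles below are hypothetical, and profiles are assumed to be tuples of allele numbers listed in the same locus order.

```python
from typing import Sequence


def loci_differences(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the loci at which two allelic profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same loci")
    return sum(x != y for x, y in zip(a, b))


def consistent(a: Sequence[int], b: Sequence[int], max_mismatches: int = 1) -> bool:
    """Treat two calls as consistent if they differ at no more than max_mismatches loci."""
    return loci_differences(a, b) <= max_mismatches


# Hypothetical seven-locus profiles for one isolate (illustrative values only).
genotyped = (7, 11, 10, 1, 6, 8, 1)       # original genotyping
from_reads = (7, 11, 10, 1, 6, 8, 14)     # inferred from read mapping
from_assembly = (7, 11, 10, 1, 6, 8, 14)  # inferred from the assembly

print(loci_differences(from_reads, from_assembly))  # exact agreement between the two inferences
print(consistent(genotyped, from_reads))            # single-locus difference is tolerated
```

Setting `max_mismatches=0` corresponds to the exact-match comparison in panel (a), while the default of 1 corresponds to the relaxed comparison in panel (b).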

Dataset use reported in

  • doi: 10.1542/peds.2008-3099
  • doi: 10.1542/peds.112.4.862
  • doi: 10.1542/peds.2004-2338
  • doi: 10.1099/00221287-144-11-3049
  • doi: 10.1086/510249
  • doi: 10.1016/j.vaccine.2011.09.075
  • doi: 10.1038/ng.2625
  • doi: 10.1038/ncomms6471

References

Data Citations

    1. Croucher N.J. 2009. International Nucleotide Sequence Database. FM211187
    2. Croucher N.J. 2015. Dryad. http://dx.doi.org/10.5061/dryad.t55gq
    3. Croucher N.J. 2013. International Nucleotide Sequence Database. PRJEB2632
    4. Croucher N.J. 2015. Microreact. http://microreact.org/project/NJwviE7F

