Microb Genom. 2023 Nov;9(11):001111.
doi: 10.1099/mgen.0.001111.

Comparison of genomic diversity between single and pooled Staphylococcus aureus colonies isolated from human colonization cultures

Vishnu Raghuram et al. Microb Genom. 2023 Nov.

Abstract

The most common approach to sampling the bacterial populations within an infected or colonized host is to sequence genomes from a single colony obtained from a culture plate. However, it is recognized that this method does not capture the genetic diversity in the population. Sequencing a mixture of several colonies (pool-seq) is a better approach to detect population heterogeneity, but it is more complex to analyse due to different types of heterogeneity, such as within-clone polymorphisms, multi-strain mixtures, multi-species mixtures and contamination. Here, we compared 8 single-colony isolates (singles) and pool-seq on a set of 2286 Staphylococcus aureus culture samples to identify features that can distinguish pure samples, samples undergoing intraclonal variation and mixed strain samples. The samples were obtained by swabbing 3 body sites quarterly for a year on 85 human participants who initially presented with a methicillin-resistant S. aureus skin and soft-tissue infection (SSTI). We compared parameters such as sequence quality, contamination, allele frequency, nucleotide diversity and pangenome diversity in each pool to those for the corresponding singles. Comparing singles from the same culture plate, we found that 18% of sample collections contained mixtures of multiple multilocus sequence types (MLSTs or STs). We showed that pool-seq data alone could predict the presence of multi-ST populations with 95% accuracy. We also showed that pool-seq could be used to estimate the number of intra-clonal polymorphic sites in the population. Additionally, we found that the pool may contain clinically relevant genes such as antimicrobial resistance markers that may be missed when only examining singles. These results highlight the potential advantage of analysing genome sequences of total populations obtained from clinical cultures rather than single colonies.
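
The share of multi-ST collections reported above can be illustrated with a minimal sketch. It assumes each collection has already been reduced to a list of ST labels, one per single; the function name and input layout are hypothetical and are not taken from the authors' code.

```python
def multi_st_fraction(collections):
    """Return the fraction of collections whose singles carry more than one ST.

    collections: dict mapping a collection id to the list of ST labels
    assigned to its single colonies (hypothetical layout for illustration).
    """
    if not collections:
        return 0.0
    # A collection counts as multi-ST when its singles show at least two distinct STs.
    multi = sum(1 for sts in collections.values() if len(set(sts)) > 1)
    return multi / len(collections)


example = {
    "collection_A": ["ST8"] * 8,            # mono-ST
    "collection_B": ["ST8"] * 7 + ["ST5"],  # multi-ST
}
print(multi_st_fraction(example))
```

In this toy input one of the two collections is mixed, so the function reports one half.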

Keywords: Staphylococcus aureus; adaptation; asymptomatic carriage; genetic diversity; whole-genome sequencing.

Conflict of interest statement

The authors declare that there are no conflicts of interest.

Figures

Fig. 1.
Schematic representation of colony collection strategy, names and descriptions of isolate groups analysed in this study. (a) Diagram depicting the number of samples in this study. From 85 study participants, we collected a total of 254 culture samples. For each collection, we obtained eight single colonies, and pooled the remaining colonies on the plate. (b) Diagram describing the terminology for specific isolate groups used in this study.
Fig. 2.
Pairwise SNP distance between and within collections. (a) Box plots showing per-collection SNP distance distributions. For each collection shown on the y-axis, the x-axis shows the corresponding distribution of core genome SNP distances in log scale. Black vertical lines show the median SNP distances and boxes show the interquartile range. Whiskers represent values up to 1.5 times the first or third quartile. Black dots represent outliers beyond the whiskers range. (b) Bar plot showing number of genomes per ST. Multilocus sequence typing was performed by the software tool mlst (see Methods). The x-axis shows the number of isolates assigned to the corresponding ST shown on the y-axis. (c) Bar plot showing the number of STs detected per participant. Multilocus sequence typing was performed for all eight singles from a participant and the number of unique STs detected per participant was plotted. (d) Maximum-likelihood phylogeny of 296 isolates, representing at least 1 isolate from all collections. All non-identical genomes from each collection were aligned by snippy and a core genome phylogeny was constructed using IQ-TREE (see Methods). Tree tips are coloured by ST; only the top 10 most frequent STs are shown, and the remaining ones are grouped into ‘other’.
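
As a rough illustration of the pairwise SNP distances plotted in panel (a), the sketch below counts differing positions between aligned core-genome sequences. It is a simplified stand-in, not the tooling used in the study; the function names and the choice to skip positions containing N or a gap are assumptions.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b, ignore="N-"):
    """Count positions where two aligned sequences differ.

    Positions where either sequence has an ambiguous base or gap
    (characters in `ignore`) are skipped; this is an assumption.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for x, y in zip(seq_a.upper(), seq_b.upper())
        if x != y and x not in ignore and y not in ignore
    )


def pairwise_distances(seqs):
    """Return {(name_i, name_j): distance} for every unordered pair of names."""
    return {
        (a, b): snp_distance(seqs[a], seqs[b])
        for a, b in combinations(sorted(seqs), 2)
    }


toy = {"s1": "AAAA", "s2": "AAAT", "s3": "TAAT"}
print(pairwise_distances(toy))
```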
Fig. 3.
Assembly quality can be used to assess population heterogeneity. (a) There was no significant difference in the assembly coverage between pools and singles. Violin plot showing distribution of assembly coverage between pools and singles. Assembly coverage for each pool and single was calculated by Bactopia against an auto-chosen reference (see Methods). Circles indicate mono-ST collections and triangles indicate multi-ST collections. Dark red points indicate pools and white points indicate singles. Multiple points stacked on top of each other may appear darker than isolated points. (b) Pool assemblies were more likely to have a higher number of contigs than single assemblies. Violin plot showing distribution of number of assembly contigs in pools and singles. Pooled samples were processed identically to singles with Bactopia using SPAdes. Circles indicate mono-ST collections and triangles indicate multi-ST collections. Dark red points indicate pools and white points indicate singles. Multiple points stacked on top of each other may appear darker than isolated points. (c) Pooled samples have varying sources of contamination while singles are pure. CheckM contamination and heterogeneity scores showed that all single colonies have no contamination, while 6 % of pools are contaminated by phylogenetically distant sources and 3 % of pools are contaminated by phylogenetically similar sources. The black vertical line marks a heterogeneity score of 50, below which the source of contamination is considered phylogenetically distant and vice versa. Circles indicate mono-ST collections and triangles indicate multi-ST collections. Dark red points indicate pools. Multiple points stacked on top of each other may appear darker than isolated points.
Fig. 4.
(a) The MAF index could be used to assess multi-ST pools. Dot plot depicting the number of variant positions and the average MAF for mono-ST (circles) and multi-ST (triangles) pools. The x-axis indicates the number of variant positions compared to the corresponding reference. The y-axis indicates the average minor allele frequency (MAF). The average MAF was calculated by summing the MAFs of all intermediate alleles and dividing by the total number of variant positions. Red dots correspond to mono-ST pools and triangles correspond to multi-ST pools. The black horizontal line indicates an average MAF of 0.1. The black vertical line indicates 0.1% of the S. aureus genome (2800 sites). The frequency of the dots at their corresponding x and y positions is indicated by the histograms above the x-axis and to the right of the y-axis, respectively. (b) Average nucleotide diversity suggested most pools comprise single strains. 94% of pools had nucleotide diversity less than a theoretical 99:01 mixture of two strains. LEFT: Dots and colours indicate average nucleotide diversity value for each pool, expected pool (reads from eight singles combined in equal proportions), downsampled pools (reads from four and two random singles combined in equal proportions) and all 2032 singles. Grey dashed lines connect corresponding samples. Black solid horizontal lines indicate the average nucleotide diversity value for in-silico mixtures of two S. aureus genomes 30,000 SNPs apart. The ratio of each mixture is indicated over each solid black line. The frequency of the dots at their corresponding x positions is indicated by the histogram to the right.
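
The average MAF described in panel (a) can be sketched as follows. The sketch assumes each variant position is summarised by its alternate-allele frequency in the pool, so that fixed sites (frequency 0 or 1) add nothing to the sum but still count in the denominator. The function names are hypothetical, and the second function only reports which side of the two plotted reference lines a pool falls on; it is not the authors' classifier.

```python
def average_maf(alt_allele_freqs):
    """Sum the minor allele frequencies and divide by the number of variant positions.

    alt_allele_freqs: alternate-allele frequency (0..1) at each variant position.
    """
    if not alt_allele_freqs:
        return 0.0
    total = 0.0
    for af in alt_allele_freqs:
        if not 0.0 <= af <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")
        # The minor allele is whichever of the two alleles is rarer at this site.
        total += min(af, 1.0 - af)
    return total / len(alt_allele_freqs)


def reference_line_flags(n_variant_positions, avg_maf, site_line=2800, maf_line=0.1):
    """Report whether a pool lies beyond the vertical and horizontal lines in the plot."""
    return (n_variant_positions > site_line, avg_maf > maf_line)


freqs = [0.5, 1.0, 0.1, 0.9]
print(average_maf(freqs))
print(reference_line_flags(len(freqs), average_maf(freqs)))
```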
Fig. 5.
(a) Pools captured more variants than eight single colonies combined. Each bar indicates a collection, and the height of the bar indicates the fraction of variants found in the corresponding sample group (pools, expected pools, four colony pools, two colony pools, single colony) to the total number of variants found in all samples in the collection (pool plus all eight singles). For example, a bar with height 0.25 in the fifth row (one colony) shows that if one random single colony was examined from the specific collection corresponding to the bar, we would find 25% of the total number of variants found in the collection (pool plus all eight singles). Bars for each sample group are ordered by lowest to highest. A value of one indicates 100% of the variants found in both the pools and all eight singles combined are represented in the sample group. (b) Allele frequencies in the pool were proportional to the number of singles the variant was detected in. Boxplots showing allele frequencies of variants detected in zero singles up to eight singles. Allele frequency of each variant found in the pool increased as the variant was found in more colonies in the corresponding singles. Boxes show the interquartile range and whiskers represent values up to 1.5 times the first or third quartile. White dots represent outliers beyond the whiskers range. Black horizontal line in each boxplot indicates the mean.
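
The bar heights in panel (a) are ratios of variant counts, which the sketch below reproduces on toy data. It assumes each sample's variants are available as a set of hashable identifiers, here (position, alternate base) tuples; the function name and data layout are hypothetical.

```python
def fraction_captured(group_variants, pool_variants, singles_variants):
    """Fraction of a collection's variants that a sample group recovers.

    The denominator is the union of variants in the pool and in every single,
    matching the "pool plus all eight singles" total in the caption.
    """
    total = set(pool_variants).union(*singles_variants)
    if not total:
        return 0.0
    return len(set(group_variants) & total) / len(total)


pool = {(100, "A"), (250, "T"), (900, "G")}
singles = [{(100, "A")}, {(100, "A"), (1200, "C")}]

# One single colony carrying one of the four variants in the collection.
print(fraction_captured(singles[0], pool, singles))
# The pool on its own.
print(fraction_captured(pool, pool, singles))
```

In this toy collection the single recovers a quarter of the variants and the pool recovers three quarters.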
Fig. 6.
Allelic variation in pools and singles from the same sample was positively correlated. (a) The number of segregating sites in the true pools was proportional to the number of segregating sites in the expected pool. For mono-ST collections (collections where all eight singles, the pool and the auto-chosen reference were called the same ST), the number of sites with allelic variation was comparable between the true pools (y-axis) and the expected pool (x-axis) (eight singles combined). If the same site was fixed in all eight singles and in the pool, it was not included. Blue regression line depicts a linear relationship with a Pearson’s r of 0.352. (b) AFs of variants in the expected pool did not reliably predict the AFs of the same variants in the true pool. Frequency distribution plot showing Pearson’s r for all 198 mono-ST collections. x-axis depicts Pearson’s r and y-axis depicts number of collections.
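
A minimal sketch of the two quantities in this figure: counting sites that vary among a set of singles, and computing Pearson's r between paired values. Both functions are illustrative assumptions about the calculation, written with the standard library only; they do not reproduce the authors' pipeline.

```python
from math import sqrt


def segregating_sites(singles):
    """Count variant positions present in some, but not all, of the singles.

    singles: list of sets of variant positions, one set per single colony.
    Sites shared by every single are treated as fixed and excluded.
    """
    if not singles:
        return 0
    sets = [set(s) for s in singles]
    union = set().union(*sets)
    shared = set.intersection(*sets)
    return len(union - shared)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


print(segregating_sites([{1, 2, 3}, {1, 2}, {1, 4}]))
print(pearson_r([1, 2, 3], [2, 4, 6]))
```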
Fig. 7.
A median of one additional AMR class can be observed in the pools compared to singles. Ridgeline plot showing number of AMR gene classes detected in pools, the pangenome of expected and downsampled pools (pangenome of eight, four and two singles combined), and a random single colony. The x-axis shows the number of AMR classes detected in the sample by AMRFinder and the y-axis shows the corresponding sample. Black vertical line shows the median number of AMR classes detected for each sample group. White circles under each ridgeline represent individual collections and the number of AMR classes detected.
Fig. 8.
Diminishing returns in the number of new variants or new AMR genes observed with the addition of more sequencing runs. Dot plot depicting the number of new variants (a) or new AMR genes (b) observed for additional sequencing runs. Red dots depict the first sequencing run being the pool, and the additional runs being single colonies (1, pool; 2, pool + one single; 3, pool + two singles, …). White dots depict only singles (1, one single; 2, two singles, …).
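
The counts plotted here are the number of features first seen at each additional run. A minimal sketch, assuming each sequencing run is represented as a set of variant or AMR gene identifiers listed in the order the runs are added (the function name is hypothetical):

```python
def new_features_per_run(runs):
    """For each run, count the features not seen in any earlier run."""
    seen = set()
    counts = []
    for run in runs:
        features = set(run)
        counts.append(len(features - seen))
        seen |= features
    return counts


# Toy example: pool first, then two single colonies.
runs = [{"v1", "v2", "v3"}, {"v2", "v3", "v4"}, {"v4"}]
print(new_features_per_run(runs))
```

Placing the pool first or a single first only changes the order of the input list, which mirrors the comparison between the red and white series.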
