PLoS Biol. 2008 Jul 22;6(7):e177.
doi: 10.1371/journal.pbio.0060177.

Population genomic analysis of strain variation in Leptospirillum group II bacteria involved in acid mine drainage formation

Sheri L Simmons et al. PLoS Biol.

Abstract

Deeply sampled community genomic (metagenomic) datasets enable comprehensive analysis of heterogeneity in natural microbial populations. In this study, we used sequence data obtained from the dominant member of a low-diversity natural chemoautotrophic microbial community to determine how coexisting closely related individuals differ from each other in terms of gene sequence and gene content, and to uncover evidence of evolutionary processes that occur over short timescales. DNA sequence obtained from an acid mine drainage biofilm was reconstructed, taking into account the effects of strain variation, to generate a nearly complete genome tiling path for a Leptospirillum group II species closely related to L. ferriphilum (sampling depth approximately 20x). The population is dominated by one sequence type, yet we detected evidence for relatively abundant variants (>99.5% sequence identity to the dominant type) at multiple loci, and a few rare variants. Blocks of other Leptospirillum group II types (approximately 94% sequence identity) have recombined into one or more variants. Variant blocks of both types are more numerous near the origin of replication. Heterogeneity in genetic potential within the population arises from localized variation in gene content, typically focused in integrated plasmid/phage-like regions. Some laterally transferred gene blocks encode physiologically important genes, including quorum-sensing genes of the LuxIR system. Overall, results suggest inter- and intrapopulation genetic exchange involving distinct parental genome types and implicate gain and loss of phage and plasmid genes in recent evolution of this Leptospirillum group II population. Population genetic analyses of single nucleotide polymorphisms indicate variation between closely related strains is not maintained by positive selection, suggesting that these regions do not represent adaptive differences between strains. Thus, the most likely explanation for the observed patterns of polymorphism is divergence of ancestral strains due to geographic isolation, followed by mixing and subsequent recombination.

Conflict of interest statement

Competing interests. The authors have declared that no competing interests exist.

Figures

Figure 1. The Genome of Leptospirillum group II, Showing That SNP Density and Frequency Are Roughly Symmetrical around the Origin of Replication
Circular diagram showing contig ordering, SNP density, minor allele frequency, and location of strains around the genome. The outer ring shows all contigs in the 5-way CG assembly, ordered by mate pairing and by reference to the UBA genome. Four locations where the join is uncertain are circled in magenta. The first inner ring shows a moving average of SNP density (1-kb windows, 50-bp slide, scale 0–1). Dark red indicates local SNP density of greater than 0.5% while pink indicates less than 0.5%. The second inner ring shows a moving average of minor allele frequency (scale 0%–0.7%). Dark-green points indicate average minor allele frequencies for a window greater than 0.05%. Light-blue highlights indicate location of substrains used to analyze variation within the 5-way CG population (>99% sequence similarity). Purple highlights indicate the location of deeply sampled reads of more divergent Leptospirillum group II strains incorporated into the population (∼94% sequence similarity). The image was generated with Circos (M. Krzywinski, http://mkweb.bcgsc.ca/circos/).
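The moving average of SNP density described in this legend (1-kb windows, 50-bp slide) can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the function name, the 0-based coordinates, and the treatment of the sequence as a single linear stretch are all assumptions made for the example.

```python
from bisect import bisect_left


def sliding_snp_density(snp_positions, sequence_length, window=1000, step=50):
    """Return (window_start, density) pairs for a sliding window.

    snp_positions: 0-based SNP coordinates (hypothetical input format).
    density: number of SNPs in the window divided by the window length.
    """
    snps = sorted(set(snp_positions))
    densities = []
    # Slide the window along the sequence in fixed steps.
    for start in range(0, sequence_length - window + 1, step):
        first = bisect_left(snps, start)           # first SNP at or after start
        last = bisect_left(snps, start + window)   # first SNP past the window
        densities.append((start, (last - first) / window))
    return densities


# Toy input: three SNPs on a 2-kb sequence (values are illustrative only).
densities = sliding_snp_density([10, 20, 1040], 2000)
```

Under the legend's color scheme, windows with a density above 0.005 (0.5%) would be drawn in dark red and the rest in pink.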
Figure 2. Overview of Different Sources of Pangenomic Variation over a 500-kb Segment of the 5-Way CG Type Leptospirillum Group II Genome, Including the Origin of Replication
Only the mapped segment of the genome is shown. In the outer ring, tRNAs are indicated with orange, transposons with red, and integrases with “Int.” The location and length of strain variant paths (see main text) are shown in green in the first inner ring, and the locations of recombinant reads (blue = UBA-type, and red = non–UBA-type) are shown in the second inner ring. The innermost ring shows nonsynonymous SNPs in blue, synonymous SNPs in purple, intergenic SNPs in red, and SNPs resulting in frameshifts in orange. The image was generated with Circos (M. Krzywinski, http://mkweb.bcgsc.ca/circos/).
Figure 3. Diagram Showing Gene Content Variation Due to Integrase Insertion in a Region of Contig 11277
(A) The gene content of the variant regions is shown. One of these contains an insertion of an integrase (red) and associated genes, mostly hypotheticals or conserved hypotheticals (orange on the top strand, green on the bottom strand). Genes on opposite strands are shown in dark purple (top) or light purple (bottom). Gene annotations are given in text, or by numbers, as follows: (1) putative GTP binding protein; (2) hypothetical protein; (3) protein of unknown function; (4–5) putative peptidase M16; (6) hypothetical protein; (7) putative Na+/H+ antiporters; (8) putative metabolite transport protein; (9) leucyl-tRNA (Cons hyp); (10) protein of unknown function; (11) probable polymerase III, delta subunit; (12) ribosomal protein S20; (13) putative virulence factor, MVIN-like; (14) probable HNH endonuclease; (15) protein of unknown function; (16) hypothetical protein; (17) hypothetical protein; and (18) probable ATPase, PP-loop superfamily. (B) The associated reads are shown. Dark blue regions show inserted sequence divergent from the composite.
Figure 4. A High-Frequency Variant Region with Six Alternate Paths, One of Which Contains a Complete LuxIR Pathway
The main genome path is shown on the top line with alternative paths below. Light blue indicates genes present on the main genome path, dark blue indicates genes shared between two of the variants, yellow indicates hypothetical proteins, black indicates transposases, red indicates phage integrases, and pink (top strand) and purple (bottom strand) indicate genes potentially involved in the LuxIR pathway. Hypothetical proteins in the LuxIR region are shown in grey. Genes are annotated as follows: (1) l-aspartate oxidase; (2) probable ferredoxin; (3) conserved hypothetical protein; (4) citrate synthase; (5) aconitate hydratase (same as aconitase); (6) succinyl-CoA synthetase, alpha subunit; (7) succinyl-CoA synthetase, beta subunit; (8) pyoverdine chromophore precursor synthetase; (9) hypothetical protein; (10) acetolactate synthase, large subunit; (11) acetolactate synthase, small subunit; (12) hypothetical protein; (13) conserved hypothetical protein; (14) biotin synthesis; (15) DNA binding protein; (16) hypothetical protein; (17) protein of unknown function; (18) hypothetical protein; (19) acetylornithine aminotransferase; (20) phage integrase; (21–22) hypothetical protein; (23) Lux R; (24) transposase; (25) Lux R; (26) Lux I; (26′) short Lux I; (27) cytochrome P450 family protein; (28) hypothetical protein; (28′) truncated hypothetical protein; (29) transposase; (30) transposase; (31) hypothetical protein; (32) diguanylate cyclase; (32′) diguanylate cyclase frame shifted; (33) putative protein tyrosine phosphatase; (34) UDP-glucose 4-epimerase; (35) hypothetical protein; (36) UDP 6-dehydrogenase; (37) glucosamine-fructose-6-phosphate aminotransferase; (38) putative sigma54-specific transcriptional regulator, Fis family; (39) hypothetical protein; and (40) putative sigma54-specific transcriptional regulator, Fis family.
Figure 5. Schematic Illustrating How Strains Were Separated for Population Genetic Analyses
Left: screenshot of contig 11111 (located near the origin of replication) from Strainer. Individual reads are shown as white blocks. Strains defined by shared polymorphisms are shown in distinct colors, with the main strain in orange. The vertical dashed lines indicate regions within the main strain not overlapped by any substrain and referred to as “intersubstrain regions” in the main text. Right: schematic illustrating the identification of polymorphisms within and between strains. The orange box surrounds a site classified as a fixed difference between the main strain and a substrain. The green box surrounds a site classified as polymorphic (three Gs, one A in the main strain; two Gs in the substrain).
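The distinction drawn in this legend between a fixed difference and a polymorphic site can be sketched in a few lines. This is a hedged illustration based only on the two examples in the legend: the function name, the input format (one base per read covering the site), and the "invariant" and "no coverage" labels are assumptions, and the authors' actual criteria may have included further filters such as quality or minimum coverage.

```python
def classify_site(main_bases, sub_bases):
    """Classify one aligned site from the bases observed in each strain.

    main_bases / sub_bases: bases seen in reads of the main strain and of
    a substrain at the same position (hypothetical input format).
    """
    main_alleles = set(main_bases)
    sub_alleles = set(sub_bases)
    if not main_alleles or not sub_alleles:
        return "no coverage"
    # More than one allele within either strain: the site is polymorphic.
    if len(main_alleles) > 1 or len(sub_alleles) > 1:
        return "polymorphic"
    # Each strain carries a single allele, and the two alleles differ.
    if main_alleles != sub_alleles:
        return "fixed difference"
    return "invariant"


# The legend's green-box example: three Gs and one A in the main strain,
# two Gs in the substrain.
green_box = classify_site("GGGA", "GG")
```

Counts of these two site classes are the usual inputs to population genetic tests for selection of the kind the abstract refers to.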
Figure 6. Recombination between Related Individuals Is Directly Observed in Some Sequence Reads
(A) Diagram showing an area in contig 11277 with possible evidence for recombination between closely related strain variant types. Reads whose sequence type is a hybrid of a variant and the dominant strain types are outlined in red. The reads corresponding to the consensus (majority) sequence type are not shown. (B) Area of contig 11277 at approximately 140 kb where highly divergent sequence relative to the composite has inserted into three variants: the dominant strain variant (contig consensus sequence), as indicated by the upper read outlined in red, the brown strain (note that one individual sampled lacks the insert [lower read outlined in red]), and the blue strain. In the region identified by the yellow box, the contig 11277 consensus sequence appears to be a distinct Leptospirillum variant (dissimilar to both 5-way CG and UBA types) and the variant sequence (brown strain) is likely the pre-recombination 5-way CG-type sequence. Identity between contig 11277 and the sequence terminating the brown and blue strains is 72%. Recombinant reads are highlighted in red.
Figure 7. Extensive Strain Variation around the Origin of Replication Indicates Multiple Recombination Events
(A) Strain variation captured within contig 11389. The white shading indicates the position of the ATPase involved in DNA replication. The composite sequence corresponds to the sequence of the olive-green strain type. Dark-blue regions on the reads indicate bases that disagree with the composite, either due to sequencing error (especially at read ends), insertions, or SNPs. Results illustrate the existence of three sequence types in this region (also see [B]). Although the brown and dark-green strain groups are highly divergent relative to the composite sequence over a region that begins shortly before the origin, it is important to note that it is the dominant sequence type that becomes identical to the UBA Leptospirillum group II genome type due to a recombination event. (B) At higher magnification, it is evident that there were two 5-way CG strains in the population, only one of which was involved in the recombination event. The brown and dark-green strain sequence types terminate when they become too divergent to be coassembled into 11389. Most mate pairs missing from reads from the brown and dark-green strains at the base of the figure place at the start of scaffold 11386. (C) Diagram illustrating the two sequence variants present in the 5-way CG population, reconstructed at 11389 (top) and 11386 (bottom). Beyond this point, divergent small phage-like regions are followed by a 25-kb region in which all cells have the UBA Leptospirillum group II genome type. Note that the recombination block carries Cas proteins and the CRISPR locus (not shown in detail). Blue indicates genes shared between the two sequence variants, green indicates genes present only in the 11389 variant, orange indicates genes present only in the 11386 variant, and red indicates the cas proteins (100% identical between variants). Genes are annotated as follows: (1) chromosomal replication initiator protein (RepA); (2) DNA polymerase III, beta chain; (3) DNA gyrase, B subunit; (4) DNA gyrase, A subunit; (5) protein of unknown function; (6) putative radical activating enzyme; (7) ExsB protein; (8) leucyl aminopeptidase; (9) SsrA-binding protein; (10) putative integrase; (11–15) hypothetical protein; (16–17) putative Type III restriction/modification system, M and R subunits; (18–19) hypothetical protein; (20) hypothetical protein; (21) phage DNA binding protein; (22) conserved protein of unknown function; (23) DNA methyltransferase/helicase; (24–25) hypothetical protein; (26–27) probable transposase; (28) Cas3; (29) Cas1; (30) Cas2; (31) Cas4; (32) Cas5; (33) Cas3; (34) Cas1; (35) Cas2; and (36) CRISPR locus.
