J Bacteriol. 2011 Dec;193(23):6651-63.
doi: 10.1128/JB.05263-11. Epub 2011 Sep 23.

Whole-genome association study on tissue tropism phenotypes in group A Streptococcus

Debra E Bessen et al. J Bacteriol. 2011 Dec.

Abstract

Group A Streptococcus (GAS) has a rich evolutionary history of horizontal transfer among its core genes. Yet, despite extensive genetic mixing, GAS strains have discrete ecological phenotypes. To further our understanding of the molecular basis for ecological phenotypes, comparative genomic hybridization of a set of 97 diverse strains to a GAS pangenome microarray was undertaken, and the association of accessory genes with emm genotypes that define tissue tropisms for infection was determined. Of the 22 nonprophage accessory gene regions (AGRs) identified, only 3 account for all statistically significant linkage disequilibrium among strains having the genotypic biomarkers for throat versus skin infection specialists. Networked evolution and population structure analyses of loci representing each of the AGRs reveal that most strains with the skin specialist and generalist biomarkers form discrete clusters, whereas strains with the throat specialist biomarker are highly diverse. To identify coinherited and coselected accessory genes, the strength of genetic associations was determined for all possible pairwise combinations of accessory genes among the 97 GAS strains. Accessory genes showing very strong associations provide the basis for an evolutionary model, which reveals that a major transition between many throat and skin specialist haplotypes correlates with the gain or loss of genes encoding fibronectin-binding proteins. This study employs a novel synthesis of tools to help delineate the major genetic changes associated with key adaptive shifts in an extensively recombined bacterial species.

Figures

Fig. 1.
Distribution of Alab49-derived genes among 96 GAS strains. The circular map summarizes CGH findings for Alab49-derived targets. The tick marks correspond to hybridization signal ratios of <2.8 (i.e., absent from test strain); each concentric circle represents the signal ratio data for an individual GAS strain (red, emm pattern E genotype; blue, emm pattern A-C genotype; green, emm pattern D genotype). Gray tick marks indicate poor-quality signal data. Also shown are the Alab49 map positions of prophage and accessory genes that are present in 15 to 85% of the 97 GAS strains tested.
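The presence/absence call and the 15 to 85% accessory-gene window described in this legend can be sketched as follows. This is only an illustrative sketch, not the authors' pipeline: the function names and data layout are hypothetical, the treatment of a ratio of exactly 2.8 as "present" is an assumption, and so is the choice to leave poor-quality signals out of the denominator.

```python
# Illustrative sketch only; names and data layout are hypothetical.
ABSENCE_CUTOFF = 2.8   # signal ratio below this => gene called absent (from the legend)
MIN_FRACTION = 0.15    # accessory genes are present in at least 15% ...
MAX_FRACTION = 0.85    # ... and at most 85% of strains (from the legend)


def call_presence(signal_ratio):
    """Return True (present), False (absent), or None (poor-quality signal)."""
    if signal_ratio is None:
        return None
    # Assumption: a ratio of exactly 2.8 counts as present.
    return signal_ratio >= ABSENCE_CUTOFF


def is_accessory(signal_ratios):
    """Classify one gene from its per-strain hybridization signal ratios.

    Assumption: strains with poor-quality data (None) are excluded from
    the denominator; the legend does not say how they were handled.
    """
    calls = [call_presence(r) for r in signal_ratios]
    scored = [c for c in calls if c is not None]
    if not scored:
        return False
    fraction_present = sum(scored) / len(scored)
    return MIN_FRACTION <= fraction_present <= MAX_FRACTION
```

For example, a gene detected in half of the scored strains falls inside the window, whereas a gene detected in every strain would be treated as core rather than accessory.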
Fig. 2.
Linear genome map of accessory gene regions of GAS. The core GAS genome constitutes ∼1.57 Mb, and the relative position of AGRs is indicated. AGRs present in strain Alab49 are in yellow (above bar); also shown are the positions of four prophage or prophage-like elements of Alab49. AGRs absent from Alab49 are in blue (below bar). Some GAS strains (e.g., SSI-1, Manfredo) have chromosomal inversions; for the Manfredo strain relative to Alab49, the two crossover points map to comX-like regions at positions lying between AGR-5 and -6 and between AGR-19 and -20. Because the AGRs are defined by genes that are present in 15 to 85% of the set of 97 GAS strains, genetic elements that are associated with only a few strains are excluded [e.g., ICE-like element harboring tet(O)]. In addition, since the GAS pangenome microarray is based on strains whose whole-genome sequence has been determined, genes unique to other strains are excluded, although many of those genes are probably rare among the set of 97 strains [e.g., mef(A)]. Inter-AGR distance measurements are based on conserved core genes and exclude prophage. The eight AGRs displaying very high levels of association are marked (*) (Fig. 5; also see Table S6 in the supplemental material). LDH, lactate dehydrogenase; ICE, integrative and conjugative element; ERES region, eno ralp3 epf sagA region; GRAB, protein G-related alpha-2-macroglobulin-binding protein.
Fig. 3.
Population structure of GAS based on AGR content. STRUCTURE (version 2.3.3) was used to assess GAS population structure. It assumes there are K populations, each defined by a set of character state (presence/absence) frequencies at 21 loci. The loci (listed in Table S4 in the supplemental material) are representatives of each of 21 AGRs; AGR-21/21X is not included so that analysis is independent of emm pattern. (A) The y axis depicts the 97 GAS strains, according to emm pattern, as indicated; strain order is the same as that presented in Table S5. The x axis depicts the estimated membership fraction in each of the K inferred clusters (i.e., populations) for individual GAS strains; colors are assigned randomly to each cluster and are unrelated to those used to denote emm pattern in other figures. For K = 2 (whereby two populations are assumed), strains with admixed populations having frequencies ranging from 0.3 to 0.7 are indicated by emm type; also marked are pattern D-like strains in cluster 1, and pattern E-like strains in cluster 2. (B) The mean frequency for each cluster (i.e., population) is plotted according to emm pattern group, over a range of K values. Parameter settings for both panels are as follows: Markov chain Monte Carlo (MCMC) 10,000 iterations plus 10,000 burn-in, haploid data, admixture model, and allele frequencies independent among populations (λ = 1). Estimates of ln probability of data are −1,082.9 (K = 2), −1,040 (K = 3), −1,032.6 (K = 4), −1,012.4 (K = 5), and −1,071.5 (K = 10).
Fig. 4.
Networked evolution of GAS. SplitsTree graphs are based on 97 GAS strains for panels A and B, with 21 binary characters denoting the presence or absence of genes representative of AGRs (see Table S4 in the supplemental material), or, for panel C, concatenates of seven core housekeeping gene nucleotide sequences (3,134 characters). For panel C, parsimony-uninformative characters are excluded. Graphs employ uncorrected P distance, the equal angle method for splits transformation (no weights), and the neighbor net network (panels A and C) or the minimum spanning network (panel B) method. Each terminal node represents a GAS strain (taxon), depicted in accordance with emm pattern as follows: pattern A-C (blue), pattern D (green), and pattern E (red). The M/emm types of the GAS strains whose genome sequences are known are indicated. In panel A, the M/emm type of outlier strains of the emm pattern D or E genotypes, whereby an emm pattern D strain is pattern E-like and vice versa, is shown in italics. A phylogenetic tree constructed by the maximum parsimony method had weak bootstrap support and high homoplasy (data not shown).
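The uncorrected P distance named in this legend is the proportion of characters at which two taxa differ. A minimal sketch for binary presence/absence profiles is given below; it illustrates the distance only, not the network construction, and the function names and toy input format are hypothetical.

```python
# Illustrative sketch only; function names and input layout are hypothetical.
def p_distance(profile_a, profile_b):
    """Uncorrected P distance: fraction of characters that differ."""
    if len(profile_a) != len(profile_b) or not profile_a:
        raise ValueError("profiles must be non-empty and of equal length")
    differences = sum(x != y for x, y in zip(profile_a, profile_b))
    return differences / len(profile_a)


def distance_matrix(profiles):
    """Pairwise P distances for a dict mapping strain name -> character tuple."""
    names = sorted(profiles)
    return {
        (s, t): p_distance(profiles[s], profiles[t])
        for s in names
        for t in names
    }
```

With 21 binary characters per strain, as in panels A and B, two strains that differ at a single AGR-representative locus would be separated by a distance of 1/21.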
Fig. 5.
Positive and negative linkage between genes of different AGRs. The lowest BH-corrected P value (Fisher's exact test, two-tailed) for any gene pair assigned to each possible combination of AGRs is as follows: P < 5.00E−02 (purple), P < 1.00E−03 (dark blue), P < 1.00E−04 (bright blue), P < 1.00E−05 (green), and P < 1.00E−07 (yellow). The predominant direction of linkage, positive (P) or negative (N), is indicated for BH-corrected P values of <1.00E−05.
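The pairwise test behind this legend can be sketched as below: a two-tailed Fisher's exact test on the 2x2 presence/absence table of each gene pair, followed by Benjamini-Hochberg (BH) adjustment across all pairs. This is a standard-library sketch under stated assumptions, not the authors' code. The function names and input layout are hypothetical, the two-tailed P value uses the common "sum of tables no more probable than the observed one" convention, and the direction of linkage is inferred from the sign of ad − bc, which is an assumption about how "positive" and "negative" were assigned.

```python
# Illustrative sketch only; names, input layout, and conventions are assumptions.
from itertools import combinations
from math import comb


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more probable than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def benjamini_hochberg(p_values):
    """Return BH-adjusted P values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def pairwise_associations(presence):
    """presence: dict mapping gene name -> list of 0/1 calls, one per strain.

    Returns (gene1, gene2, bh_adjusted_p, direction) for every gene pair,
    where direction is "P" (positive) or "N" (negative).
    """
    raw = []
    for g1, g2 in combinations(sorted(presence), 2):
        pairs = list(zip(presence[g1], presence[g2]))
        a = sum(1 for x, y in pairs if x and y)
        b = sum(1 for x, y in pairs if x and not y)
        c = sum(1 for x, y in pairs if not x and y)
        d = sum(1 for x, y in pairs if not x and not y)
        direction = "P" if a * d >= b * c else "N"
        raw.append((g1, g2, fisher_exact_two_tailed(a, b, c, d), direction))
    adjusted = benjamini_hochberg([r[2] for r in raw])
    return [(g1, g2, p, direction) for (g1, g2, _, direction), p in zip(raw, adjusted)]
```

Applied to a table of accessory-gene calls across strains, the lowest adjusted P value among the gene pairs drawn from two AGRs would then give the value plotted for that AGR combination.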
Fig. 6.
Model of evolution for GAS based on highly linked AGRs. The locus within each AGR having the lowest P value, when compared to any gene of another AGR, is used to define character states (0, absence; 1, presence) for AGRs 2X, 3, 4, 13, 14, 16B, 17A, and 21X (see Table S7 in the supplemental material); haplotypes are indicated by “H” and are associated with ≥1 strain. The area of the circles denoting each haplotype reflects the number of assigned GAS strains; emm pattern genotypes are represented by blue (pattern A-C), green (pattern D), and red (pattern E), and their fractional content is displayed. The 14 core haplotypes associated with >1 of the 97 GAS strains are indicated (bold black font). Lines connect all haplotype pairs that differ by a single locus; the set of thicker lines connects two core haplotypes each represented by >1 of the 97 GAS strains. Single locus differences among haplotypes were established using the eBURST clustering algorithm (version 3; www.mlst.net) and redrawn. H24 provides a link between the two major subclusters (I, upper; II, lower). Not shown are connections between haplotypes that differ from the main cluster by ≥2 loci (H4, H5, H6, H8, H9, H12, H21, H30; n = 8).
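The linking rule in this legend (connect haplotypes that differ at exactly one of the eight loci) can be illustrated with a short sketch. It is not eBURST itself, only the single-locus-difference rule; the function names and the haplotype labels used in the usage note are hypothetical and do not correspond to the H numbers in the figure.

```python
# Illustrative sketch only; not eBURST. Names and labels are hypothetical.
from collections import Counter
from itertools import combinations


def count_haplotypes(strain_profiles):
    """Count strains per haplotype.

    strain_profiles: dict mapping strain name -> tuple of 0/1 character states.
    """
    return Counter(strain_profiles.values())


def locus_differences(hap_a, hap_b):
    """Number of loci at which two haplotypes differ."""
    return sum(x != y for x, y in zip(hap_a, hap_b))


def single_locus_links(haplotypes):
    """Return sorted pairs of haplotype labels that differ at exactly one locus.

    haplotypes: dict mapping haplotype label -> tuple of 0/1 character states.
    """
    items = sorted(haplotypes.items())
    return [
        (name_a, name_b)
        for (name_a, hap_a), (name_b, hap_b) in combinations(items, 2)
        if locus_differences(hap_a, hap_b) == 1
    ]
```

For instance, with toy labels Ha = 00000000, Hb = 10000000, and Hc = 11000000, the links are Ha to Hb and Hb to Hc, while a haplotype differing from all others at two or more loci stays unconnected, as for the eight haplotypes listed as not shown.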
Fig. 7.
Model of evolution showing gain and loss of fibronectin-binding protein genes. Detail of upper subcluster I presented in Fig. 6, showing an extended haplotype that incorporates the character state of Fn-binding protein loci within AGR-2/2X and AGR-21/21X (see Tables S6 and S7 in the supplemental material); gain/loss of individual Fn-binding protein genes is indicated. Haplotypes positive for prtF1 are boxed, and haplotypes positive for fbaA are underlined. The area of the circles denoting each haplotype reflects the number of assigned GAS strains; emm pattern genotypes are represented by blue (pattern A-C), green (pattern D), and red (pattern E), and their fractional content is displayed. Lines connect all haplotype pairs that differ by a single locus; the set of thicker lines connects two core haplotypes each represented by >1 of the 97 GAS strains (see Fig. 6 legend for further details).
