J Infect Dis. 2014 Oct 15;210(8):1325-38.
doi: 10.1093/infdis/jiu260. Epub 2014 May 5.

A systematic and functional classification of Streptococcus pyogenes that serves as a new tool for molecular typing and vaccine development

Martina Sanderson-Smith et al.

Abstract

Streptococcus pyogenes ranks among the main causes of mortality from bacterial infections worldwide. Currently there is no vaccine to prevent diseases such as rheumatic heart disease and invasive streptococcal infection. The streptococcal M protein that is used as the substrate for epidemiological typing is both a virulence factor and a vaccine antigen. Over 220 variants of this protein have been described, making comparisons between proteins difficult, and hindering M protein-based vaccine development. A functional classification based on 48 emm-clusters containing closely related M proteins that share binding and structural properties is proposed. The need for a paradigm shift from type-specific immunity against S. pyogenes to emm-cluster based immunity for this bacterium should be further investigated. Implementation of this emm-cluster-based system as a standard typing scheme for S. pyogenes will facilitate the design of future studies of M protein function, streptococcal virulence, epidemiological surveillance, and vaccine development.

Keywords: IgA; IgG; M protein; Streptococcus pyogenes; epidemiology; fibrinogen; molecular typing; plasminogen; vaccine.

Figures

Figure 1.
Phylogeny of M proteins and the emm-cluster classification system. Phylogenetic inferences of M protein sequences from 175 emm-types drawn by PhyML. The tree is drawn to scale, with branch lengths in the same units (number of amino acid substitutions per site) as those of the evolutionary distances used for the phylogenetic tree. Approximate likelihood-ratio test values >80% are indicated at the nodes. The tree has 2 main clades: clade X is composed of 6 main emm-clusters (E1–E6), whereas clade Y is divided into 2 subclades (Y1 and Y2) that are then subdivided into 10 main emm-clusters (D1 to D5 and A–C1 to A–C5). Six outlier emm-types are indicated by dashed lines (see Supplementary Data). Selective pressure analyses of M protein sequences are shown for the different emm-clusters and/or clades of the tree. The sites above the red and orange lines are positively selected (probability >0.95 and >0.5, respectively). Binding data for M proteins against 6 human proteins are shown: dark-shaded color boxes indicate experimentally confirmed binding by M protein, white boxes indicate no binding, and light-shaded boxes represent predicted binding based on the presence of consensus binding motifs (plasminogen, IgA, IgG, and fibrinogen). Hash marks (#) indicate proteins that bind by experimental testing but lack the predicted binding motif. The cross (+) indicates the presence of the IgA binding motif in the absence of experimental binding. Findings on cross-opsonization elicited by the 30-valent vaccine [39, 40]: VA stands for vaccine antigen, black boxes indicate the presence of cross-opsonizing antibodies in rabbits, and shaded boxes indicate a lack of cross-opsonization. The emm pattern (patterns E, D, and A–C) is indicated for each emm-type [9]. The asterisks (*) mark the representative M proteins expressed in E. coli. Abbreviations: IgA, immunoglobulin A; IgG, immunoglobulin G.
Figure 2.
Binding of plasminogen by M proteins. Single cycle kinetic SPR sensorgrams for the interaction of M proteins with plasminogen are shown (A). Human glu-plasminogen was injected over immobilized M protein (concentrations of 7.5, 15, 30, 60, and 120 nM). Binding data were calculated by nonlinear fitting of the single cycle kinetic sensorgrams according to a 1:1 Langmuir binding model using Biacore T200 evaluation software (Biacore AB). Only the 4 proteins from emm-cluster D4 bound plasminogen. Based on the protein sequence alignment of the 4 plasminogen-binding M proteins (B), the targeted mutagenesis data available in the literature [49, 50], and analysis of our protein data set, a refined motif for M protein plasminogen-binding was defined (C). The search for this motif among the 175 emm-types yielded positive results for all M proteins of emm-cluster D4 and the closely related M140 protein (Figure 1); all other M proteins were negative for this motif. Plasminogen binding has not been described for any M protein outside these 33 proteins. In sum, 17 and 16 of the 33 proteins contained duplicate or single binding motifs, respectively. The result of the multiple alignment of the 50 sequences containing a plasminogen binding motif is shown as a sequence logo representation (B). Abbreviation: SPR, surface plasmon resonance.
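As a rough illustration of what a 1:1 Langmuir model predicts for a single cycle kinetic run, the sketch below steps the response through the five plasminogen concentrations named in the caption without regeneration between injections. Only the concentrations come from the caption. The rate constants `ka` and `kd`, the maximum response `rmax`, and the contact and gap times are hypothetical placeholders, and this is not the fitting procedure of the Biacore evaluation software.

```python
import math


def langmuir_step(r0, conc_molar, ka, kd, rmax, dt):
    """Response after dt seconds at a fixed analyte concentration.

    Closed-form solution of dR/dt = ka*C*(rmax - R) - kd*R (1:1 model).
    With conc_molar = 0 this reduces to pure dissociation, r0*exp(-kd*dt).
    """
    k_obs = ka * conc_molar + kd
    r_eq = rmax * ka * conc_molar / k_obs
    return r_eq + (r0 - r_eq) * math.exp(-k_obs * dt)


def single_cycle(concs_nM, ka, kd, rmax, contact_s, gap_s):
    """Response at the end of each injection in a single cycle run."""
    r = 0.0
    end_of_injection = []
    for c in concs_nM:
        # Association phase at this concentration, starting from the carried-over response.
        r = langmuir_step(r, c * 1e-9, ka, kd, rmax, contact_s)
        end_of_injection.append(r)
        # Short buffer-only gap before the next injection (no regeneration).
        r = langmuir_step(r, 0.0, ka, kd, rmax, gap_s)
    return end_of_injection


# Concentrations from the caption; every other value is an illustrative assumption.
responses = single_cycle(
    [7.5, 15, 30, 60, 120], ka=1e5, kd=1e-3, rmax=100.0, contact_s=120, gap_s=60
)
```

With these placeholder constants the equilibrium dissociation constant is kd/ka = 10 nM, so the injected concentrations bracket it and the response rises stepwise toward `rmax`.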
Figure 3.
Binding of IgA and IgG by M proteins. In sum, 5 of 6 proteins from emm-clusters E1 and E6 bound IgA (A). Based on the protein sequence alignment of the 5 IgA-binders (B) and the data available in the literature [27], a refined motif for binding of IgA by M protein is defined (C). Motif searching gave positive results for 28 emm-types in three main (sub-)emm-clusters (E1, E6, and E4.1). M proteins of 4 other emm-types were positive for this motif: M236 (close to E6), M44 (E3), M242 (D4), and M215 (outlier; Figure 1). Findings from a multiple alignment of the 35 IgA-binding sequences (3 emm-types contain a duplicate motif) are shown as a sequence logo representation (B). All 13 recombinant M proteins from emm-clusters E1–E4, E6, and A–C3 bound IgG (Figure 1), as determined by surface plasmon resonance (SPR). Single cycle kinetic sensorgrams are shown for 4 representative M proteins (D). The protein sequence alignment of 4 representative IgG binders (E) led to the definition of a motif for binding of IgG by M protein (F). Findings from a multiple alignment of the 101 IgG-binding sequences (15 emm-types contain a duplicate motif) are shown as a sequence logo representation (E). Abbreviations: IgA, immunoglobulin A; IgG, immunoglobulin G.
Figure 4.
Binding of fibrinogen by M proteins. Eight recombinant M proteins from clade Y bound fibrinogen (Figure 1), and representative single cycle kinetic SPR sensorgrams are shown for 4 emm-types (A). Based on the fibrinogen-binding motif sequence previously described for M5 [31] and the alignment of fibrinogen-binders (B), a refined fibrinogen-binding motif is proposed (C). This motif was present in 25 M proteins from clade Y but absent from M57. Findings from the multiple alignment of the 42 fibrinogen-binding sequences (9 and 4 proteins contain duplicate and triplicate motifs, respectively) are shown as a sequence logo representation (B). Abbreviation: SPR, surface plasmon resonance.
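The sequence totals quoted in the captions for Figures 2 to 4 can be checked against the protein counts, assuming each aligned sequence is one motif copy, so that a protein with a duplicate motif contributes two sequences and one with a triplicate motif contributes three. The helper name below is hypothetical, and the IgA total of 32 emm-types is an assumption obtained by adding the 28 and 4 emm-types reported in the Figure 3 caption.

```python
def motif_sequence_count(n_proteins, n_duplicate=0, n_triplicate=0):
    """Number of motif copies, given how many proteins carry 2 or 3 copies."""
    n_single = n_proteins - n_duplicate - n_triplicate
    if n_single < 0:
        raise ValueError("more multi-copy proteins than proteins")
    return n_single + 2 * n_duplicate + 3 * n_triplicate


# Figure 2: 33 proteins, 17 with a duplicate motif (so 16 with a single motif).
plasminogen = motif_sequence_count(33, n_duplicate=17)

# Figure 3: 28 + 4 emm-types (assumed total of 32), 3 with a duplicate motif.
iga = motif_sequence_count(28 + 4, n_duplicate=3)

# Figure 4: 25 proteins, 9 with duplicate and 4 with triplicate motifs.
fibrinogen = motif_sequence_count(25, n_duplicate=9, n_triplicate=4)
```

Under that assumption the three totals come out to 50, 35, and 42, matching the numbers of aligned sequences given in the captions.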
Figure 5.
Correlation between immunological cross-protection and M protein sequence emm-clusters. M proteins sharing the same emm-cluster have different amino-terminal regions but possess nearly identical sequences for the rest of the protein (Figure 1); emm-cluster E6 is shown as an example (A). VA stands for vaccine antigen and indicates the M proteins of emm-cluster E6 that are included in the 30-valent vaccine [39]. The black squares show the M proteins that demonstrate cross-opsonization in rabbits following vaccination with the 30-valent vaccine [39, 40]. The average pairwise identity of the whole M protein sequences within an emm-cluster is by definition >70% (average pairwise identity of 77.8%) (B). Multiple sequence alignments are shown for the whole M protein (C) and for the 50 amino-terminal residues only (D). Amino acid differences are highlighted by color shading and identity is represented in gray. Red boxes highlight vaccine antigens (the 50 amino-terminal residues). Pairwise identity values for the first 50 residues (average pairwise identity of 33.3%) are shown (E).
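For readers unfamiliar with the statistic, the following is a minimal sketch of how an average pairwise identity can be computed from aligned sequences. The caption does not say how gap columns were treated, so the convention used here (columns where both sequences have a gap are ignored, a gap against a residue counts as a mismatch) is an assumption, and the toy sequences are invented for illustration rather than taken from the data set.

```python
from itertools import combinations


def pairwise_identity(a, b):
    """Percent identity between two aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    # Drop columns where both sequences have a gap (assumed convention).
    cols = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not cols:
        return 0.0
    matches = sum(1 for x, y in cols if x == y)
    return 100.0 * matches / len(cols)


def average_pairwise_identity(aligned):
    """Mean percent identity over all unordered pairs of sequences."""
    pairs = list(combinations(aligned, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)


# Invented toy alignment, not real M protein sequences.
toy = ["ACDE", "ACDF", "ACGF"]
avg = average_pairwise_identity(toy)

# Restricting every sequence to a prefix gives the amino-terminal version.
avg_first_two = average_pairwise_identity([s[:2] for s in toy])
```

Applying the same function to whole sequences and to a fixed-length prefix mirrors the comparison in the caption between whole-protein identity and identity over the first 50 residues.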
Figure 6.
The emm-cluster typing system predicts the presence of J8 alleles. The presence of 11 alleles of the J8 vaccine antigen is presented for each emm-type. In total, 22 different alleles of the J8 vaccine antigen were found in our data set. The 11 alleles present in at least 5 emm-types are represented in this figure. A correlation between clades, subclades, and emm-clusters and the presence of specific J8 alleles is evident. J8, the vaccine candidate, is present in all but 13 emm-types from clade Y but absent from clade X. In contrast, J8.1 is present in 5 of the 6 emm-clusters constituting clade X; 173 of the 175 emm-types included in this study contain either J8 or J8.1 (M93, M122, and M224 do not). J8.29 and J8.8 are exclusively present in emm-clusters E2, E3, and E4. They are never present together in an emm-type and differ by only a single amino acid. J8.36 is exclusively present in emm-cluster E6, whereas the combinations J8.1–J8.12 and J8.12–J8.40 are specific to emm-clusters E1 and E5, respectively. The whole of clade Y1 is characterized by a combination of J8, J8.2, and J8.4. In contrast, J8.4 is rarely found in clade Y2. J8.84 is specific to emm-clusters A–C4 and A–C5. Interestingly, emm-cluster D4 seems to be divided by the presence of either J8.1 or J8.57.

References

    1. Carapetis JR, Steer AC, Mulholland EK, Weber M. The global burden of group A streptococcal diseases. Lancet Infect Dis. 2005;5:685–94.
    2. Steer AC, Lamagni T, Curtis N, Carapetis JR. Invasive group A streptococcal disease: epidemiology, pathogenesis and management. Drugs. 2012;72:1213–27.
    3. Dale JB, Fischetti VA, Carapetis JR, et al. Group A streptococcal vaccines: paving a path for accelerated development. Vaccine. 2013;31(suppl 2):B216–22.
    4. Smeesters PR, McMillan DJ, Sriprakash KS. The streptococcal M protein: a highly versatile molecule. Trends Microbiol. 2010;18:275–82.
    5. Fischetti VA. Streptococcal M protein: molecular design and biological behavior. Clin Microbiol Rev. 1989;2:285–314.
