PLoS Pathog. 2020 Dec 28;16(12):e1009181.
doi: 10.1371/journal.ppat.1009181. eCollection 2020 Dec.

Phylogenomics of 8,839 Clostridioides difficile genomes reveals recombination-driven evolution and diversification of toxin A and B

Michael J Mansfield et al. PLoS Pathog. 2020.

Abstract

Clostridioides difficile is the major worldwide cause of antibiotic-associated gastrointestinal infection. A pathogenicity locus (PaLoc) encoding one or two homologous toxins, toxin A (TcdA) and toxin B (TcdB), is essential for C. difficile pathogenicity. However, toxin sequence variation poses major challenges for the development of diagnostic assays, therapeutics, and vaccines. Here, we present a comprehensive phylogenomic analysis of 8,839 C. difficile strains and their toxins, including 6,492 genomes that we assembled from the NCBI Sequence Read Archive (SRA). A total of 5,175 tcdA and 8,022 tcdB genes clustered into 7 (A1-A7) and 12 (B1-B12) distinct subtypes, which form the basis of a new method for toxin-based subtyping of C. difficile. We developed a haplotype coloring algorithm to visualize amino acid variation across all toxin sequences, which revealed that TcdB has diversified through extensive homologous recombination throughout its entire sequence and has formed new subtypes through distinct recombination events. In contrast, TcdA varies mainly in the number of repeats in its C-terminal repetitive region, suggesting that recombination-mediated diversification of TcdB provides a selective advantage in C. difficile evolution. We then validated toxin subtyping by classifying 351 C. difficile clinical isolates from Brigham and Women's Hospital in Boston, demonstrating its clinical utility. Subtyping partitions TcdB into binary functional and antigenic groups generated by intragenic recombination. These groups differ in their cell-rounding phenotype, in whether they recognize frizzled proteins as receptors, and in whether they are efficiently neutralized by the monoclonal antibody bezlotoxumab, the only FDA-approved therapeutic antibody targeting TcdB. Our analysis also identifies eight universally conserved surface patches across the TcdB structure, representing ideal targets for developing broad-spectrum therapeutics. Finally, we established an open online database (DiffBase) as a central hub for collection and classification of C. difficile toxins, which will help clinicians decide on therapeutic strategies targeting specific toxin variants and allow researchers to monitor the ongoing evolution and diversification of C. difficile.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Clustering of TcdA and TcdB sequences derived from NCBI GenBank and SRA into subtypes.
(A) Hierarchical clustering of TcdA sequences, split into 7 groups. (B) Neighbor-joining phylogenetic tree of representative sequences of each TcdA subtype. (C) Percentage identities between representative sequences. (D) Hierarchical clustering of TcdB sequences, split into 12 groups. (E) Neighbor-joining phylogenetic tree of representative sequences of each TcdB subtype. (F) Percentage identities between representative sequences. Hierarchical clustering was performed using the hclust() function in R, and cluster definitions were selected based on strong within-cluster sequence similarities and weak between-cluster similarities, as demonstrated visually and quantitatively. The reference strains (VPI 10463 and strain 630) are associated with TcdA group A1 and TcdB group B1. The hypervirulent ribotype 027 strains such as R12087 and R20291 are associated with TcdA group A2 and TcdB group B2. Also included are the homologs of TcdA and TcdB (TcsH and TcsL, respectively) from P. sordellii, which, as expected, exhibit the highest divergence from the other groups. The datasets include TcdA and TcdB sequences from NCBI GenBank as well as additional sequences assembled from the SRA.
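The caption above names percentage identities and hierarchical clustering via hclust() in R. As a rough illustration only, the following Python sketch computes pairwise percent identity on pre-aligned sequences and merges clusters by complete linkage (the default method of hclust(); the caption does not say which linkage the authors used). The function names, the distance definition (100 minus identity), and the toy sequences are assumptions made for this example and do not come from the paper.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent of aligned columns with identical residues (gap-gap columns skipped)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue  # a column that is a gap in both sequences carries no signal
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def complete_linkage_clusters(seqs, k):
    """Agglomerative clustering on distance = 100 - identity, stopped at k clusters."""
    if k < 1:
        raise ValueError("k must be at least 1")
    dist = {
        frozenset((p, q)): 100.0 - percent_identity(seqs[p], seqs[q])
        for p, q in combinations(seqs, 2)
    }
    clusters = [{name} for name in seqs]
    while len(clusters) > k:
        # Complete linkage: cluster distance is the largest pairwise distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: max(
                dist[frozenset((p, q))]
                for p in clusters[ij[0]]
                for q in clusters[ij[1]]
            ),
        )
        clusters[i] |= clusters[j]
        del clusters[j]  # i < j, so index i is unaffected
    return clusters


# Hypothetical toy alignment, not real toxin sequences.
toy = {"s1": "MKLVNA", "s2": "MKLVNG", "s3": "TRLQNA", "s4": "TRLQDA"}
groups = complete_linkage_clusters(toy, 2)
print(sorted(sorted(g) for g in groups))
```

Cutting at a fixed number of clusters mirrors the "split into N groups" wording of the caption; the within-cluster and between-cluster similarity criteria the authors describe would have to be checked separately.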
Fig 2. Toxin subtypes across the C. difficile phylogeny and occurrence of subtypes in a clinical CDI cohort.
(A) TcdA (inner ring) and TcdB (outer ring) subtypes mapped onto a tree of 1,934 C. difficile genomes. The tree is a maximum likelihood phylogeny of NCBI-derived C. difficile genomes based on 14,194 genome-wide SNPs (see Methods). Lineages corresponding to previously identified C. difficile PaLoc clades (1–5) are labeled numerically. Selected clinically relevant strains are shown on the tree, with hypervirulent/epidemic outbreak strains indicated by stars. Asterisks indicate lineages without toxin genes. (B) Frequency of toxin subtypes detected in 1,934 representative, complete C. difficile genomes from NCBI/GenBank. A total of 1,640 (84.8%) C. difficile strains contained TcdA and/or TcdB, while 294 (15.2%) were toxin deficient. (C) Frequency of toxin subtypes detected in a CDI clinical cohort from Brigham and Women's Hospital (BWH). The total dataset contained 351 C. difficile genomes derived from infected patients. Of these, 289 (82.3%) contained toxin genes, and 62 (17.7%) were toxin deficient.
Fig 3. Evolutionary diversification of TcdB by intragenic recombination and domain shuffling.
(A) Visualization of amino acid variation patterns across the TcdB alignment using a newly developed haplotype coloring algorithm (HaploColor). In this algorithm, the first sequence (B1.1) is assigned a distinct color, and every other sequence receives that color at the positions where it matches this first sequence. Then, the process is repeated using a second sequence (B7.1) as the new reference, and so on. This reveals multiple colored segments indicative of common ancestry (identity by descent). Mosaic patterns are indicative of intragenic recombination. (B) Phylogenetic trees of TcdB based on individual domains. Each domain tree can be subdivided into two types (labeled 1 and 2), which allows each subtype to be described by its domain composition (C). This reveals that TcdB subtypes are composed of domains with variable evolutionary histories, indicative of domain shuffling and intragenic recombination. (D) Evolutionary model depicting relationships between subtypes and putative recombination events. Here, TcdB split early into two main groups (i and ii). Subtype B2 likely originated by a recombination event fusing an ancestral type i and type ii toxin. B9 likely originated from recombination between B1 and B2, B3 from recombination between B1 and a type ii toxin, and B8 from recombination between B5 and a type ii toxin.
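The coloring procedure described in panel (A) can be sketched in a few lines of Python. This is an illustrative reading of the caption, not the authors' HaploColor implementation: it assumes that a position keeps the color of the first reference it matches and is not recolored by later references, which the caption does not state. The function names and the toy sequences are hypothetical; only the labels B1.1 and B7.1 are taken from the caption.

```python
def haplotype_colors(alignment, reference_order):
    """Color each alignment cell by the index of the first reference it matches.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    reference_order: names used as successive references.
    Returns dict name -> list of color indices (None where nothing matched).
    """
    length = len(next(iter(alignment.values())))
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all sequences must have the same aligned length")
    colors = {name: [None] * length for name in alignment}
    for color, ref_name in enumerate(reference_order):
        ref = alignment[ref_name]
        for name, seq in alignment.items():
            for pos in range(length):
                # Assumption: earlier references take precedence over later ones.
                if colors[name][pos] is None and seq[pos] == ref[pos]:
                    colors[name][pos] = color
    return colors


def segments(color_row):
    """Collapse a row of colors into (start, end_exclusive, color) runs."""
    runs = []
    start = 0
    for pos in range(1, len(color_row) + 1):
        if pos == len(color_row) or color_row[pos] != color_row[start]:
            runs.append((start, pos, color_row[start]))
            start = pos
    return runs


# Hypothetical toy alignment: "mosaic" matches B1.1 in its first half
# and B7.1 in its second half.
toy = {"B1.1": "AAAAAAAA", "B7.1": "CCCCCCCC", "mosaic": "AAAACCCC"}
colors = haplotype_colors(toy, ["B1.1", "B7.1"])
print(segments(colors["mosaic"]))  # [(0, 4, 0), (4, 8, 1)]
```

A sequence whose row breaks into long runs of different colors, like "mosaic" here, is the kind of pattern the caption interprets as evidence of intragenic recombination.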
Fig 4. Conservation and functional variation across TcdB subtypes.
(A) Frequency of amino acid variants across all positions of TcdB. The height of each bar indicates the number of unique TcdB sequences that contain a substitution relative to the classical TcdB1 (B1.1) sequence from strains 630 and VPI 10463. Below this is a plot of amino acid variation for key functional regions, including the binding sites for the frizzled receptor (FZD) and the antibodies (E3, PA41, and bezlotoxumab). The alignment is colored gray for residues that match the common amino acid found in B1.1, and variants are colored blue (darkest blue = most common variant). E3 and PA41 binding sites are highly conserved, whereas FZD and bezlotoxumab binding sites are highly variable. FZD and bezlotoxumab variants also co-occur with each other. (B) Evolutionary conservation mapped to the protein structure of full-length TcdB based on PDB 6OQ5 [65]. Eight highly conserved surface patches are indicated. Center residues within each surface patch are indicated in bold font.
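The bar heights in panel (A) count, per position, the unique sequences that differ from the reference. A minimal Python sketch of that tally follows, assuming pre-aligned sequences of equal length and treating a gap character as a difference like any other (the caption does not say how gaps were handled). The function name and the toy sequences are hypothetical.

```python
from collections import Counter


def substitution_profile(reference, sequences):
    """Per-position counts of unique sequences that differ from the reference.

    Returns (counts, variants): counts[pos] is the number of distinct sequences
    with a non-reference residue at pos, and variants[pos] tallies which
    residues those are.
    """
    unique = set(sequences)  # each distinct sequence is counted once
    if any(len(seq) != len(reference) for seq in unique):
        raise ValueError("all sequences must be aligned to the reference length")
    counts = [0] * len(reference)
    variants = [Counter() for _ in reference]
    for seq in unique:
        for pos, (ref_res, res) in enumerate(zip(reference, seq)):
            if res != ref_res:
                counts[pos] += 1
                variants[pos][res] += 1
    return counts, variants


# Hypothetical toy data; the duplicate "MRLV" is counted only once.
reference = "MKLV"
observed = ["MKLV", "MRLV", "MRLV", "MRLI", "MKLI"]
counts, variants = substitution_profile(reference, observed)
print(counts)                      # [0, 2, 0, 2]
print(variants[1].most_common(1))  # [('R', 2)]
```

The per-position Counter gives the most common variant at each site, which corresponds to the "darkest blue" shading rule in the caption.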
