Mol Biol Evol. 2022 Jul 2;39(7):msac135.
doi: 10.1093/molbev/msac135.

A Dual Barcoding Approach to Bacterial Strain Nomenclature: Genomic Taxonomy of Klebsiella pneumoniae Strains

Melanie Hennart et al. Mol Biol Evol.

Abstract

Sublineages (SLs) within microbial species can differ widely in their ecology and pathogenicity, and their precise definition is important in basic research and for industrial or public health applications. Widely accepted strategies to define SLs are currently missing, which confuses communication in population biology and epidemiological surveillance. Here, we propose a broadly applicable genomic classification and nomenclature approach for bacterial strains, using the prominent public health threat Klebsiella pneumoniae as a model. Based on a 629-gene core genome multilocus sequence typing (cgMLST) scheme, we devised a dual barcoding system that combines multilevel single linkage (MLSL) clustering and life identification numbers (LINs). Phylogenetic and clustering analyses of >7,000 genome sequences captured population structure discontinuities, which were used to guide the definition of 10 infraspecific genetic dissimilarity thresholds. The widely used 7-gene multilocus sequence typing (MLST) nomenclature was mapped onto MLSL SL (threshold: 190 allelic mismatches) and clonal group (CG; threshold: 43 allelic mismatches) identifiers for backwards nomenclature compatibility. The taxonomy is publicly accessible through a community-curated platform (https://bigsdb.pasteur.fr/klebsiella), which also enables external users to identify their own genomic sequences. The proposed strain taxonomy combines two phylogenetically informative barcode systems that provide full stability (LIN codes) and nomenclatural continuity with previous nomenclature (MLSL). This species-specific dual barcoding strategy for the genomic taxonomy of microbial strains is broadly applicable and should help unify global and cross-sector collaborative knowledge on the emergence and microevolution of bacterial pathogens.
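The single linkage step behind MLSL can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy 6-locus profiles, and the toy thresholds (0 and 1) are assumptions made for illustration, whereas the real scheme uses 629 loci and thresholds such as 43 and 190 allelic mismatches. Two genomes join the same cluster whenever they are connected by a chain of pairs that each differ at no more than the threshold number of loci.

```python
from itertools import combinations


def allelic_mismatches(a, b):
    """Count the loci at which two allelic profiles carry different alleles."""
    return sum(1 for x, y in zip(a, b) if x != y)


def single_linkage(profiles, threshold):
    """Cluster profiles by single linkage at a given mismatch threshold.

    Any pair with <= threshold mismatches is linked; clusters are the
    connected components of that link graph (computed with union-find).
    """
    parent = {name: name for name in profiles}

    def find(name):
        # Follow parent pointers to the cluster root, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(profiles, 2):
        if allelic_mismatches(profiles[a], profiles[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in profiles:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical toy profiles (6 loci instead of 629).
toy_profiles = {
    "g1": [1, 1, 1, 1, 1, 1],
    "g2": [1, 1, 1, 1, 1, 2],  # 1 mismatch from g1
    "g3": [1, 1, 1, 1, 3, 2],  # 1 mismatch from g2, 2 from g1
    "g4": [4, 4, 4, 4, 4, 4],  # 6 mismatches from every other genome
}

# "Multilevel": the same clustering repeated at several thresholds.
for t in (0, 1):
    print(t, single_linkage(toy_profiles, t))
```

At threshold 1, g1 and g3 end up in the same cluster even though they differ at two loci, because g2 links them; this chaining is the defining property of single linkage.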

Keywords: genomic classification; genomic library; international harmonization; microevolution; pathogen tracking; strain nomenclature.


Figures

Fig. 1.
Genome-based phylogenetic tree of the Klebsiella pneumoniae species complex. The whole-genome distance-based tree was inferred using JolyTree. JolyTree uses mash to decompose each genome into a sketch of k-mers and to quickly estimate the p-distance between each pair of genomes; after transforming every p-distance into a pairwise evolutionary distance, a phylogenetic tree is inferred using FastME. The seven phylogroups are indicated. Red dots correspond to strains defined as interphylogroup hybrids. Scale bar, 0.01 nucleotide substitutions per site.
Fig. 2.
Phylogenetic structure within phylogroup Kp1 (K. pneumoniae sensu stricto). The circular tree was obtained using IQ-TREE based on the concatenation of the genes of the scgMLSTv2 scheme; 1,600 isolates are included (see Materials and Methods). Labels on the first (outermost) circle represent 7-gene MLST ST identifiers (each alternation corresponds to a different ST, and only STs with more than 20 strains are labeled). The second and third circles (light green and blue, respectively) show the alternation of CGs and SLs, respectively, labeling only groups with more than 20 isolates. Full correspondence between ST, SL, and CG identifiers is given in the supplementary appendix, Supplementary Material online.
Fig. 3.
Distribution of pairwise cgMLST distances, clustering properties, and phylogenetic congruence. Values are plotted for the 7,060-genome data set. Threshold values (t) are shown on the X-axis, corresponding to allelic profile mismatch values up to 629 (or 100%). Grey histograms: distribution of pairwise allelic mismatches. The circles correspond to the different modes of the distribution. The curves represent the consistency coefficient St (silhouette, blue) and stability coefficient Wt (green), respectively, obtained with each threshold t; the corresponding scale is on the left Y-axis. To identify the two curves without reference to their colour, note that the St curve starts (at X = 0) approximately at 0.65 and the Wt curve starts at approx. 0.95. The dotted vertical red lines at t = 43/629, 190/629, 585/629, and 610/629 represent the thresholds up to which pairs of genomes belong to the same CGs, SLs, phylogroups, and species, respectively.
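As a small worked example, the four thresholds named in this caption can be converted from allelic mismatch counts to percentages of the 629-locus scheme. The sketch below only restates the caption's numbers; the names `N_LOCI`, `THRESHOLDS`, and `mismatch_percent` are illustrative choices, not identifiers from the paper.

```python
N_LOCI = 629  # number of loci in the cgMLST scheme, per the abstract

# Thresholds (allelic mismatches) taken from the Fig. 3 caption.
THRESHOLDS = {
    "clonal group": 43,
    "sublineage": 190,
    "phylogroup": 585,
    "species": 610,
}


def mismatch_percent(t, n_loci=N_LOCI):
    """Express a mismatch threshold as a percentage of all loci."""
    return 100.0 * t / n_loci


for level, t in THRESHOLDS.items():
    print(f"{level}: {t}/{N_LOCI} = {mismatch_percent(t):.1f}% allelic mismatches")
```

Read this way, genomes in the same clonal group differ at no more than about 6.8% of loci, and genomes in the same sublineage at no more than about 30.2%.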
Fig. 4.
Concordance of SL, clonal group and 7-gene MLST classifications (Panel A: ST258 and related genomes; Panel B: ST23 and related genomes). Alluvial diagram obtained using RAWGraphs (Mauri et al. 2017) showing the correspondence between STs (identity at all 7 genes), CGs (threshold of 43 allelic mismatches), and SLs (threshold of 190 allelic mismatches). Colors are arbitrarily attributed by the software for readability.
Fig. 5.
Phylogenetic relationships are reflected in cgLIN code prefixes. Left: the prefix tree generated from cgLIN codes; Right: phylogenetic relationships derived using IQ-TREE from the cgMLST gene sequences from the reference strains. The cgLIN codes are also shown. The values indicated on top of the prefix tree correspond to the cgMLST similarity percentage of the corresponding cgLIN code bin.
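The idea that shared code prefixes reflect relatedness can be sketched as a prefix tree (trie) built from LIN-style codes. This is an illustration under stated assumptions: codes are written here as underscore-separated integers, the example codes are invented and shortened to four positions, and `build_prefix_tree` and `shared_prefix_length` are hypothetical helper names rather than functions from the paper or from BIGSdb.

```python
def build_prefix_tree(lin_codes):
    """Build a nested-dict prefix tree from {strain: "a_b_c_..."} codes.

    Each integer position becomes one tree level; the strains sharing a
    full code are stored under the "strains" key of the final node.
    """
    tree = {}
    for strain, code in lin_codes.items():
        node = tree
        for part in code.split("_"):
            node = node.setdefault(int(part), {})
        node.setdefault("strains", []).append(strain)
    return tree


def shared_prefix_length(code_a, code_b):
    """Number of leading positions at which two codes agree."""
    length = 0
    for x, y in zip(code_a.split("_"), code_b.split("_")):
        if x != y:
            break
        length += 1
    return length


# Hypothetical, shortened codes for three strains.
toy_codes = {
    "strainA": "0_0_1_0",
    "strainB": "0_0_1_1",
    "strainC": "0_2_0_0",
}

tree = build_prefix_tree(toy_codes)
print(shared_prefix_length(toy_codes["strainA"], toy_codes["strainB"]))
print(shared_prefix_length(toy_codes["strainA"], toy_codes["strainC"]))
```

In this toy example, strainA and strainB agree on three leading positions and diverge only at the last one, while strainC leaves the shared branch after the first position; a longer shared prefix corresponds to a deeper common node in the tree.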
Fig. 6.
cgLIN code prefixes, and virulence and antimicrobial resistance scores of some SLs and their CGs. Left (green), first four columns: LIN prefixes of selected SLs and CGs. Right: heatmaps of virulence and resistance scores of CGs, and the number of genomes in each group. For each genome, the virulence score derived from Kleborate has a value from 0 to 5; the value in each cell is the percentage of strains in the group with that virulence score, and cells are shaded accordingly. The principle is the same for the resistance score, which ranges from 0 to 3.
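The cell values described in this caption are per-group percentages of genomes at each score. A minimal sketch of that tabulation follows; the function name and the example scores are invented for illustration, and the scores themselves would come from Kleborate output, which is not reproduced here.

```python
from collections import Counter


def score_percentages(scores, max_score):
    """Return the percentage of genomes at each score from 0 to max_score."""
    if not scores:
        raise ValueError("scores must contain at least one genome")
    counts = Counter(scores)
    total = len(scores)
    return [100.0 * counts.get(s, 0) / total for s in range(max_score + 1)]


# Hypothetical virulence scores (0 to 5) for the genomes of one group.
toy_virulence = [0, 0, 1, 1, 1, 1, 4, 5]
print(score_percentages(toy_virulence, max_score=5))

# Hypothetical resistance scores (0 to 3) for the same group.
toy_resistance = [0, 3, 3, 3]
print(score_percentages(toy_resistance, max_score=3))
```

Each returned list corresponds to one heatmap row, with one entry per possible score.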

