J Bacteriol. 2020 Nov 19;202(24):e00086-20.
doi: 10.1128/JB.00086-20. Print 2020 Nov 19.

A Vibrio cholerae Core Genome Multilocus Sequence Typing Scheme To Facilitate the Epidemiological Study of Cholera

Kevin Y H Liang et al. J Bacteriol.

Abstract

Core genome multilocus sequence typing (cgMLST) has gained popularity in recent years in epidemiological research and subspecies-level classification. cgMLST retains the intuitive nature of traditional MLST but offers much greater resolution by utilizing significantly larger portions of the genome. Here, we introduce a cgMLST scheme for Vibrio cholerae, a bacterium abundant in marine and freshwater environments and the etiologic agent of cholera. A set of 2,443 core genes ubiquitous in V. cholerae was used to analyze a comprehensive data set of 1,262 clinical and environmental strains collected from 52 countries, including 65 genomes newly sequenced in this study. We established a sublineage threshold based on 133 allelic differences that creates clusters nearly identical to traditional MLST types, providing backward compatibility for new cgMLST classifications. We also defined an outbreak threshold based on seven allelic differences that is capable of identifying strains from the same outbreak and closely related isolates that could give clues to the outbreak's origin. Using cgMLST, we confirmed the South Asian origin of modern epidemics and identified clustering affinity among sublineages of environmental isolates from the same geographic origin. Advantages of this method are highlighted by direct comparison with existing classification methods, such as MLST and single-nucleotide polymorphism-based methods. cgMLST outperforms all existing methods in terms of resolution, standardization, and ease of use. We anticipate this scheme will serve as a basis for a universally applicable and standardized classification system for V. cholerae research and epidemiological surveillance in the future. This cgMLST scheme is publicly available on PubMLST (https://pubmlst.org/vcholerae/).

IMPORTANCE Toxigenic Vibrio cholerae isolates of the O1 and O139 serogroups are the causative agents of cholera, an acute diarrheal disease that has plagued the world for centuries, if not millennia. Here, we introduce a core genome multilocus sequence typing scheme for V. cholerae. Using this scheme, we have standardized the definition for subspecies-level classification, facilitating global collaboration in the surveillance of V. cholerae. In addition, this typing scheme allows for quick identification of outbreak-related isolates that can guide subsequent analyses, serving as an important first step in epidemiological research. This scheme is also easily scalable to analyze thousands of isolates at various levels of resolution, making it an invaluable tool for large-scale ecological and evolutionary analyses.
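
The threshold-based grouping described in the abstract can be illustrated with a minimal sketch. It assumes that each isolate is represented as a list of allele numbers (one per core locus), that loci with a missing call are skipped when counting differences, and that clusters are formed by single linkage; none of these details is stated in the abstract, and the function names, isolate names, and toy profiles are hypothetical. A threshold of 1 on five loci stands in for the thresholds of 7 or 133 on 2,443 loci.

```python
from itertools import combinations


def allelic_differences(profile_a, profile_b):
    """Count loci where both profiles have a called allele and the alleles differ."""
    return sum(
        1
        for a, b in zip(profile_a, profile_b)
        if a is not None and b is not None and a != b
    )


def cluster_by_threshold(profiles, threshold):
    """Single-linkage clustering (an assumption): link isolates whose
    pairwise allelic difference is <= threshold, then return the groups."""
    parent = {name: name for name in profiles}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(profiles, 2):
        if allelic_differences(profiles[a], profiles[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in profiles:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical toy profiles with five loci each.
toy_profiles = {
    "iso1": [1, 1, 1, 1, 1],
    "iso2": [1, 1, 1, 1, 2],
    "iso3": [1, 1, 1, 2, 2],
    "iso4": [5, 5, 5, 5, 5],
}
toy_clusters = cluster_by_threshold(toy_profiles, threshold=1)
```

With single linkage, iso1 and iso3 end up in the same cluster even though they differ at two loci, because each is within the threshold of iso2. Other linkage rules would split them.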

Keywords: Vibrio cholerae; cgMLST; cholera; core genome; epidemiological surveillance; gene-by-gene approach; multilocus sequence typing; whole-genome sequencing.

Figures

FIG 1
Pairwise allelic differences for all isolates used in this study. Both plots show the frequency of allelic mismatches in pairwise comparisons. (A) Pairwise comparisons of up to 2,443 allelic differences are shown. Major peaks are shaded. (B) Comparisons with up to 500 allelic differences are shown. Pairwise comparisons of only clinical isolates are shown in red. Vertical lines indicate the outbreak threshold (red) and sublineage threshold (blue).
FIG 2
Plot showing the Dunn index for clustering thresholds ranging from 1 to 1,000 allelic differences. Each clustering threshold is bootstrapped 100 times. The median is plotted, with the light blue shading indicating the 25th to 75th percentile range. Red and blue lines indicate the outbreak and sublineage thresholds, respectively. The dotted lines represent other clustering thresholds used in the adjusted Rand index calculations (Fig. 3B and Fig. S3).
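
For readers unfamiliar with the Dunn index, the following is a minimal sketch of one common formulation: the smallest distance between isolates in different clusters divided by the largest distance between isolates in the same cluster. The caption does not say which variant was used, and the bootstrapping step is not reproduced here. The helper names, isolate names, and distances are hypothetical.

```python
from itertools import combinations


def symmetric_distances(pairs):
    """Expand {(a, b): d} into a nested dict usable as distance[a][b]."""
    distance = {}
    for (a, b), d in pairs.items():
        distance.setdefault(a, {})[b] = d
        distance.setdefault(b, {})[a] = d
    return distance


def dunn_index(distance, clusters):
    """Minimum inter-cluster distance divided by maximum intra-cluster
    diameter. Requires at least two clusters."""
    if len(clusters) < 2:
        raise ValueError("the Dunn index needs at least two clusters")
    max_diameter = max(
        max((distance[a][b] for a, b in combinations(cluster, 2)), default=0)
        for cluster in clusters
    )
    min_separation = min(
        distance[a][b]
        for c1, c2 in combinations(clusters, 2)
        for a in c1
        for b in c2
    )
    if max_diameter == 0:
        # Every cluster is a singleton or internally identical.
        return float("inf")
    return min_separation / max_diameter


# Hypothetical allelic distances between four isolates.
toy_distance = symmetric_distances({
    ("A", "B"): 2, ("C", "D"): 4,
    ("A", "C"): 40, ("A", "D"): 42,
    ("B", "C"): 41, ("B", "D"): 44,
})
toy_dunn = dunn_index(toy_distance, [["A", "B"], ["C", "D"]])
```

Higher values indicate compact, well-separated clusters, which is why the index can be scanned across candidate thresholds.
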
FIG 3
Evaluation of network similarities between the cgMLST sublineage threshold and MLST ST. (A) Networks of all sublineages identified using only V. cholerae isolates from Bangladesh (n = 255). Each cluster represents a sublineage and includes isolates with 133 or fewer allelic differences from each other. Each node represents a cgST and is colored by ST based on the 2016 MLST scheme (21). Sizes of the nodes are proportional to the number of isolates. The length of the connecting lines within a cluster is proportional to the number of allelic differences. (B) Adjusted Rand index for individual pairwise comparisons between predefined clustering thresholds (Fig. 2) and the 2016 MLST scheme (21). The sublineage clustering threshold (i.e., 133 allelic differences) and outbreak threshold (i.e., 7 allelic differences) are indicated by blue and red bars, respectively.
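
The adjusted Rand index in panel B measures agreement between two partitions of the same isolates, corrected for chance. The sketch below uses the standard contingency-table formula; it is an illustration only, not the authors' code, and the labels are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")
    n = len(labels_a)
    if n < 2:
        raise ValueError("at least two items are required")

    # Pairs of items placed together in both, in A only, and in B only.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Degenerate case, e.g. both labelings put everything in one cluster.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical MLST types and cgMLST sublineage labels for five isolates.
mlst_types = ["ST_a", "ST_a", "ST_a", "ST_b", "ST_b"]
sublineages = [1, 1, 1, 2, 2]
toy_ari = adjusted_rand_index(mlst_types, sublineages)
```

A value of 1 means the two schemes group the isolates identically regardless of label names, and values near 0 mean agreement no better than chance.
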
FIG 4
Phylogenetic tree of 1,146 V. cholerae isolates (excluding the 116 isolates from the Yemen cholera outbreak) reconstructed using Parsnp v1.2 (97). All groups inside the PG lineage (7th pandemic El Tor, El Tor progenitor, El Tor sister, classical, and classical sister) are collapsed. Outer rings represent clustering by sequence type based on the 2016 MLST scheme by Kirchberger and colleagues (21), whereas the inner ring represents clustering based on the sublineage threshold (i.e., 133 allelic differences). Branches of clinical strains are colored in red. The phylogenetic tree is rooted with a basal lineage to V. cholerae (collapsed) (79, 86).
FIG 5
Minimum spanning trees isolated when the outbreak threshold was applied to the complete data set of 1,262 isolates. (A) All isolates that clustered together with the Haiti and Yemen isolates based on the clustering threshold of seven allelic differences. (B) All isolates that clustered with the Mozambique isolates based on the clustering threshold of seven allelic differences. Additional Mozambique isolates that are not part of the same outbreak cluster are also shown. Three isolates, two from Zimbabwe and one from the United States, are connected, as they share seven or fewer allelic differences with the Mozambique isolates. In both panels, the size of the nodes is proportional to the number of isolates. The length of the lines is proportional to the number of allelic differences, and all connections have seven or fewer allelic differences.
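
One way to obtain trees like these is to run Kruskal's algorithm on the pairwise allelic distances while ignoring every edge above the threshold, which yields one minimum spanning tree per cluster. This is a sketch under that assumption; the caption does not name the algorithm or software used, and the isolate names and distances are hypothetical.

```python
def minimum_spanning_forest(pairwise, threshold):
    """Kruskal's algorithm restricted to edges with distance <= threshold.

    `pairwise` maps (isolate_a, isolate_b) tuples to allelic differences.
    Returns the selected edges as (isolate_a, isolate_b, distance) tuples.
    """
    nodes = {name for pair in pairwise for name in pair}
    parent = {name: name for name in nodes}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    forest = []
    # Visit edges from shortest to longest; break ties by isolate names.
    for (a, b), d in sorted(pairwise.items(), key=lambda item: (item[1], item[0])):
        if d > threshold:
            break
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            forest.append((a, b, d))
    return forest


# Hypothetical allelic differences between four isolates.
toy_pairwise = {
    ("H1", "H2"): 1, ("H2", "H3"): 2, ("H1", "H3"): 3,
    ("H1", "Y1"): 20, ("H2", "Y1"): 20, ("H3", "Y1"): 20,
}
toy_forest = minimum_spanning_forest(toy_pairwise, threshold=7)
```

In this toy input, Y1 is more than seven differences from every other isolate, so it stays unconnected and does not appear in the returned edges.
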
FIG 6
cgMLST MST of all Yemen isolates and representative 7th pandemic El Tor strains. All isolates connected by dotted lines share eight or more allelic differences (not drawn to scale). All isolates connected with solid lines share seven or fewer allelic differences (i.e., they belong to the same outbreak cluster; drawn to scale). Each node represents a cgST that is colored by year of collection. The outbreak clusters are shaded by country.
FIG 7
Comparison between cgMLST and MLVA with a focus on the Mozambique isolates. (A) Population structure of pandemic V. cholerae in Mozambique based on MLVA profiles by Garrine and colleagues (23). (B and C) MSTs of the Mozambique isolates based on the cgMLST scheme, colored by MLVA profile (B) and year of isolation (C). All isolates in panels B and C connected with lines share seven or fewer allelic differences. For all panels, the size of the nodes is proportional to the number of isolates. The length of the lines is proportional to the number of allelic differences.
FIG 8
Comparison between cgMLST and SNP-based analysis with a focus on the Haiti outbreak and related isolates. (A) MST of isolates from the 2010 cholera outbreak in Haiti. All lines indicate connections of four or fewer allelic differences. Each node represents a cgST, which is colored by year of isolation. Background shading represents ST designations based on 45 high-quality SNPs by Katz and colleagues (14). Note that cgST66 contains a mix of colors, as it contains both ST1 and ST3. Any isolate from countries other than Haiti is indicated. The length of the lines is proportional to the number of allelic differences. (B) MST constructed from whole-genome SNP data (14). The length of the lines indicates the number of nucleotide substitutions. The size of the nodes is proportional to the number of isolates.
FIG 9
Sublineage clusters of nonclinical environmental isolates that are not part of the PG lineage. Clusters are constructed using NetworkX (100) and visualized with Cytoscape (101). Missing loci were assumed to contain the most common allele when calculating allelic differences. Isolates are connected only if they share 133 allelic differences or fewer with each other. Each node represents an isolate and is colored by the country of origin.
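
The handling of missing loci described in this caption can be sketched as follows: before allelic differences are counted, each missing call is replaced by the allele most frequently observed at that locus. The sketch assumes missing calls are stored as None and that ties are broken by first occurrence; both are assumptions, as are the function name and the toy profiles. It uses only the standard library rather than NetworkX.

```python
from collections import Counter


def impute_most_common(profiles):
    """Return a copy of `profiles` in which each missing allele (None) is
    replaced by the most common called allele at that locus."""
    names = list(profiles)
    n_loci = len(profiles[names[0]])
    filled = {name: list(profiles[name]) for name in names}

    for locus in range(n_loci):
        called = [
            profiles[name][locus]
            for name in names
            if profiles[name][locus] is not None
        ]
        if not called:
            # No isolate has a call at this locus; leave it missing.
            continue
        # Ties go to the allele seen first (Counter keeps insertion order).
        most_common = Counter(called).most_common(1)[0][0]
        for name in names:
            if filled[name][locus] is None:
                filled[name][locus] = most_common
    return filled


# Hypothetical environmental isolates with three loci each.
toy_env = {
    "env1": [1, 2, None],
    "env2": [1, None, 3],
    "env3": [4, 2, 3],
}
toy_filled = impute_most_common(toy_env)
```

Filling gaps with the most common allele tends to minimize the differences introduced by missing data, so incomplete genomes are not pushed out of a cluster by gaps alone.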
