Epidemiol Infect. 2020 Aug 3;148:e172.
doi: 10.1017/S0950268820001697.

Evaluation of an ensemble-based distance statistic for clustering MLST datasets using epidemiologically defined clusters of cyclosporiasis

Fernanda S Nascimento et al. Epidemiol Infect.

Abstract

Outbreaks of cyclosporiasis, a food-borne illness caused by the coccidian parasite Cyclospora cayetanensis, have increased in the USA in recent years, with approximately 2300 laboratory-confirmed cases reported in 2018. Genotyping tools are needed to inform epidemiological investigations, yet genotyping Cyclospora has proven challenging because its sexual reproductive cycle produces complex infections characterized by high genetic heterogeneity. We used targeted amplicon deep sequencing and a recently described ensemble-based distance statistic, which accommodates heterogeneous (mixed) genotypes and specimens with partial genotyping data, to genotype and cluster 648 C. cayetanensis samples submitted to CDC in 2018. The performance of the ensemble was assessed by comparing ensemble-identified genetic clusters to analogous clusters identified independently based on common food exposures. Using these epidemiologic clusters as a gold standard, the ensemble facilitated genetic clustering with 93.8% sensitivity and 99.7% specificity. Hence, we anticipate that this procedure will greatly complement epidemiologic investigations of cyclosporiasis.
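
As an illustration of how sensitivity and specificity are computed against a reference standard, the sketch below tallies a 2x2 comparison between epidemiologic linkage (the reference) and genetic cluster membership (the test). The abstract does not describe how the study paired genetic clusters with epidemiologic clusters, so the function name `sensitivity_specificity`, the per-specimen boolean inputs, and the toy data are all assumptions made for illustration; none of the values come from the study.

```python
def sensitivity_specificity(epi_linked, genetically_clustered):
    """Return (sensitivity, specificity), treating epi_linked as the reference.

    Both arguments are equal-length sequences of booleans, one entry per
    specimen (hypothetical layout, not taken from the study).
    """
    if len(epi_linked) != len(genetically_clustered):
        raise ValueError("inputs must have the same length")
    tp = fn = tn = fp = 0
    for truth, predicted in zip(epi_linked, genetically_clustered):
        if truth and predicted:
            tp += 1  # linked by epidemiology and by genetics
        elif truth:
            fn += 1  # linked by epidemiology only
        elif predicted:
            fp += 1  # linked by genetics only
        else:
            tn += 1  # linked by neither
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative reference case")
    return tp / (tp + fn), tn / (tn + fp)


# Toy data only: 4 epidemiologically linked specimens, 6 unlinked.
epi = [True, True, True, True, False, False, False, False, False, False]
gen = [True, True, True, False, False, False, False, False, False, True]
sens, spec = sensitivity_specificity(epi, gen)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

Sensitivity here is the fraction of epidemiologically linked specimens that the genetic clustering also links, and specificity is the fraction of unlinked specimens that it correctly leaves out.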

Keywords: Cyclospora cayetanensis; MLST; clustering; cyclosporiasis; deep sequencing; distance-statistic; genotype; genotyping; machine learning.

Conflict of interest statement

The authors have no conflicts of interest to disclose.

Figures

Fig. 1.
Cluster dendrogram generated from the ensemble matrix of pairwise distances. The ensemble matrix was clustered using Ward's clustering method to generate the dendrogram shown. A 10-cluster model was considered the most parsimonious and branches are colour-coded according to the clusters identified using this model. Peripheral bar colours indicate specimens from case-patients epidemiologically linked to outbreaks of cyclosporiasis identified in the USA in our study, where at least one specimen was genotyped; colours of these bars indicate identified epidemiologic linkages per the legend. To determine the specific location of a given specimen in this dendrogram, refer to Supplementary File S1, Appendix 2, which is a searchable pdf of the same dendrogram that includes all specimen names. The number of specimens assigned to each of the 10 genetic clusters is as follows: genetic cluster 1 (34 cases), cluster 2 (92 cases), cluster 3 (93 cases), cluster 4 (144 cases), cluster 5 (10 cases), cluster 6 (40 cases), cluster 7 (150 cases), cluster 8 (35 cases), cluster 9 (28 cases), cluster 10 (40 cases).
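
The general idea of clustering a pairwise distance matrix with Ward's method and cutting the tree at a fixed number of clusters can be sketched as follows. This is a minimal pure-Python illustration using the Lance-Williams update for Ward's criterion; it is not the authors' code, and the function name `ward_clusters` and the toy matrix are hypothetical. Ward's criterion is strictly defined for Euclidean distances, so applying it to another distance statistic, as assumed here, is a heuristic.

```python
def ward_clusters(dist, k):
    """Merge items with Ward's criterion until k clusters remain.

    dist: symmetric list-of-lists of pairwise distances (hypothetical input).
    Returns one integer cluster label per item.
    """
    n = len(dist)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of items")
    members = {i: [i] for i in range(n)}  # active cluster id -> item indices
    # Squared distances between active clusters, keyed by (smaller id, larger id).
    d2 = {(i, j): dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(members) > k:
        (a, b), d_ab = min(d2.items(), key=lambda item: item[1])
        na, nb = len(members[a]), len(members[b])
        merged = members.pop(a) + members.pop(b)
        updated = {}
        for c, items in members.items():
            nc = len(items)
            d_ac = d2[(min(a, c), max(a, c))]
            d_bc = d2[(min(b, c), max(b, c))]
            # Lance-Williams update for Ward's method on squared distances.
            updated[(c, next_id)] = (
                (na + nc) * d_ac + (nb + nc) * d_bc - nc * d_ab
            ) / (na + nb + nc)
        # Drop every entry that involved the two clusters just merged.
        d2 = {key: v for key, v in d2.items() if a not in key and b not in key}
        d2.update(updated)
        members[next_id] = merged
        next_id += 1
    labels = [0] * n
    for label, items in enumerate(members.values()):
        for i in items:
            labels[i] = label
    return labels


# Toy matrix: items 0-2 are mutually close, items 3-5 are mutually close.
near, far = 0.1, 0.9
toy = [[0.0 if i == j else (near if (i < 3) == (j < 3) else far)
        for j in range(6)] for i in range(6)]
labels = ward_clusters(toy, k=2)
print(labels)
```

For real datasets a library routine such as `scipy.cluster.hierarchy.linkage` with `method="ward"` would normally be preferred over a hand-rolled loop like this one.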
Fig. 2.
Ensemble pairwise distance matrix visualised using MicrobeTrace. To generate this network the same ensemble matrix used to construct Figure 1 (Supplementary File S2, Tab E) was filtered to a value of 0.15 using MicrobeTrace (https://github.com/CDCgov/MicrobeTrace/wiki). Nodes were colour-coded according to their epidemiological linkage, using the same colours used to denote epidemiologically-defined clusters in Figure 1.
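
Filtering a distance matrix to a cutoff amounts to keeping only the specimen pairs whose distance falls at or below that value and treating them as links in a network. The sketch below shows that idea in plain Python; it is not MicrobeTrace code. The interpretation of the 0.15 filter as "keep pairs with distance <= 0.15", the function names `threshold_edges` and `connected_components`, and the toy specimen names and distances are all assumptions.

```python
def threshold_edges(names, dist, cutoff=0.15):
    """Return (name_a, name_b, distance) for every pair with distance <= cutoff."""
    edges = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if dist[i][j] <= cutoff:
                edges.append((names[i], names[j], dist[i][j]))
    return edges


def connected_components(names, edges):
    """Group specimens that are joined, directly or indirectly, by retained edges."""
    neighbours = {name: set() for name in names}
    for a, b, _ in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, components = set(), []
    for name in names:
        if name in seen:
            continue
        stack, component = [name], set()
        while stack:  # depth-first walk over the retained links
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(neighbours[node] - seen)
        components.append(sorted(component))
    return components


# Toy data only: hypothetical specimen names and distances.
names = ["s1", "s2", "s3", "s4"]
dist = [
    [0.00, 0.05, 0.30, 0.80],
    [0.05, 0.00, 0.12, 0.80],
    [0.30, 0.12, 0.00, 0.80],
    [0.80, 0.80, 0.80, 0.00],
]
edges = threshold_edges(names, dist, cutoff=0.15)
components = connected_components(names, edges)
print(edges)
print(components)
```

In this toy case s1 and s3 are not directly linked (0.30 exceeds the cutoff) but still land in the same component through s2, which is why a network view can group specimens that a simple pairwise cutoff would not.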
Fig. 3.
Epidemiologic curve for cyclosporiasis cases (cases over time) plotted for each genetic cluster. Illness onset dates for cases of cyclosporiasis are plotted as a separate histogram for each genetic cluster. Temporal clustering of specimens from cluster 4 and cluster 7 is apparent. Some temporal clustering seems apparent for cluster 2, which may possess a bimodal distribution. Colours used to denote each genetic cluster here correspond to those used to denote genetic clusters in Figure 1.
