Evaluation of an ensemble-based distance statistic for clustering MLST datasets using epidemiologically defined clusters of cyclosporiasis
- PMID: 32741426
- PMCID: PMC7439293
- DOI: 10.1017/S0950268820001697
Abstract
Outbreaks of cyclosporiasis, a food-borne illness caused by the coccidian parasite Cyclospora cayetanensis, have increased in the USA in recent years, with approximately 2300 laboratory-confirmed cases reported in 2018. Genotyping tools are needed to inform epidemiological investigations, yet genotyping Cyclospora has proven challenging because its sexual reproductive cycle produces complex infections characterized by high genetic heterogeneity. We used targeted amplicon deep sequencing and a recently described ensemble-based distance statistic, which accommodates heterogeneous (mixed) genotypes and specimens with partial genotyping data, to genotype and cluster 648 C. cayetanensis samples submitted to CDC in 2018. The performance of the ensemble was assessed by comparing ensemble-identified genetic clusters to analogous clusters identified independently based on common food exposures. Using these epidemiologic clusters as a gold standard, the ensemble facilitated genetic clustering with 93.8% sensitivity and 99.7% specificity. Hence, we anticipate that this procedure will greatly complement epidemiologic investigations of cyclosporiasis.
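For readers unfamiliar with the two performance measures quoted above, the following minimal Python sketch shows the standard definitions of sensitivity and specificity when a genetic cluster assignment is scored against a gold-standard epidemiologic assignment. It is an illustration only: the function name `sensitivity_specificity`, the binary per-specimen membership framing, and the toy labels are assumptions introduced here, not the authors' procedure or data, and the abstract does not state exactly how the reported figures were tallied.

```python
def sensitivity_specificity(predicted, reference):
    """Return (sensitivity, specificity) for paired boolean labels.

    predicted: True where the genetic clustering places a specimen in the cluster.
    reference: True where the gold standard (here, epidemiologic linkage) does.
    Both names and the binary framing are illustrative assumptions.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")

    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)          # true positives
    fn = sum(1 for p, r in pairs if not p and r)      # false negatives
    fp = sum(1 for p, r in pairs if p and not r)      # false positives
    tn = sum(1 for p, r in pairs if not p and not r)  # true negatives

    # Sensitivity: fraction of gold-standard members recovered.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: fraction of gold-standard non-members correctly excluded.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy labels for ten hypothetical specimens (not study data).
reference = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, False, False, False, False, False, True]

sens, spec = sensitivity_specificity(predicted, reference)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

In this toy example one linked specimen is missed and one unlinked specimen is wrongly included, so both measures fall below 1; the same arithmetic applied to real cluster assignments would yield percentages of the kind reported in the abstract.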
Keywords: Cyclospora cayetanensis; MLST; clustering; cyclosporiasis; deep sequencing; distance-statistic; genotype; genotyping; machine learning.
Conflict of interest statement
The authors have no conflicts of interest to disclose.
