Genotyping genetically heterogeneous Cyclospora cayetanensis infections to complement epidemiological case linkage

Affiliations

¹ Parasitic Diseases Branch, Division of Parasitic Diseases and Malaria, Center for Global Health, Centers for Disease Control and Prevention, Atlanta, GA, USA.
² Oak Ridge Institute for Science and Education, Oak Ridge, TN, USA.
³ Malaria Branch, Division of Parasitic Diseases and Malaria, Center for Global Health, Centers for Disease Control and Prevention, Atlanta, GA, USA.
⁴ Waterborne Disease Prevention Branch, National Center for Emerging and Zoonotic Infectious Diseases, Centers for Disease Control and Prevention, Atlanta, GA, USA.

PMID: 31148531
PMCID: PMC6699905
DOI: 10.1017/S0031182019000581

Joel L N Barratt et al. Parasitology. 2019 Sep.

Parasitology. 2019 Sep;146(10):1275-1283.

doi: 10.1017/S0031182019000581. Epub 2019 Jun 20.

Abstract

Sexually reproducing pathogens such as Cyclospora cayetanensis often produce genetically heterogeneous infections in which the number of unique sequence types detected varies from locus to locus. The genotypes assigned to these infections quickly become complex as additional loci are analysed. This genetic heterogeneity limits the utility of traditional sequence-typing and phylogenetic approaches for aiding epidemiological trace-back, and new methods are required to address this complexity. Here, we describe an ensemble of two similarity-based classification algorithms, one Bayesian and one heuristic, that infers the relatedness of C. cayetanensis infections. The ensemble requires a set of haplotypes as input and assigns arbitrary distances to specimen pairs reflecting their most likely relationships. The approach was applied to data generated from a test cohort of 88 human fecal specimens containing C. cayetanensis, including 30 from patients whose infections were associated with epidemiologically defined outbreak clusters of cyclosporiasis. The ensemble assigned specimens to plausible clusters of genetically related infections despite their complex haplotype composition. These relationships were corroborated by a significant number of epidemiological linkages (P < 0.0001), suggesting the ensemble's utility for aiding epidemiological trace-back investigations of cyclosporiasis.
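
To make the input and output of such an approach concrete, the following Python sketch represents each specimen as a set of haplotypes per locus and derives a pairwise distance matrix by averaging two similarity measures. It is illustrative only: the Jaccard and exact-match measures are simple stand-ins chosen for brevity, not the Bayesian and heuristic algorithms described in the paper, and all specimen names, locus names and haplotype labels are hypothetical.

```python
from itertools import combinations

# Hypothetical input: specimen -> locus -> set of haplotypes detected.
specimens = {
    "S1": {"locusA": {"h1", "h2"}, "locusB": {"h1"}},
    "S2": {"locusA": {"h1", "h2"}, "locusB": {"h1"}},
    "S3": {"locusA": {"h3"}, "locusB": {"h2", "h3"}},
    "S4": {"locusA": {"h1"}, "locusB": {"h1"}},
}


def shared_loci(a, b):
    """Loci with at least one haplotype detected in both specimens."""
    return [locus for locus in a if a[locus] and b.get(locus)]


def jaccard_distance(a, b):
    """Mean over shared loci of 1 - |intersection| / |union| (stand-in measure)."""
    loci = shared_loci(a, b)
    if not loci:
        return 1.0  # nothing comparable: treat as maximally distant
    dists = [1.0 - len(a[l] & b[l]) / len(a[l] | b[l]) for l in loci]
    return sum(dists) / len(dists)


def mismatch_distance(a, b):
    """Fraction of shared loci whose haplotype sets differ (stand-in measure)."""
    loci = shared_loci(a, b)
    if not loci:
        return 1.0
    return sum(a[l] != b[l] for l in loci) / len(loci)


def ensemble_distance(a, b):
    """Unweighted average of the two stand-in measures."""
    return (jaccard_distance(a, b) + mismatch_distance(a, b)) / 2


def distance_matrix(data):
    """Symmetric matrix stored as a dict keyed by (specimen, specimen)."""
    names = sorted(data)
    matrix = {(n, n): 0.0 for n in names}
    for x, y in combinations(names, 2):
        matrix[(x, y)] = matrix[(y, x)] = ensemble_distance(data[x], data[y])
    return matrix


matrix = distance_matrix(specimens)
```

A matrix of this shape is what a hierarchical clustering step would consume; the averaging rule here is an assumption and carries no statistical interpretation.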

Keywords: Algorithm; Bayesian; Cyclospora cayetanensis; bioinformatics; ensemble; epidemiology; heuristic.

Figures

**Fig. 1.**
Workflow for selection of *Cyclospora cayetanensis* typing markers. Raw genome sequence data generated on the Illumina MiSeq platform were assessed for quality using FastQC. AdapterRemoval v2.1.7 (Schubert *et al*., 2016) was used to remove adapter sequences from reads and to merge overlapping paired reads into consensus sequences. SPAdes v3.9.0 (Bankevich *et al*., 2012) was used to assemble the reads *de novo*. During the assembly cleaning process, contigs derived from contaminating (Contam.) prokaryotic human gut flora were removed using BBMap (http://sourceforge.net/projects/bbmap/). The assemblies were assessed for quality using QUAST v4.3 (Gurevich *et al*., 2013) before and after the cleaning phase. Contigs with 60 times coverage, a length of at least 3000 base pairs (bp), and coding regions identified using GeneMark-ES v4.33 (Borodovsky and Lomsadze, 2011) were retained as part of the core genome. Single nucleotide polymorphisms (SNPs) were detected across the core genome assemblies using kSNP v3.021 (Gardner *et al*., 2015), and this information was used to identify high-entropy genomic loci. Genomic regions containing high-confidence SNPs (i.e. SNPs within genomic regions of the highest coverage) that occurred within SNP-dense regions (i.e. where several informative SNPs exist within a genomic region of less than 1 kilobase pair) were identified as candidate typing markers for validation by PCR amplification and Sanger sequencing. The markers with the highest amplification and sequencing success rates were considered ideal candidates for *C. cayetanensis* typing, and were PCR amplified and sequenced from stool specimens provided by a diverse range of patients. The resulting sequences were then subjected to typing.
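
The contig filter and the search for high-entropy, SNP-dense regions described in this caption can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: "60 times coverage" is read as "at least 60×", "several" SNPs is taken to mean three or more, and the dictionary keys and function names are hypothetical.

```python
import math
from collections import Counter

MIN_COVERAGE = 60   # assumption: "60 times coverage" read as a lower bound
MIN_LENGTH = 3000   # bp, from the caption


def keep_contig(contig):
    """Core-genome filter: coverage, length, and a predicted coding region."""
    return (
        contig["coverage"] >= MIN_COVERAGE
        and contig["length"] >= MIN_LENGTH
        and contig["has_coding_region"]
    )


def shannon_entropy(alleles):
    """Shannon entropy (bits) of the alleles observed at one position."""
    counts = Counter(alleles)
    total = len(alleles)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def snp_dense_windows(positions, window=1000, min_snps=3):
    """Return (first, last) SNP positions for runs of at least min_snps SNPs
    that all lie less than `window` bp from the first SNP in the run."""
    pos = sorted(positions)
    hits = []
    for i, start in enumerate(pos):
        j = i
        while j + 1 < len(pos) and pos[j + 1] - start < window:
            j += 1
        if j - i + 1 >= min_snps:
            hits.append((start, pos[j]))
    return hits


# Hypothetical example values.
contigs = [
    {"name": "c1", "coverage": 85, "length": 12000, "has_coding_region": True},
    {"name": "c2", "coverage": 40, "length": 9000, "has_coding_region": True},
    {"name": "c3", "coverage": 90, "length": 1500, "has_coding_region": True},
]
core = [c["name"] for c in contigs if keep_contig(c)]
dense = snp_dense_windows([100, 300, 900, 5000, 5200])
```

Positions where the alleles are evenly split across specimens score highest on entropy, which is why such sites are informative for typing.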

**Fig. 2.**
Cluster dendrogram generated from the Ensemble Distance Matrix. Our ensemble of two similarity-based classification algorithms resolved the *C. cayetanensis* infections from 88 fecal specimens into 16 clusters (different branch colours). Clusters were delineated by cutting the tree at the node indicating the separation of the Chinese sample (CHN_HEN01) from its nearest neighbour. The specimen names are shaded in colours according to their epidemiological linkage. Unshaded specimen names represent sporadic or unlinked cases of cyclosporiasis. Specimen identity codes begin with a two-letter state abbreviation (except for Jakarta, Indonesia; JK), followed by two numbers indicating the year, and end with a unique identifier assigned to that specimen (2–3 digits). The specimen from China (CHN_HEN01) follows a different naming convention because sequence data from this specimen had been submitted to GenBank previously by different investigators (GenBank accession: NW_019211453).
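
The specimen naming convention described in this caption can be parsed with a short Python sketch. The caption does not state whether a separator sits between the year and the identifier, so the optional underscore or hyphen in the pattern is an assumption, as are the expansion of the two-digit year to 20YY, the function name, and the example codes (apart from CHN_HEN01).

```python
import re

# Two-letter origin, two-digit year, optional separator (assumed),
# then a 2-3 digit unique identifier.
SPECIMEN_ID = re.compile(r"^([A-Z]{2})(\d{2})[_-]?(\d{2,3})$")


def parse_specimen_id(code):
    """Split a specimen code into origin, year and identifier.

    Returns None for codes that follow another convention, such as
    the GenBank-derived CHN_HEN01.
    """
    match = SPECIMEN_ID.match(code)
    if match is None:
        return None
    origin, year, identifier = match.groups()
    return {
        "origin": origin,          # state abbreviation, or JK for Jakarta
        "year": 2000 + int(year),  # assumption: all years are 20YY
        "identifier": identifier,
    }


# Hypothetical codes, except CHN_HEN01, which appears in the caption.
parsed = [parse_specimen_id(c) for c in ("TX14_123", "JK15_07", "CHN_HEN01")]
```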

**Fig. 3.**
The haplotype composition of each specimen genotyped in this study, represented as a barcode. The 88 specimens in the study cohort were assigned to 16 distinct clusters by the ensemble, with cluster assignments shown on the right-hand side of each panel. These cluster assignments were based on the haplotype composition of each sample, with the loci and their respective haplotype numbers shown along the top two rows. Boxes are shaded black if the corresponding haplotype was detected in a specimen. Specimen names are listed in the far-left column of each panel. Rows are shaded grey if sequencing was unsuccessful for a given marker. This figure was generated to graphically represent the groupings assigned by the ensemble when presented with a set of complex genotyping data.
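
A text version of such a barcode can be sketched in Python as a presence/absence row per specimen. The symbols, locus names, and haplotype numbers below are hypothetical; `#` stands in for a black box, `.` for an empty box, and `?` for a marker that failed to sequence.

```python
def barcode_row(specimen_haplotypes, columns):
    """Render one specimen as a presence/absence string.

    specimen_haplotypes: locus -> set of detected haplotypes,
                         or None if sequencing failed for that locus.
    columns: ordered list of (locus, haplotype) pairs, one per box.
    """
    cells = []
    for locus, haplotype in columns:
        detected = specimen_haplotypes.get(locus)
        if detected is None:
            cells.append("?")   # sequencing unsuccessful for this marker
        elif haplotype in detected:
            cells.append("#")   # haplotype detected
        else:
            cells.append(".")   # haplotype not detected
    return "".join(cells)


# Hypothetical layout: two haplotypes at locus A, two at locus B.
columns = [("A", 1), ("A", 2), ("B", 1), ("B", 2)]
row_mixed = barcode_row({"A": {1, 2}, "B": {2}}, columns)
row_failed = barcode_row({"A": {1}, "B": None}, columns)
```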

References

1. Abanyie F, Harvey RR, Harris JR, Wiegand RE, Gaul L, Desvignes-Kendrick M, Irvin K, Williams I, Hall RL, Herwaldt B, Gray EB, Qvarnstrom Y, Wise ME, Cantu V, Cantey PT, Bosch S, AJ DAS, Fields A, Bishop H, Wellman A, Beal J, Wilson N, Fiore AE, Tauxe R, Lance S, Slutsker L, Parise M and Multistate Cyclosporiasis Outbreak Investigation Team (2015) 2013 Multistate outbreaks of Cyclospora cayetanensis infections associated with fresh produce: focus on the Texas investigations. Epidemiology and Infection 143, 3451–3458.
2. Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko SI, Pham S, Prjibelski AD, Pyshkin AV, Sirotkin AV, Vyahhi N, Tesler G, Alekseyev MA and Pevzner PA (2012) SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. Journal of Computational Biology 19, 455–477.
3. Borodovsky M and Lomsadze A (2011) Eukaryotic gene prediction using GeneMark.hmm-E and GeneMark-ES. Current Protocols in Bioinformatics (Editorial Board, Andreas D. Baxevanis et al.), Chapter: Unit 4.610. doi: 10.1002/0471250953.bi0406s35.
4. Brown AC, Chen JC, Watkins LKF, Campbell D, Folster JP, Tate H, Wasilenko J, Van Tubbergen C and Friedman CR (2018) CTX-M-65 extended-spectrum beta-lactamase-producing Salmonella enterica Serotype Infantis, United States. Emerging Infectious Diseases 24, 2284–2291.
5. CDC (2017a) Cyclosporiasis: Outbreak Investigations and Updates. Vol. 2017. Centers for Disease Control and Prevention, Global Health, Division of Parasitic Disease, Atlanta, Georgia, USA. https://www.cdc.gov/parasites/cyclosporiasis/outbreaks/index.html

Grants and funding

CC999999/ImCDC/Intramural CDC HHS/United States
