Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2019 Sep;146(10):1275-1283.
doi: 10.1017/S0031182019000581. Epub 2019 Jun 20.

Genotyping genetically heterogeneous Cyclospora cayetanensis infections to complement epidemiological case linkage

Affiliations

Genotyping genetically heterogeneous Cyclospora cayetanensis infections to complement epidemiological case linkage

Joel L N Barratt et al. Parasitology. 2019 Sep.

Abstract

Sexually reproducing pathogens such as Cyclospora cayetanensis often produce genetically heterogeneous infections where the number of unique sequence types detected at any given locus varies depending on which locus is sequenced. The genotypes assigned to these infections quickly become complex when additional loci are analysed. This genetic heterogeneity confounds the utility of traditional sequence-typing and phylogenetic approaches for aiding epidemiological trace-back, and requires new methods to address this complexity. Here, we describe an ensemble of two similarity-based classification algorithms, including a Bayesian and heuristic component that infer the relatedness of C. cayetanensis infections. The ensemble requires a set of haplotypes as input and assigns arbitrary distances to specimen pairs reflecting their most likely relationships. The approach was applied to data generated from a test cohort of 88 human fecal specimens containing C. cayetanensis, including 30 from patients whose infections were associated with epidemiologically defined outbreak clusters of cyclosporiasis. The ensemble assigned specimens to plausible clusters of genetically related infections despite their complex haplotype composition. These relationships were corroborated by a significant number of epidemiological linkages (P < 0.0001) suggesting the ensemble's utility for aiding epidemiological trace-back investigations of cyclosporiasis.

Keywords: Algorithm; Bayesian; Cyclospora cayetanensis; bioinformatics; ensemble; epidemiology; heuristic.

PubMed Disclaimer

Figures

Fig. 1.
Fig. 1.
Workflow for selection of Cyclospora cayetanensis typing markers. Raw genome sequence data generated on the Illumina MiSeq platform were assessed for quality using FASTQC. AdaptorRemoval v2.1.7 (Schubert et al., 2016) was used to remove adaptor sequences from reads and to merge overlapping paired reads into consensus sequences. SPAades v3.9.0 (Bankevich et al., 2012) was used to de novo assemble the reads. During the assembly cleaning process, contigs derived from contaminating (Contam.) prokaryotic human gut flora were removed using BBMap (http://sourceforge.net/projects/bbmap/). The assemblies were assessed for quality using QUAST v4.3 (Gurevich et al., 2013) before and after the cleaning phase. Contigs with 60 times coverage, greater than or equal to 3000 base pairs (bp) long and with coding regions identified using GeneMark-ES v4.33 (Borodovsky and Lomsadze, 2011), were retained as part of the core genome. Single nucleotide polymorphisms (SNPs) were detected across the core genome assemblies using kSNP v3.021 (Gardner et al., 2015) and this information was used to identify high-entropy genomic loci. Genomic regions containing high confidence SNPs (i.e. those SNPs within genomic regions of the highest coverage) occurring within SNP-dense regions (i.e. where several informative SNPs exist within a genomic region of less than 1 kilobase pair in size), were identified as candidate typing markers for validation by PCR amplification and Sanger sequencing. The markers with the highest amplification and sequencing success rate were considered ideal candidates for C. cayetanensis typing, and were PCR amplified and sequenced from stool specimens provided by a diverse range of patients. The resulting sequences were then subjected to typing.
Fig. 2.
Fig. 2.
Cluster dendrogram generated from the Ensemble Distance Matrix. Our ensemble of two similarity-based classification algorithms resolved the C. cayetanensis infections from 88 fecal specimens into sixteen clusters (different branch colours). Clusters were delineated by cutting the tree at the node indicating the separation of the Chinese sample (CHN_HEN01) from its nearest neighbour. The specimen names are shaded in colours according to their epidemiological linkage. Unshaded specimen names represent sporadic or unlinked cases of cyclosporiasis. Specimen identity codes begin with a two letter state abbreviation (except for Jakarta, Indonesia; JK), followed by two numbers indicating the year, and ending a unique identifier assigned to that specimen (2–3 digits). The specimen from China (CHN_HEN01) follows a different naming convention as sequence data from this specimen had been submitted to GenBank previously by different investigators (GenBank accession: NW_019211453).
Fig. 3.
Fig. 3.
The haplotype composition of each specimen genotyped in this study represented as a barcode. The 88 specimens in the study cohort were assigned to 16 distinct clusters by the ensemble, with cluster assignments shown on the right hand side of each panel. These cluster assignments were made based on the haplotype composition of each sample, with the loci and their respective haplotype numbers shown along the two top rows. Boxes are shaded black if the corresponding haplotype was detected in a specimen. Specimen names are listed in the far left column of each panel. Rows are shaded grey if sequencing was unsuccessful for a given marker. This figure was generated to graphically represent the groupings assigned by the ensemble when presented with a set of complex genotyping data.

References

    1. Abanyie F, Harvey RR, Harris JR, Wiegand RE, Gaul L, Desvignes-Kendrick M, Irvin K, Williams I, Hall RL, Herwaldt B, Gray EB, Qvarnstrom Y, Wise ME, Cantu V, Cantey PT, Bosch S, AJ DAS, Fields A, Bishop H, Wellman A, Beal J, Wilson N, Fiore AE, Tauxe R, Lance S, Slutsker L, Parise M and Multistate Cyclosporiasis Outbreak Investigation Team (2015) 2013 Multistate outbreaks of Cyclospora cayetanensis infections associated with fresh produce: focus on the Texas investigations. Epidemiology and Infection 143, 3451–3458. - PMC - PubMed
    1. Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko SI, Pham S, Prjibelski AD, Pyshkin AV, Sirotkin AV, Vyahhi N, Tesler G, Alekseyev MA and Pevzner PA (2012) SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. Journal of Computational Biology 19, 455–477. - PMC - PubMed
    1. Borodovsky M and Lomsadze A (2011) Eukaryotic gene prediction using GeneMark.hmm-E and GeneMark-ES. Current Protocols in Bioinformatics /Editoral Board, Andreas D. Baxevanis, [et al.], CHAPTER: Unit–4.610. doi: 10.1002/0471250953.bi0406s35. - DOI - PMC - PubMed
    1. Brown AC, Chen JC, Watkins LKF, Campbell D, Folster JP, Tate H, Wasilenko J, Van Tubbergen C and Friedman CR (2018) CTX-M-65 extended-spectrum beta-lactamase-producing Salmonella enterica Serotype Infantis, United States(1). Emerging Infectious Diseases 24, 2284–2291. - PMC - PubMed
    1. CDC (2017a) Cyclosporiasis: Outbreak Investigations and Updates. Vol. 2017 Centers for Disease Control and Prevention, Global Health, Division of Parasitic Disease, Atlanta, Georgia, USA https://www.cdc.gov/parasites/cyclosporiasis/outbreaks/index.html

Publication types