Gut Pathog. 2018 Apr 10;10:15.
doi: 10.1186/s13099-018-0242-0. eCollection 2018.

A hybrid reference-guided de novo assembly approach for generating Cyclospora mitochondrion genomes

G R Gopinath et al. Gut Pathog.

Abstract

Cyclospora cayetanensis is a coccidian parasite associated with large and complex foodborne outbreaks worldwide. Linking samples from cyclosporiasis patients during foodborne outbreaks with suspected contaminated food sources, using conventional epidemiological methods, has been a persistent challenge. To address this issue, the development of new methods based on potential genomically derived markers for strain-level identification has been a priority for the food safety research community. The absence of reference genomes for identifying nucleotide and structural variants with a high degree of confidence has limited the use of sequencing data for source tracking during outbreak investigations. In this work, we applied bioinformatic analyses to assess the quality of a high-resolution, curated, public mitochondrial genome assembly for use as a reference genome. Using this reference genome, three new mitochondrial genome assemblies were built from metagenomic reads generated by sequencing DNA extracted from oocysts present in stool samples from cyclosporiasis patients. Nucleotide variants were identified in the new and other publicly available genomes by comparison with the mitochondrial reference genome. The consolidated workflow presented here, which generates new mitochondrion genomes using our reference-guided de novo assembly approach, could be useful in facilitating the generation of other mitochondrion sequences and in their application for subtyping C. cayetanensis strains during foodborne outbreak investigations.

Keywords: Cyclosporiasis; De novo assembly; Genome sequencing; Mitochondrion; Reference genome; Single nucleotide polymorphisms; Subtyping.

Figures

Fig. 1
Workflow chart for recovering mitochondrial genomes from metagenomic sequence datasets. The mitochondrion reference genome KP231180 was used twice to generate new assemblies from metagenomic reads obtained from stool samples. First, NGS reads were mapped to the reference to gather mitochondrion-specific sequences. Second, after assembly, the orientation of the sequences in the new assemblies was corrected to align with the reference genome for downstream analysis. It is anticipated that the reference-guided de novo assembly workflow can be modified to suit any NGS dataset
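The orientation-correction step in this caption can be illustrated with a short sketch. This is not the authors' implementation. It assumes the new assembly differs from the reference only by strand and by the point at which the sequence was linearized. The function names, the k-mer size, and the toy sequence are hypothetical choices made for this example.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    table = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(table)[::-1]


def kmer_set(seq: str, k: int) -> set:
    """Collect every k-mer that occurs in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def orient_to_reference(assembly: str, reference: str, k: int = 15) -> str:
    """Put an assembly on the reference strand and start coordinate.

    Assumption (not from the paper): the assembly is a rotation and/or
    reverse complement of a sequence closely matching the reference.
    """
    ref_kmers = kmer_set(reference, k)
    candidates = (assembly, reverse_complement(assembly))
    # Keep the strand that shares more k-mers with the reference.
    best = max(candidates, key=lambda s: len(kmer_set(s, k) & ref_kmers))
    # Search a doubled copy so a seed spanning the junction is still found.
    seed = reference[:k]
    pos = (best + best).find(seed)
    if pos == -1:
        return best  # seed not found; leave the start position unchanged
    return best[pos:] + best[:pos]


# Toy data, not a real mitochondrial sequence.
reference = "ATGCGTACCTGAAGTCTTAGGCATCGATTCAGGCTAACGT"
rotated = reference[13:] + reference[:13]
assembly = reverse_complement(rotated)
fixed = orient_to_reference(assembly, reference, k=8)
```

In practice the seed may itself contain a variant or fall in a repeat, so a real pipeline would more likely rely on a whole-sequence aligner than on an exact seed match.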
Fig. 2
Multiple alignment of the four public mitochondrial genome assemblies. Track 1: annotated KP231180 [4]; track 2: CM003498; track 3: KP658101 [7]; track 4: KP796149 [6]. KP231180 (track 1), the longest of the four assemblies, was compared with the other three mitochondrial assemblies using the progressiveMauve algorithm implemented in the Geneious suite. The CM003498 and KP658101 assemblies had shuffled sequences, indicated by crisscrossing lines connecting a terminal pink block of sequence in track 1 to its relocated positions in tracks 2 and 3. Annotations could not be mapped from KP231180 to CM003498 because of the sequence shuffling. The assemblies in tracks 3 and 4, namely KP658101 and KP796149, also differed in length, mainly because of deletions not visible in this illustration
Fig. 3
Identification of four repeats in the mitochondrion genome potentially affecting de novo assembly quality. A 45 bp deletion was observed in the terminal region of the 6229 bp-long KP796149 assembly (greyed-out sequence in the top track) and was investigated further. Four 15-mer repeats were observed in the 6274 bp-long KP231180 assembly. After alignment, three of the four repeats were missing from the KP796149 assembly, as indicated by a blue triangle with a red bar (last track). Although the KP796149 sequence appears to be aligned with the reads and KP231180, the colored and missing bases in the last track point to a forced misalignment. The C5 source reads (used as a representative of all three source-read datasets) mapped to this region in KP231180, and the corresponding assembly contained these repeats as expected (11 tracks in the middle). The presence of just one of the four repeats in the KP796149 sequence [6] would result in improper mapping of the reads and render any dependent de novo assembly incomplete
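The numbers in this caption are mutually consistent: losing three 15 bp repeat units removes 3 × 15 = 45 bp, and 6274 − 45 = 6229. The sketch below shows one way to flag such a repeat run in a sequence. It assumes the four repeats are tandem copies of a single 15 bp unit, which the caption suggests but does not state. The function name and the toy sequences are hypothetical and are not taken from KP231180 or KP796149.

```python
def tandem_repeats(seq: str, k: int = 15, min_copies: int = 2) -> list:
    """Return (start, unit, copies) for each run of back-to-back k-mers."""
    hits = []
    i = 0
    while i + k <= len(seq):
        unit = seq[i:i + k]
        copies = 1
        # Count how many times the same unit follows itself directly.
        while seq[i + copies * k:i + (copies + 1) * k] == unit:
            copies += 1
        if copies >= min_copies:
            hits.append((i, unit, copies))
            i += copies * k  # jump past the whole run
        else:
            i += 1
    return hits


# Toy sequences: a 15 bp unit present four times versus once.
unit = "ACGTTGCAAGGCTTA"
with_four_copies = "GGGCCC" + unit * 4 + "TTTAAA"
with_one_copy = "GGGCCC" + unit + "TTTAAA"

runs = tandem_repeats(with_four_copies)
length_difference = len(with_four_copies) - len(with_one_copy)
```

A sequence that retains only one copy, like `with_one_copy`, yields no run at all, which is one way a collapsed repeat region can go unnoticed unless the assembly is compared against a curated reference.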
Fig. 4
Comparison of the new mitochondrial assemblies and the public sequences with the reference genome. The topmost track with CDS (annotations in red) is the reference genome, KP231180. Tracks 1–3: 6274 bp-long C5, C8 and C10 assemblies, respectively; track 4: 6229 bp-long KP796149 [6] assembly; tracks 5–6: 6273 bp-long CM003498 assembly, split into two fragments by the alignment program to obtain collinearity (the fragment start-stop positions are given in the track name); track 7: 6184 bp-long KP658101 [7] assembly. The three mitochondrion assemblies CM003498, KP658101 and KP796149, along with the new C5, C8 and C10 sequences, were compared with the reference genome evaluated in this study. The mapping and visualization were carried out using Geneious suite utilities to highlight SNPs. A synonymous transversion mutation was identified in all six assemblies in comparison with the reference. The alleles present in the query genomes at position 4415 of the reference genome are shown in the inset box. In addition, anomalous SNPs (marked by *) were observed in KP796149 and KP658101. Sequence integrity appears to be affected in the public sequences by sequence shuffling (tracks 5–6 for CM003498) and by deletions indicated by blue triangles (tracks 4–7 for the three genomes). The manually curated KP231180 [4] was used as the reference genome (topmost track with CDS annotations) for further analysis to avoid the potential mis-assembly and propagation of false-positive variants that could arise with the other three genomes
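The comparison described in this caption was carried out with Geneious utilities. As a rough illustration of the underlying idea, the sketch below lists SNPs between a reference and a query that are already aligned, reports them in reference coordinates, and labels each as a transition or a transversion. The function names and the toy alignments are hypothetical. No real alleles, including those at position 4415, are reproduced here.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_class(ref_base: str, alt_base: str) -> str:
    """Classify a base change as a transition or a transversion."""
    pair = {ref_base.upper(), alt_base.upper()}
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


def find_snps(ref_aln: str, query_aln: str) -> list:
    """List SNPs between two aligned sequences ('-' marks a gap).

    Positions are 1-based and count reference bases only, so an
    insertion in the query does not shift reference coordinates.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    snps = []
    ref_pos = 0
    for r, q in zip(ref_aln.upper(), query_aln.upper()):
        if r != "-":
            ref_pos += 1
        if r not in "ACGT" or q not in "ACGT":
            continue  # skip gaps and ambiguous bases
        if r != q:
            snps.append((ref_pos, r, q, substitution_class(r, q)))
    return snps


# Toy alignments, not real mitochondrial data.
simple = find_snps("ATGCCGTA", "ATGACGTA")
with_gap = find_snps("AT-GC", "ATTGT")
```

Whether a given SNP is synonymous additionally depends on the CDS annotation and reading frame, which this sketch does not model.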

References

    1. Scallan E, Hoekstra RM, Mahon BE, Jones TF, Griffin PM. An assessment of the human health impact of seven leading foodborne pathogens in the United States using disability adjusted life years. Epidemiol Infect. 2015;143(13):2795–2804. doi: 10.1017/S0950268814003185.
    2. Chacin-Bonilla L. Cyclospora cayetanensis. In: Rose JB, Jiménez-Cisneros B, editors. Global water pathogens project (Fayer R, Jakubowski W, editors. Part 3 Protists). E. Lansing: Michigan State University, UNESCO; 2017. http://www.waterpathogens.org/book/cyclospora-cayetanensis.
    3. Chacín-Bonilla L. Epidemiology of Cyclospora cayetanensis: a review focusing in endemic areas. Acta Trop. 2010;115:181–193. doi: 10.1016/j.actatropica.2010.04.001.
    4. Cinar HN, Gopinath G, Jarvis K, Murphy HR. The complete mitochondrial genome of the foodborne parasitic pathogen Cyclospora cayetanensis. PLoS ONE. 2015;10(6):e0128645. doi: 10.1371/journal.pone.0128645.
    5. Cinar HN, Qvarnstrom Y, Wei-Pridgeon Y, Li W, Nascimento FS, Arrowood MJ, Murphy HR, Jang A, Kim E, Kim R, da Silva A, Gopinath GR. Comparative sequence analysis of Cyclospora cayetanensis apicoplast genomes originating from diverse geographical regions. Parasit Vectors. 2016;9(1):611. doi: 10.1186/s13071-016-1896-4.