BMC Genomics. 2020 Sep 14;21(1):631.
doi: 10.1186/s12864-020-07041-8.

Benchmarking hybrid assembly approaches for genomic analyses of bacterial pathogens using Illumina and Oxford Nanopore sequencing

Zhao Chen et al. BMC Genomics. 2020.

Abstract

Background: We benchmarked the hybrid assembly approaches of MaSuRCA, SPAdes, and Unicycler for bacterial pathogens using Illumina and Oxford Nanopore sequencing by determining genome completeness and accuracy, antimicrobial resistance (AMR), virulence potential, multilocus sequence typing (MLST), phylogeny, and pan genome. Ten bacterial species (10 strains) were tested for simulated reads of both mediocre- and low-quality, whereas 11 bacterial species (12 strains) were tested for real reads.

Results: Unicycler produced the most contiguous genomes, closely followed by MaSuRCA, while all SPAdes assemblies were incomplete. MaSuRCA was less tolerant of low-quality long reads than SPAdes and Unicycler. For the five antimicrobial-resistant strains with simulated reads, the AMR genotypes of the hybrid assemblies were consistent with those of the reference genomes. The MaSuRCA assembly of Staphylococcus aureus with real reads contained msr(A) and tet(K), while the reference genome and the SPAdes and Unicycler assemblies harbored blaZ. For the other five antimicrobial-resistant strains with real reads, the AMR genotypes of the reference genomes and hybrid assemblies were consistent. The numbers of virulence genes in all hybrid assemblies were similar to those of the reference genomes for both simulated and real reads, with one exception: the reference genome and the hybrid assemblies of Pseudomonas aeruginosa with mediocre-quality long reads carried 241 virulence genes, whereas 184 virulence genes were identified in the hybrid assemblies with low-quality long reads. The MaSuRCA assemblies of Escherichia coli O157:H7 and Salmonella Typhimurium with mediocre-quality long reads contained 126 and 118 virulence genes, respectively, while their MaSuRCA assemblies with low-quality long reads contained 110 and 107 virulence genes, respectively. All approaches performed well in our MLST and phylogenetic analyses. The pan genomes of the hybrid assemblies of S. Typhimurium with mediocre-quality long reads were similar to that of the reference genome, while SPAdes and Unicycler were more tolerant of low-quality long reads than MaSuRCA in the pan-genome analysis. All approaches functioned well in the pan-genome analysis of Campylobacter jejuni with real reads.
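To put the virulence gene counts in the Results on a common scale, the relative drop between mediocre- and low-quality long reads can be worked out directly. The following is a minimal sketch, not part of the authors' analysis: the counts are copied from the Results text, while the names `COUNTS`, `relative_drop`, and `summarize` are hypothetical helpers introduced only for this illustration.

```python
# Virulence gene counts quoted in the Results:
# (mediocre-quality long reads, low-quality long reads).
# The dictionary labels are illustrative, not the authors' naming.
COUNTS = {
    "P. aeruginosa (hybrid assemblies)": (241, 184),
    "E. coli O157:H7 (MaSuRCA)": (126, 110),
    "S. Typhimurium (MaSuRCA)": (118, 107),
}


def relative_drop(mediocre: int, low: int) -> float:
    """Fraction by which the gene count falls from mediocre- to low-quality reads."""
    return (mediocre - low) / mediocre


def summarize(counts: dict) -> dict:
    """Map each label to its percentage drop, rounded to one decimal place."""
    return {
        name: round(100 * relative_drop(mediocre, low), 1)
        for name, (mediocre, low) in counts.items()
    }


if __name__ == "__main__":
    for name, pct in summarize(COUNTS).items():
        print(f"{name}: {pct}% fewer virulence genes detected")
```

Under these figures, the drop is roughly 24% for P. aeruginosa, compared with about 13% and 9% for the MaSuRCA assemblies of E. coli O157:H7 and S. Typhimurium.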

Conclusions: Our research demonstrates the hybrid assembly pipeline of Unicycler as a superior approach for genomic analyses of bacterial pathogens using Illumina and Oxford Nanopore sequencing.

Keywords: Bacterial pathogen; Genomic analyses; Hybrid assembly; Illumina sequencing; MaSuRCA; Oxford Nanopore sequencing; SPAdes; Unicycler.

Conflict of interest statement

The authors declare that they have no competing interest.

Figures

Fig. 1
Whole-genome phylogenetic tree of the hybrid assemblies of Pseudomonas aeruginosa PAO1 with simulated Illumina short reads and mediocre- or low-quality Oxford Nanopore long reads using MaSuRCA, SPAdes, and Unicycler in addition to the reference genome (in red) compared to 30 P. aeruginosa strains. The scale bar indicates the genetic distance
Fig. 2
Whole-genome phylogenetic tree of the hybrid assemblies of Listeria monocytogenes CFSAN008100 with real Illumina short reads and Oxford Nanopore long reads using MaSuRCA, SPAdes, and Unicycler in addition to the reference genome (in red) compared to 30 L. monocytogenes strains. The scale bar indicates the genetic distance
Fig. 3
Core-genome phylogenetic tree of the hybrid assemblies of Escherichia coli O157:H7 Sakai with simulated Illumina short reads and mediocre- or low-quality Oxford Nanopore long reads using MaSuRCA, SPAdes, and Unicycler in addition to the reference genome (in red) compared to 30 Shiga-toxin producing E. coli (STEC) strains. The scale bar indicates the genetic distance
Fig. 4
Core-genome phylogenetic tree of the hybrid assemblies of Cronobacter sakazakii CFSAN068773 with real Illumina short reads and Oxford Nanopore long reads using MaSuRCA, SPAdes, and Unicycler in addition to the reference genome (in red) compared to 30 C. sakazakii strains. The scale bar indicates the genetic distance
Fig. 5
Pan genomes of the hybrid assemblies of Salmonella Typhimurium LT2 with simulated Illumina short reads and mediocre- or low-quality Oxford Nanopore long reads using MaSuRCA (mediocre-quality, a; low-quality, d), SPAdes (mediocre-quality, b; low-quality, e), and Unicycler (mediocre-quality, c; low-quality, f) and 20 S. Typhimurium strains compared to the reference genome (g)
Fig. 6
Pan genomes of the hybrid assemblies of Campylobacter jejuni CFSAN032806 with real Illumina short reads and Oxford Nanopore long reads using MaSuRCA (a), SPAdes (b), and Unicycler (c) and 20 C. jejuni strains compared to the reference genome (d)
