BMC Genomics. 2019 Jan 9;20(1):23.

doi: 10.1186/s12864-018-5381-7.

Evaluation of strategies for the assembly of diverse bacterial genomes using MinION long-read sequencing

Sarah Goldstein¹, Lidia Beka¹, Joerg Graf², Jonathan L Klassen³

Affiliations

¹ Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA.
² Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA. joerg.graf@uconn.edu.
³ Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA. jonathan.klassen@uconn.edu.

PMID: 30626323
PMCID: PMC6325685
DOI: 10.1186/s12864-018-5381-7

Sarah Goldstein et al. BMC Genomics. 2019.


Abstract

Background: Short-read sequencing technologies have made microbial genome sequencing cheap and accessible. However, closing genomes is often costly and assembling short reads from genomes that are repetitive and/or have extreme %GC content remains challenging. Long-read, single-molecule sequencing technologies such as the Oxford Nanopore MinION have the potential to overcome these difficulties, although the best approach for harnessing their potential remains poorly evaluated.

Results: We sequenced nine bacterial genomes spanning a wide range of GC contents using Illumina MiSeq and Oxford Nanopore MinION sequencing technologies to determine the advantages of each approach, both individually and combined. Assemblies using only MiSeq reads were highly accurate but lacked contiguity, a deficiency that was partially overcome by adding MinION reads to these assemblies. Even more contiguous genome assemblies were generated by using MinION reads for initial assembly, but these assemblies were more error-prone and required further polishing. This was especially pronounced when Illumina libraries were biased, as was the case for our strains with both high and low GC content. Increased genome contiguity dramatically improved the annotation of insertion sequences and secondary metabolite biosynthetic gene clusters, likely because long reads can disambiguate these highly repetitive but biologically important genomic regions.

Conclusions: Genome assembly using short reads is challenged by repetitive sequences and extreme GC contents. Our results indicate that these difficulties can be largely overcome by using single-molecule, long-read sequencing technologies such as the Oxford Nanopore MinION. Using MinION reads for assembly followed by polishing with Illumina reads generated the most contiguous genomes, with sufficient accuracy to enable the annotation of important but difficult-to-sequence genomic features such as insertion sequences and secondary metabolite biosynthetic gene clusters. The combination of Oxford Nanopore and Illumina sequencing can therefore cost-effectively advance studies of microbial evolution and genome-driven drug discovery.

Keywords: Genome assembly; Genome sequencing; Insertion sequences; Oxford Nanopore MinION; Secondary metabolites.

Conflict of interest statement

Ethics approval and consent to participate

N/A

Consent for publication

N/A

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

**Fig. 1**
MinION reads improve assembly contiguity. The number of contigs (left), N50 (in Mbp, center), and assembly length (in Mbp, right) are shown for each of the MiSeq-based (SPAdes, Unicycler, SPAdes-hybrid, and Unicycler-hybrid) and MinION-based (Canu, Canu+Nanopolish, Canu+Pilon) genome assemblies. Results for *Pseudonocardia*, *Aeromonas*, and *Flavobacterium* are shown in blue, red, and green, respectively
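The three contiguity metrics named in this caption can be computed from a list of contig lengths. The following is a minimal sketch, not the authors' pipeline; the function name `assembly_stats` and the toy contig lengths are assumptions made for illustration. N50 is taken here in its usual sense: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly.

```python
def assembly_stats(contig_lengths):
    """Return contig count, N50, and total length (all in bp).

    `contig_lengths` is a hypothetical input: one integer per contig.
    """
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    n50 = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in lengths:
        running += length
        if running * 2 >= total:
            n50 = length
            break
    return {"contigs": len(lengths), "n50": n50, "length": total}


# Toy example (made-up lengths, not data from the study).
stats = assembly_stats([4_000_000, 1_000_000, 500_000, 500_000])
print(stats["contigs"], stats["n50"] / 1e6, stats["length"] / 1e6)
```

Fewer contigs and a larger N50 for a similar total length indicate a more contiguous assembly, which is the comparison the figure makes between methods.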

**Fig. 2**
Comparison of *Pseudonocardia* assemblies generated during this study. (A): Heatmaps depicting Mash distances between the assemblies of each *Pseudonocardia* strain based on their shared k-mer content. Whiter colors indicate greater Mash distances between assemblies. (B): Mashtree analysis showing the relationships of all *Pseudonocardia* assemblies to each other, based on Mash distances. The scale bar represents a Mash distance of 0.003
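For readers unfamiliar with Mash distances, the sketch below shows how such a distance relates to shared k-mer content. It is a simplification under stated assumptions: it uses exact k-mer sets rather than the MinHash sketches that Mash uses to estimate the Jaccard index, it ignores reverse complements, and the function names and the value `k=21` are illustrative choices, not settings reported in this record. The distance formula is D = -(1/k) ln(2j / (1 + j)), where j is the Jaccard index of the two k-mer sets.

```python
import math


def kmers(seq, k):
    """Set of all k-length substrings of `seq` (no reverse complements)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def mash_like_distance(seq_a, seq_b, k=21):
    """Mash-style distance from an exact (not sketched) Jaccard index."""
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    union = a | b
    if not union:
        raise ValueError("sequences are shorter than k")
    j = len(a & b) / len(union)
    if j == 0:
        return 1.0  # no shared k-mers: maximal distance
    # max() guards against a negative zero when j == 1.
    return max(0.0, -math.log(2 * j / (1 + j)) / k)


# Identical toy sequences give a distance of zero.
print(mash_like_distance("ACGTTGCA" * 10, "ACGTTGCA" * 10))
```

Assemblies of the same strain that share nearly all k-mers would have distances close to zero, consistent with the small scale bar in panel B.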

**Fig. 3**
Quantification of insertions/deletions (indels, left) and single nucleotide polymorphisms (SNPs, right) in all strains sequenced during this study, as determined by aligning each assembly to the Canu+Pilon assembly for that strain as a reference

**Fig. 4**
Anvi’o analysis of annotation quality. Strains are grouped by species with *Pseudonocardia* shown in blue, *Aeromonas* shown in red, and *Flavobacterium* shown in green. Each heatmap row corresponds to an individual strain and each column corresponds to a unique assembly method

**Fig. 5**
The effect of coverage on Canu genome assembly contiguity. The number of contigs (top left), N50 (in Mbp, top center), assembly length (in Mbp, top right), SNPs per 1000 bp (bottom right), and indels per 1000 bp (bottom left) are shown for subsets of the Ps JKS002128 (blue), Av JG3 (red), and Fs ARS-166-14 (green) MinION reads used in Fig. 1

**Fig. 6**
Ps JKS002128 genome assembly quality affects secondary metabolite biosynthetic gene cluster annotation. (A) Homologies between BGCs predicted for each Ps JKS002128 assembly, with each row representing a unique BGC in the Ps JKS002128 genome. Filled boxes indicate the BGCs found in each assembly, colored according to the type of secondary metabolite that it is predicted to encode. White boxes indicate BGCs that were not found in that assembly. Some BGCs occur on multiple contigs or are separated into multiple gene clusters on the same assembly, indicated by either two or three polygons within a single box. BGCs may still be fragmented even if represented by a single box. (B) The total number of complete and fragmented BGCs predicted in each Ps JKS002128 genome assembly

**Fig. 7**
Fs ARS-166-14 genome assembly quality affects insertion sequence annotation. Both the total number of hits and hits with > 70% amino acid identity to insertion sequences in the ISfinder database are shown. The former likely includes false-positive annotations while the latter is more conservative

