[Preprint]. 2024 Apr 11:2023.12.12.571215.
doi: 10.1101/2023.12.12.571215.

Hybracter: Enabling Scalable, Automated, Complete and Accurate Bacterial Genome Assemblies


George Bouras et al. bioRxiv.


Abstract

Improvements in the accuracy and availability of long-read sequencing mean that complete bacterial genomes are now routinely reconstructed using hybrid (i.e. short- and long-read) assembly approaches. Complete genomes allow a deeper understanding of bacterial evolution and genomic variation beyond single nucleotide variants (SNVs). They are also crucial for identifying plasmids, which often carry medically significant antimicrobial resistance (AMR) genes. However, small plasmids are often missed or misassembled by long-read assembly algorithms. Here, we present Hybracter, which allows for the fast, automatic, and scalable recovery of near-perfect complete bacterial genomes using a long-read-first assembly approach. Hybracter can be run either as a hybrid assembler or as a long-read-only assembler. We compared Hybracter to existing automated hybrid and long-read-only assembly tools using a diverse panel of samples with varying levels of long-read accuracy and manually curated ground-truth reference genomes. We demonstrate that Hybracter as a hybrid assembler is more accurate and faster than Unicycler, the existing gold-standard automated hybrid assembler. We also show that Hybracter with long reads only is the most accurate long-read-only assembler and is comparable to hybrid methods in accurately recovering small plasmids.


Conflict of interest statement

The authors declare that there are no conflicts of interest.

Figures

Figure 1: Outline of the Hybracter workflow.
Figure 2: Comparison of the counts of single nucleotide variants (SNVs) and small (<60 bp) insertions and deletions (InDels) (A) and the total number of large (>60 bp) InDels (B) for the hybrid tools benchmarked (Hybracter hybrid in dark blue, Dragonflye hybrid in orange and Unicycler in green). The counts of SNVs and small InDels (C) and the total number of large InDels (D) for the long-read-only tools benchmarked (Hybracter long in light blue, Dragonflye long in grey) are also shown. All data presented are from the benchmarking run with 8 threads.
Figure 3: Comparison of wall-clock runtime (in seconds) of Hybracter hybrid, Dragonflye hybrid, Unicycler, Hybracter long and Dragonflye long when run with 8 and 16 threads.
Figure 4: Comparison of the counts of small (<60 bp) (A) and large (>60 bp) (B) insertions and deletions (InDels) and SNVs (C) for Hybracter hybrid, Dragonflye hybrid, Unicycler, Hybracter long and Dragonflye long chromosome assemblies of Lerminiaux Isolate B (Enterobacter cloacae) at 5x intervals of sequencing depth from 10x to 100x.
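
The error categories used in the figure captions (SNVs, small InDels under 60 bp, large InDels over 60 bp) can be illustrated with a minimal Python sketch. This is not the benchmarking code used by the authors. The function names, the (ref, alt) allele-pair input, and the treatment of an InDel of exactly 60 bp as large are assumptions made here for illustration, since the captions do not specify that boundary.

```python
from collections import Counter

# Size threshold taken from the figure captions (small: <60 bp, large: >60 bp).
INDEL_SIZE_THRESHOLD_BP = 60


def classify_variant(ref: str, alt: str) -> str:
    """Label one assembly error given its reference and assembly alleles.

    Hypothetical helper: returns 'SNV', 'small_indel', 'large_indel' or 'other'.
    """
    length_change = abs(len(ref) - len(alt))
    if length_change == 0:
        # Same length: only a single differing base counts as an SNV here.
        return "SNV" if len(ref) == 1 and ref != alt else "other"
    # Assumption: an InDel of exactly 60 bp is counted as large.
    if length_change < INDEL_SIZE_THRESHOLD_BP:
        return "small_indel"
    return "large_indel"


def count_variants(variants):
    """Tally the categories for an iterable of (ref, alt) allele pairs."""
    return Counter(classify_variant(ref, alt) for ref, alt in variants)


if __name__ == "__main__":
    example = [("A", "G"), ("A", "ACGT"), ("A" * 100, "A")]
    print(count_variants(example))
```

Counts of this kind, computed per assembler against a curated reference, are what panels such as Figure 2 A to D summarise.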


