Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2022 Jul;40(7):1075-1081.
doi: 10.1038/s41587-022-01220-6. Epub 2022 Feb 28.

Multiplex de Bruijn graphs enable genome assembly from long, high-fidelity reads

Affiliations

Multiplex de Bruijn graphs enable genome assembly from long, high-fidelity reads

Anton Bankevich et al. Nat Biotechnol. 2022 Jul.

Abstract

Although most existing genome assemblers are based on de Bruijn graphs, the construction of these graphs for large genomes and large k-mer sizes has remained elusive. This algorithmic challenge has become particularly pressing with the emergence of long, high-fidelity (HiFi) reads that have been recently used to generate a semi-manual telomere-to-telomere assembly of the human genome. To enable automated assemblies of long, HiFi reads, we present the La Jolla Assembler (LJA), a fast algorithm using the Bloom filter, sparse de Bruijn graphs and disjointig generation. LJA reduces the error rate in HiFi reads by three orders of magnitude, constructs the de Bruijn graph for large genomes and large k-mer sizes and transforms it into a multiplex de Bruijn graph with varying k-mer sizes. Compared to state-of-the-art assemblers, our algorithm not only achieves five-fold fewer misassemblies but also generates more contiguous assemblies. We demonstrate the utility of LJA via the automated assembly of a human genome that completely assembled six chromosomes.

PubMed Disclaimer

References

    1. Nurk, S. et al. The complete sequence of a human genome. bioRxiv https://doi.org/10.1101/2021.05.26.445798 (2021).
    1. Miller, D. E. et al. Targeted long-read sequencing identifies missing disease-causing variation. Am. J. Hum. Genet. 108, 1436–1449 (2021). - DOI
    1. Wenger, A. M. et al. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nat. Biotechnol. 37, 1155–1162 (2019). - DOI
    1. Compeau, P. E., Pevzner, P. A. & Tesler, G. How to apply de Bruijn graphs to genome assembly. Nat. Biotechnol. 29, 987–991 (2011). - DOI
    1. Bankevich, A. et al. SPAdes: a new genome assembly algorithm and its applications to single cell sequencing. J. Comput. Biol. 19, 455–477 (2012). - DOI

Publication types