Mol Med Rep. 2021 Apr;23(4):251.
doi: 10.3892/mmr.2021.11890. Epub 2021 Feb 4.

Benchmarking of next and third generation sequencing technologies and their associated algorithms for de novo genome assembly

Marios Gavrielatos et al. Mol Med Rep. 2021 Apr.

Abstract

Genome assemblers are computational tools for de novo genome assembly, based on a plenitude of primary sequencing data. The quality of genome assemblies is estimated by their contiguity and the occurrence of misassemblies (duplications, deletions, translocations or inversions). The rapid development of sequencing technologies has enabled the rise of novel de novo genome assembly strategies. The ultimate goal of such strategies is to utilise the features of each sequencing platform in order to address the existing weaknesses of each sequencing type and compose a complete and correct genome map. In the present study, the hybrid strategy, which is based on Illumina short paired‑end reads and Nanopore long reads, was benchmarked using the MaSuRCA and Wengan assemblers. Moreover, the long‑read assembly strategy was benchmarked using Canu for Nanopore reads, and using Hifiasm and HiCanu for PacBio HiFi reads. The assemblies were performed on a computational cluster with limited computational resources. Their outputs were evaluated in terms of accuracy and computational performance. The PacBio HiFi assembly strategy outperformed the others, while Hi‑C scaffolding, which is based on chromatin 3D structure, is required in order to increase contiguity, accuracy and completeness when large and complex genomes, such as the human one, are assembled. The use of Hi‑C data is also necessary when using the hybrid assembly strategy. The results revealed that HiFi sequencing enabled the rise of novel algorithms which require less genome coverage than the other strategies, making the assembly a less computationally demanding task. Taken together, these developments may lead to the democratisation of genome assembly projects, which are now approachable by smaller labs with limited technical and financial resources.

Keywords: de novo genome assembly; next generation sequencing; third generation sequencing; genomics; benchmarking; bioinformatics.
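
The abstract states that assembly quality is estimated in part by contiguity, without naming a specific metric. A common way to summarise contiguity is N50: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. The following is a minimal sketch of that calculation; the function name `n50` and the example contig lengths are illustrative assumptions and are not taken from the study.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    Illustrative helper only: the name and interface are assumptions,
    not part of any assembler or of the benchmarked pipeline.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")

    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (bp) for a toy assembly.
toy_contigs = [80, 70, 50, 40, 30, 20]
print(n50(toy_contigs))  # 70
```

In the toy example the total length is 290 bp, so the two longest contigs (80 + 70 = 150 bp) are the first to cover half of it, and the N50 is 70 bp. A higher N50 indicates a more contiguous assembly, but it says nothing about misassemblies, which have to be assessed separately.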

Conflict of interest statement

DAS is the Editor-in-Chief for the journal, but had no personal involvement in the reviewing process, or any influence in terms of adjudicating on the final decision, for this article. The other authors declare that they have no competing interests.

Figures

Figure 1. Pipeline stages and tools used in each step of the workflow.

Figure 2. Drosophila virilis assemblies comparison. The hybrid assemblers, MaSuRCA (CABOG and Flye) and Wengan, used Illumina short reads and Nanopore long reads for the assembly, while Canu, a long‑read assembler, utilised Nanopore long reads for the same purpose. SALSA improved contiguity in all assemblies.

Figure 3. Drosophila melanogaster Hifiasm assemblies comparison. Hifiasm performed three different assemblies using PacBio HiFi long reads with different insert sizes (11 Kbp, 24 Kbp) and coverage (37×, 40×, 92×). A region in one of the two termini of chr 2L appears translocated in the assemblies produced by 11 Kbp insert size with 37× coverage and 24 Kbp insert size with 92× coverage. The same region appears deleted in the assembly produced by 24 Kbp insert size with 40× coverage prior to SALSA scaffolding and inverted in the same assembly with SALSA scaffolding.

Figure 4. Drosophila melanogaster HiCanu assemblies comparison. HiCanu performed three different assemblies using PacBio HiFi long reads with different insert sizes (11 Kbp, 24 Kbp) and coverage (37×, 40×, 92×). Deletions of major regions or entire chromosomes can be found in all assemblies. Apparent duplications, such as those of major parts of chr 3L in the assemblies produced by 24 Kbp insert size with 40× coverage, are the result of phasing.

Figure 5. Homo sapiens assemblies comparison. The Wengan hybrid assembler used 34× Illumina short reads and 30× Nanopore long reads for the assembly, while Hifiasm used 16× PacBio HiFi long reads.
