AlignGraph2: similar genome-assisted reassembly pipeline for PacBio long reads
- PMID: 33621981
- DOI: 10.1093/bib/bbab022
Abstract
Contigs assembled from third-generation sequencing long reads are usually more complete than those assembled from second-generation short reads. However, current algorithms still have difficulty assembling long reads into the ideal complete and accurate genome, that is, the theoretical best result [1]. With more and more fully sequenced genomes available, long-read contigs can still be improved with the similar genome-assisted reassembly method [2], which was initially proposed for short reads and makes use of a genome closely related (similar genome) to the genome being sequenced (target genome). The method aligns the contigs and reads to the similar genome, and then extends and refines the aligned contigs with the aligned reads. Here, we introduce AlignGraph2, a similar genome-assisted reassembly pipeline for PacBio long reads. The AlignGraph2 pipeline is the second version of our AlignGraph algorithm, completely redesigned. It accepts either error-prone or HiFi long reads as input and contains four novel algorithms: a similarity-aware alignment algorithm and an alignment filtration algorithm for aligning the long reads and preassembled contigs to the similar genome, and a reassembly algorithm and a weight-adjusted consensus algorithm for extending and refining the preassembled contigs. In our performance tests on both error-prone and HiFi long reads, AlignGraph2 aligns 5.7-27.2% more long reads and 7.3-56.0% more bases than some current alignment algorithms and is more efficient than or comparable to the others. For contigs assembled with various de novo algorithms and aligned to similar genomes (aligned contigs), AlignGraph2 extends 8.7-94.7% of them (extendable contigs), yielding contigs (extended contigs) with 7.0-249.6% larger N50 values and 5.2-87.7% fewer indels per 100 kbp. AlignGraph2's performance also remains relatively stable as the similarity between the similar genome and the target genome decreases.
The AlignGraph2 software can be downloaded for free from this site: https://github.com/huangs001/AlignGraph2.
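The two contig quality metrics reported above, N50 and indels per 100 kbp, can be computed as in the following minimal Python sketch. It uses the standard definitions of these metrics and is not taken from the AlignGraph2 code base; the function names and the contig lengths in the usage lines are hypothetical and purely illustrative.

```python
def n50(lengths):
    """Return the N50 of a set of contig lengths.

    N50 is the length of the contig at which the running total,
    taken over contigs sorted from longest to shortest, first
    reaches at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def indels_per_100kbp(indel_count, aligned_bases):
    """Normalize an indel count to a rate per 100,000 aligned bases."""
    if aligned_bases <= 0:
        raise ValueError("aligned_bases must be positive")
    return indel_count * 100_000 / aligned_bases


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Hypothetical contig lengths (in kbp), not data from the paper.
preassembled = [80, 70, 50, 40, 30, 20]
extended = [150, 70, 40, 30]

n50_before = n50(preassembled)
n50_after = n50(extended)
n50_gain = percent_change(n50_before, n50_after)
```

In this illustrative case, merging and extending contigs shifts more of the assembly into one long contig, so N50 rises even though the total number of assembled bases is unchanged.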
Keywords: de Bruijn graph; genome assembly; similar genome.
© The Author(s) 2021. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Similar articles
- AlignGraph: algorithm for secondary de novo genome assembly guided by closely related references. Bioinformatics. 2014 Jun 15;30(12):i319-i328. doi: 10.1093/bioinformatics/btu291. PMID: 24932000. Free PMC article.
- HALC: High throughput algorithm for long read error correction. BMC Bioinformatics. 2017 Apr 5;18(1):204. doi: 10.1186/s12859-017-1610-3. PMID: 28381259. Free PMC article.
- ReMILO: reference assisted misassembly detection algorithm using short and long reads. Bioinformatics. 2018 Jan 1;34(1):24-32. doi: 10.1093/bioinformatics/btx524. PMID: 28961789.
- The present and future of de novo whole-genome assembly. Brief Bioinform. 2018 Jan 1;19(1):23-40. doi: 10.1093/bib/bbw096. PMID: 27742661. Review.
- Genome sequence assembly algorithms and misassembly identification methods. Mol Biol Rep. 2022 Nov;49(11):11133-11148. doi: 10.1007/s11033-022-07919-8. Epub 2022 Sep 23. PMID: 36151399. Review.
Cited by
- Application of Sparse Representation in Bioinformatics. Front Genet. 2021 Dec 15;12:810875. doi: 10.3389/fgene.2021.810875. eCollection 2021. PMID: 34976030. Free PMC article. Review.
- Immunoglobulin Classification Based on FC* and GC* Features. Front Genet. 2022 Jan 24;12:827161. doi: 10.3389/fgene.2021.827161. eCollection 2021. PMID: 35140745. Free PMC article.
- Draft genome of the aardaker (Lathyrus tuberosus L.), a tuberous legume. BMC Genom Data. 2022 Sep 4;23(1):70. doi: 10.1186/s12863-022-01083-5. PMID: 36057561. Free PMC article.