RMI-DBG algorithm: A more agile iterative de Bruijn graph algorithm in short read genome assembly
- PMID: 33866959
- DOI: 10.1142/S0219720021500050
Abstract
The de Bruijn graph (DBG) algorithm, one of the cornerstone algorithms of short read assembly, has been extended with the rapid advancement of Next Generation Sequencing (NGS) technologies and the low-cost production of millions of high-quality short reads. Erroneous reads, non-uniform coverage, and genomic repeats are three major problems that affect the performance of short read assemblers. To counter these problems, the iterative DBG algorithm applies multiple k-mers instead of a single k-mer, iterating the DBG over a range of k-mer sizes from the minimum to the maximum. However, this iteration paradigm deals with complex graphs from the beginning of the algorithm and therefore introduces more potential errors and requires more computational time to resolve various false branches. In this research, we propose the Reverse Modified Iterative DBG (named RMI-DBG) for short read assembly. RMI-DBG combines the DBG algorithm and the string graph to achieve the advantages of both. We show that RMI-DBG runs faster than iterative DBG with comparable results. Additionally, the quality of the proposed algorithm in terms of continuity and accuracy is evaluated against some commonly used assemblers on several real datasets from the GAGE-B benchmark.
Keywords: Next Generation sequencing; Short read genome assembly; String graph; de Bruijn graph.
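The abstract's point that small k-mer sizes yield more tangled graphs can be illustrated with a toy example. The following is a minimal sketch, not the RMI-DBG method or any published assembler: it builds a plain de Bruijn graph for each k in a range and counts nodes with more than one successor. The function names (`build_dbg`, `count_branching`, `branching_by_k`) and the toy read are hypothetical, and the sketch assumes error-free reads and ignores reverse complements and coverage.

```python
from collections import defaultdict


def build_dbg(reads, k):
    """Build a de Bruijn graph: each k-mer adds an edge between its
    (k-1)-mer prefix and its (k-1)-mer suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def count_branching(graph):
    """Count nodes with more than one distinct successor."""
    return sum(1 for successors in graph.values() if len(successors) > 1)


def branching_by_k(reads, k_min, k_max):
    """Map each k in [k_min, k_max] to the number of branching nodes."""
    return {k: count_branching(build_dbg(reads, k))
            for k in range(k_min, k_max + 1)}


if __name__ == "__main__":
    # Toy read in which "TGC" occurs twice, creating a repeat-induced branch.
    reads = ["ATGCGTGCAA"]
    print(branching_by_k(reads, 3, 5))  # → {3: 1, 4: 1, 5: 0}
```

In this toy case the repeat "TGC" causes a branch at k = 3 and k = 4, and the branch disappears at k = 5 once the k-mer spans the repeat. Real data adds sequencing errors and uneven coverage, which is why larger k is not simply always better.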