Maximum likelihood genome assembly
- PMID: 19645596
- PMCID: PMC3154397
- DOI: 10.1089/cmb.2009.0047
Maximum likelihood genome assembly
Abstract
Whole genome shotgun assembly is the process of taking many short sequenced segments (reads) and reconstructing the genome from which they originated. We demonstrate how the technique of bidirected network flow can be used to explicitly model the double-stranded nature of DNA for genome assembly. By combining an algorithm for the Chinese Postman Problem on bidirected graphs with the construction of a bidirected de Bruijn graph, we are able to find the shortest double-stranded DNA sequence that contains a given set of k-long DNA molecules. This is the first exact polynomial time algorithm for the assembly of a double-stranded genome. Furthermore, we propose a maximum likelihood framework for assembling the genome that is the most likely source of the reads, in lieu of the standard maximum parsimony approach (which finds the shortest genome subject to some constraints). In this setting, we give a bidirected network flow-based algorithm that, by taking advantage of high coverage, accurately estimates the copy counts of repeats in a genome. Our second algorithm combines these predicted copy counts with matepair data in order to assemble the reads into contigs. We run our algorithms on simulated read data from Escherichia coli and predict copy counts with extremely high accuracy, while assembling long contigs.
Figures





Similar articles
-
Benchmarking of de novo assembly algorithms for Nanopore data reveals optimal performance of OLC approaches.BMC Genomics. 2016 Aug 22;17 Suppl 7(Suppl 7):507. doi: 10.1186/s12864-016-2895-8. BMC Genomics. 2016. PMID: 27556636 Free PMC article.
-
AN EFFICIENT ALGORITHM FOR CHINESE POSTMAN WALK ON BI-DIRECTED DE BRUIJN GRAPHS.Discrete Math Algorithms Appl. 2010;1:184-196. doi: 10.1007/978-3-642-17458-2_16. Discrete Math Algorithms Appl. 2010. PMID: 25364472 Free PMC article.
-
Read mapping on de Bruijn graphs.BMC Bioinformatics. 2016 Jun 16;17(1):237. doi: 10.1186/s12859-016-1103-9. BMC Bioinformatics. 2016. PMID: 27306641 Free PMC article.
-
The present and future of de novo whole-genome assembly.Brief Bioinform. 2018 Jan 1;19(1):23-40. doi: 10.1093/bib/bbw096. Brief Bioinform. 2018. PMID: 27742661 Review.
-
Sequence assembly using next generation sequencing data--challenges and solutions.Sci China Life Sci. 2014 Nov;57(11):1140-8. doi: 10.1007/s11427-014-4752-9. Epub 2014 Oct 17. Sci China Life Sci. 2014. PMID: 25326069 Review.
Cited by
-
Assessing the benefits of using mate-pairs to resolve repeats in de novo short-read prokaryotic assemblies.BMC Bioinformatics. 2011 Apr 13;12:95. doi: 10.1186/1471-2105-12-95. BMC Bioinformatics. 2011. PMID: 21486487 Free PMC article.
-
Centromere reference models for human chromosomes X and Y satellite arrays.Genome Res. 2014 Apr;24(4):697-707. doi: 10.1101/gr.159624.113. Epub 2014 Feb 5. Genome Res. 2014. PMID: 24501022 Free PMC article.
-
Genome graphs and the evolution of genome inference.Genome Res. 2017 May;27(5):665-676. doi: 10.1101/gr.214155.116. Epub 2017 Mar 30. Genome Res. 2017. PMID: 28360232 Free PMC article. Review.
-
Sama: a contig assembler with correctness guarantee.Algorithms Mol Biol. 2025 Jun 3;20(1):9. doi: 10.1186/s13015-025-00280-y. Algorithms Mol Biol. 2025. PMID: 40462082 Free PMC article.
-
Simplitigs as an efficient and scalable representation of de Bruijn graphs.Genome Biol. 2021 Apr 6;22(1):96. doi: 10.1186/s13059-021-02297-z. Genome Biol. 2021. PMID: 33823902 Free PMC article.
References
-
- Ahuja R.K. Magnanti T.L. Orlin J.B. Network flows: theory, algorithms, and applications. Prentice-Hall; Upper Saddle River, NJ: 1993.
-
- Appa G. Kotnyek B. A bidirected generalization of network matrices. Networks. 2006;47:185–198.
-
- Chaisson M. Pevzner P.A. Tang H. Fragment assembly with short reads. Bioinformatics. 2004;20:2067–2074. - PubMed
Publication types
MeSH terms
Substances
Grants and funding
LinkOut - more resources
Full Text Sources
Other Literature Sources