Brief Bioinform. 2009 Jul;10(4):354-66.
doi: 10.1093/bib/bbp026. Epub 2009 May 29.

Genome assembly reborn: recent computational challenges

Mihai Pop. Brief Bioinform. 2009 Jul.

Abstract

Research into genome assembly algorithms has experienced a resurgence due to new challenges created by the development of next generation sequencing technologies. Several genome assemblers have been published in recent years specifically targeted at the new sequence data; however, the ever-changing technological landscape leads to the need for continued research. In addition, the low cost of next generation sequencing data has led to an increased use of sequencing in new settings. For example, the new field of metagenomics relies on large-scale sequencing of entire microbial communities instead of isolate genomes, leading to new computational challenges. In this article, we outline the major algorithmic approaches for genome assembly and describe recent developments in this domain.

Figures

Figure 1:
(A) Overlap between two reads; agreement within the overlapping region need not be perfect. (B) Correct assembly of a genome with two repeats (boxes) using four reads A–D. (C) Assembly produced by the greedy approach: reads A and D are assembled first, incorrectly, because they overlap best. (D) Disagreement between two reads (thin lines) that could extend a contig (thick line), indicating a potential repeat boundary. Contig extension must be terminated in order to avoid misassemblies.
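
The greedy strategy of panel (C) can be sketched in a few lines of Python. This is a minimal illustration under assumptions that are not taken from the article: overlaps must match exactly (unlike panel A), no read is contained in another, and reverse complements are ignored. The function names and the `min_len` threshold are hypothetical.

```python
from typing import List


def suffix_prefix_overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest exact suffix of `a` equal to a prefix of `b`.

    Returns 0 when no overlap of at least `min_len` bases exists.
    """
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if length > 0 and a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Repeatedly merge the pair of sequences with the longest overlap.

    Assumes exact matches, no contained reads, and a single strand.
    Returns the contigs left when no overlap reaches `min_len`.
    """
    contigs = list(reads)
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i == j:
                    continue
                olen = suffix_prefix_overlap(a, b, min_len)
                if olen > best_len:
                    best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        # Merge the best pair: the first sequence plus the non-overlapping tail of the second.
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


print(greedy_assemble(["ATGGCG", "GCGTAC", "TACCTT"]))  # prints ['ATGGCGTACCTT']
```

Because each merge is chosen only by overlap length, nothing prevents two reads drawn from different repeat copies from being joined first, which is the failure shown in panel (C).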
Figure 2:
Overlap graph of a genome containing a two-copy repeat (B). Note the increased depth of coverage within the repeat. The correct reconstruction of this genome spells the sequence ABCBD, while conservative assembly approaches would lead to a fragmented reconstruction.
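
A toy overlap graph makes the repeat problem concrete. The sketch below is a hypothetical illustration assuming exact suffix-prefix overlaps and a single strand; it does not model coverage depth. Its four reads come from the genome `ACGTACCGTATT`, i.e. ABCBD with A = `AC`, B = `GTA` (the repeat), C = `CC` and D = `TT`. The reads that end inside the repeat each have two possible successors; a conservative assembler that stops at such branches outputs fragments, while the path through reads 0, 1, 2, 3 spells the full genome.

```python
from typing import Dict, List, Tuple

Graph = Dict[int, List[Tuple[int, int]]]  # read index -> [(successor index, overlap length)]


def suffix_prefix_overlap(a: str, b: str, min_len: int) -> int:
    """Longest exact suffix of `a` equal to a prefix of `b` (0 if shorter than min_len)."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if length > 0 and a.endswith(b[:length]):
            return length
    return 0


def build_overlap_graph(reads: List[str], min_len: int) -> Graph:
    """Add an edge i -> j whenever read i's suffix overlaps read j's prefix."""
    graph: Graph = {i: [] for i in range(len(reads))}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                olen = suffix_prefix_overlap(a, b, min_len)
                if olen:
                    graph[i].append((j, olen))
    return graph


def branching_reads(graph: Graph) -> List[int]:
    """Reads with more than one possible successor: candidate repeat boundaries."""
    return sorted(i for i, succ in graph.items() if len(succ) > 1)


def spell_path(reads: List[str], graph: Graph, path: List[int]) -> str:
    """Concatenate reads along a path, dropping each overlapping prefix."""
    seq = reads[path[0]]
    for u, v in zip(path, path[1:]):
        olen = dict(graph[u])[v]
        seq += reads[v][olen:]
    return seq


reads = ["ACGTA", "GTACC", "CCGTA", "GTATT"]
g = build_overlap_graph(reads, min_len=2)
print(branching_reads(g))                  # prints [0, 2]
print(spell_path(reads, g, [0, 1, 2, 3]))  # prints ACGTACCGTATT
```

Following the other branch out of read 0, the path [0, 3] spells `ACGTATT` and silently drops the segment C, a collapsed-repeat misassembly.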
Figure 3:
(A) k-mer spectrum of a DNA string (bold) for k = 4. (B) Section of the corresponding de Bruijn graph; the edges are labeled with the corresponding k-mers. (C) Overlap between two reads (bold) that can be inferred from the corresponding paths through the de Bruijn graph.
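
The constructions in panels (A) to (C) can be sketched directly. In this hypothetical sketch, nodes are (k-1)-mers and each k-mer contributes an edge from its prefix to its suffix, labeled with the k-mer itself; `read_path` and `overlap_from_paths` (illustrative names) mimic panel (C) by matching the end of one read's node path against the start of another's. The example strings are made up, not the one in the figure, and real assemblers build the graph from the k-mers of all reads while coping with sequencing errors and both strands, which this sketch ignores.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def kmer_spectrum(seq: str, k: int) -> Counter:
    """Count every k-mer (length-k substring) of `seq`."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def de_bruijn_graph(seq: str, k: int) -> Dict[str, List[Tuple[str, str]]]:
    """Nodes are (k-1)-mers; each k-mer adds an edge prefix -> suffix labeled by the k-mer."""
    graph: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        graph[kmer[:-1]].append((kmer[1:], kmer))
    return dict(graph)


def read_path(read: str, k: int) -> List[str]:
    """The (k-1)-mer nodes a read visits, in order."""
    return [read[i:i + k - 1] for i in range(len(read) - k + 2)]


def overlap_from_paths(p1: List[str], p2: List[str], k: int) -> int:
    """Infer a suffix-prefix overlap (in bases) from shared path ends.

    m shared nodes of length k-1, each shifted by one base, span m + k - 2 bases.
    """
    for m in range(min(len(p1), len(p2)), 0, -1):
        if p1[-m:] == p2[:m]:
            return m + k - 2
    return 0


k = 4
print(kmer_spectrum("ACGTACGTA", k))
print(de_bruijn_graph("ACGTACGTA", k)["ACG"])
print(overlap_from_paths(read_path("ACGTAC", k), read_path("GTACGT", k), k))  # prints 4
```

The repeated k-mers `ACGT` and `CGTA` appear twice in the spectrum and become parallel edges in the graph, which is how repeats show up in the de Bruijn representation.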
