Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2009;10(8):R88.
doi: 10.1186/gb-2009-10-8-r88. Epub 2009 Aug 27.

Assisted assembly: how to improve a de novo genome assembly by using related species

Affiliations

Assisted assembly: how to improve a de novo genome assembly by using related species

Sante Gnerre et al. Genome Biol. 2009.

Abstract

We describe a new assembly algorithm, where a genome assembly with low sequence coverage, either throughout the genome or locally, due to cloning bias, is considerably improved through an assisting process via a related genome. We show that the information provided by aligning the whole-genome shotgun reads of the target against a reference genome can be used to substantially improve the quality of the resulting assembly.

PubMed Disclaimer

Figures

Figure 1
Figure 1
Assisted assembly principle. (a) In this example, five reads align uniquely to the reference genome, and the two leftmost of these (purple) also appear as the two rightmost reads in an existing de novo contig. We can then extend the de novo contig by using the three unassembled reads (green), even if there is no supporting linking evidence (in general, ARACHNE requires a read to be linked to the contig it overlaps before using it to extend the contig). (b) Two scaffolds (blue and purple) are mapped and oriented on the reference genome by the trusted green reads. Furthermore, the two scaffolds are joined by a single link (black dotted line), although this is not trusted per se. The ARACHNE scaffolding algorithm would not normally join the two scaffolds; however, in this case the separation of the two scaffolds implied by the link is consistent with the separation implied by the mapping on the reference genome, and we thus implicitly validate the black dotted link and join the two scaffolds. (c) Trusted read placements anchor portions of a single scaffold onto two distant parts of the reference genome, suggesting either a bona fide syntenic break or a misassembly. To test for the latter, the contested region on the scaffold is subject to a stringent test for misassembly, and broken if it fails. The same level of stringency of misassembly testing could not be applied to the entire assembly because, at low coverage, there would be too many false positives.
Figure 2
Figure 2
Validation test. From the target assembly, we randomly select a pair of high-quality k-mers at distance d from each other. The pair is declared valid if the two k-mers are both present in the reference genome, with the same orientation and a separation d', approximately equal to d. This operation is repeated for many pairs. We report the fraction of such pairs that are valid.

References

    1. Lander ES, Waterman MS. Genomic mapping by fingerprinting random clones: a mathematical analysis. Genomics. 1988;2:231–239. doi: 10.1016/0888-7543(88)90007-9. - DOI - PubMed
    1. Mouse Genome Sequencing Consortium. Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P, Agarwala R, Ainscough R, Alexandersson M, An P, Antonarakis SE, Attwood J, Baertsch R, Bailey J, Barlow K, Beck S, Berry E, Birren B, Bloom T, Bork P, Botcherby M, Bray N, Brent MR, Brown DG, Brown SD, Bult C, Burton J, Butler J, Campbell RD, et al. Initial sequencing and analysis of the mouse genome. Nature. 2002;420:520–562. doi: 10.1038/nature01262. - DOI - PubMed
    1. Lindblad-Toh K, Wade CM, Mikkelsen TS, Karlsson EK, Jaffe DB, Kamal M, Clamp M, Chang JL, Kulbokas EJ, 3rd, Zody MC, Mauceli E, Xie X, Breen M, Wayne RK, Ostrander EA, Ponting CP, Galibert F, Smith DR, DeJong PJ, Kirkness E, Alvarez P, Biagi T, Brockman W, Butler J, Chin CW, Cook A, Cuff J, Daly MJ, DeCaprio D, Gnerre S, et al. Genome sequence, comparative analysis and haplotype structure of the domestic dog. Nature. 2005;438:803–819. doi: 10.1038/nature04338. - DOI - PubMed
    1. Mikkelsen TS, Wakefield MJ, Aken B, Amemiya CT, Chang JL, Duke S, Garber M, Gentles AJ, Goodstadt L, Heger A, Jurka J, Kamal M, Mauceli E, Searle SM, Sharpe T, Baker ML, Batzer MA, Benos PV, Belov K, Clamp M, Cook A, Cuff J, Das R, Davidow L, Deakin JE, Fazzari MJ, Glass JL, Grabherr M, Greally JM, Gu W, et al. Genome of the marsupial Monodelphis domestica reveals innovation in non-coding sequences. Nature. 2007;447:167–177. doi: 10.1038/nature05805. - DOI - PubMed
    1. Margulies EH, NISC Comparative Sequencing Program. Maduro VV, Thomas PJ, Tomkins JP, Amemiya CT, Luo M, Green D. Comparative sequencing provides insights about the structure and conservation of marsupial and monotreme genomes. Proc Natl Acad Sci USA. 2005;102:3354–3359. doi: 10.1073/pnas.0408539102. - DOI - PMC - PubMed

Publication types

LinkOut - more resources