Review
Biomed Res Int. 2016;2016:6329217.
doi: 10.1155/2016/6329217. Epub 2016 May 10.

The A, C, G, and T of Genome Assembly

Bilal Wajid et al. Biomed Res Int. 2016.

Abstract

Over its two-decade history, genome assembly has produced significant research in both biotechnology and computational biology. This contribution describes sequencing platforms and their characteristics, examines the key steps involved in filtering and processing raw data, explains assembly frameworks, and discusses quality statistics for assessing the assembled sequence. The paper also explores recent Ubuntu-based software environments oriented towards genome assembly, as well as some avenues for future research.

Figures

Figure 1
Flow chart for DNA assembly pipeline. Some commonly used tools are mentioned next to each step [36]. Please refer to [, , –88] for details on the above-mentioned tools.
Figure 2
De novo assembly: reads that overlap each other are shown to align at appropriate places with respect to one another, thereby generating the layout. The layout, in turn, constructs a consensus sequence, simply by basing itself on the majority base call. The above-mentioned framework is called “Overlap-Layout-Consensus.”
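
The Overlap-Layout-Consensus idea in the caption above can be illustrated with a minimal sketch. It is not the method of any particular assembler: the function names, the `min_overlap` parameter, and the toy reads are all hypothetical, and it assumes error-free suffix-prefix overlaps between reads that are already listed in left-to-right order.

```python
from collections import Counter

def overlap_length(left, right, min_overlap=3):
    """Longest suffix of `left` that equals a prefix of `right` (0 if too short)."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0

def layout_offsets(reads, min_overlap=3):
    """Overlap + layout: place each read relative to the previous one.

    Assumption: the reads are already sorted in layout order.
    """
    offsets = [0]
    for previous, current in zip(reads, reads[1:]):
        size = overlap_length(previous, current, min_overlap)
        if size == 0:
            raise ValueError("consecutive reads do not overlap")
        offsets.append(offsets[-1] + len(previous) - size)
    return offsets

def majority_consensus(reads, offsets):
    """Consensus: call every layout column by majority base."""
    columns = {}
    for read, offset in zip(reads, offsets):
        for index, base in enumerate(read):
            columns.setdefault(offset + index, Counter())[base] += 1
    return "".join(columns[pos].most_common(1)[0][0] for pos in sorted(columns))

# Toy reads, invented for illustration only.
reads = ["ACGTAC", "TACGGA", "GGATTC"]
offsets = layout_offsets(reads)
consensus = majority_consensus(reads, offsets)
```

Real assemblers must also discover the read order, tolerate sequencing errors in the overlaps, and handle repeats, none of which this sketch attempts.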
Figure 3
Reference assisted assembly: reads align relative to a reference sequence setting up the layout. The layout, in turn, constructs a consensus sequence, simply by basing itself on the majority base call. Please note that the reads do not need to match perfectly with the reference. The example shows a shaded region where the consensus sequence differs from the reference. This working scheme is called “Alignment-Layout-Consensus.”
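
The Alignment-Layout-Consensus scheme in the caption above can be sketched in the same spirit. This is an illustrative simplification, not a real read mapper: `best_position` and `reference_assisted_consensus` are hypothetical names, placement is ungapped and chosen by the fewest mismatches, and the reference and reads are invented toy data.

```python
from collections import Counter

def best_position(read, reference):
    """Ungapped placement with the fewest mismatches (ties go to the leftmost start)."""
    def mismatches(start):
        window = reference[start:start + len(read)]
        return sum(a != b for a, b in zip(read, window))
    return min(range(len(reference) - len(read) + 1), key=mismatches)

def reference_assisted_consensus(reads, reference):
    """Lay reads out against the reference, then call each column by majority base."""
    columns = [Counter() for _ in reference]
    for read in reads:
        start = best_position(read, reference)
        for index, base in enumerate(read):
            columns[start + index][base] += 1
    # Columns that no read covers are reported as "N" (a choice made for this sketch).
    return "".join(c.most_common(1)[0][0] if c else "N" for c in columns)

# Toy data: the reads come from a sample that differs from the reference at one base.
reference = "ACGTACGGATTC"
reads = ["ACGTAG", "GTAGGG", "AGGGAT", "GGATTC"]
consensus = reference_assisted_consensus(reads, reference)
differences = [i for i, (r, c) in enumerate(zip(reference, consensus)) if r != c]
```

As in the figure, the reads do not match the reference perfectly: every read covering index 5 carries a G where the reference has a C, so the majority call there departs from the reference.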

References

    1. Wajid B., Serpedin E. Do it yourself guide to genome assembly. Briefings in Functional Genomics. 2016;15(1):1–9. doi: 10.1093/bfgp/elu042.
    2. Venter J. C., Adams M. D., Myers E. W., et al. The sequence of the human genome. Science. 2001;291(5507):1304–1351.
    3. Sanger F., Nicklen S., Coulson A. R., et al. DNA sequencing with chain-terminating inhibitors. Biotechnology. 1992;74(12):5463–5467.
    4. Wheeler D. A., Srinivasan M., Egholm M., et al. The complete genome of an individual by massively parallel DNA sequencing. Nature. 2008;452(7189):872–876. doi: 10.1038/nature06884.
    5. Ahmadian A., Gharizadeh B., Gustafsson A. C., et al. Single-nucleotide polymorphism analysis by pyrosequencing. Analytical Biochemistry. 2000;280(1):103–110. doi: 10.1006/abio.2000.4493.