Genome Res. 2001 Sep;11(9):1541-8.
doi: 10.1101/gr.183201.

Assembly of the working draft of the human genome with GigAssembler

W J Kent et al. Genome Res. 2001 Sep.

Abstract

The data for the public working draft of the human genome contain roughly 400,000 initial sequence contigs in approximately 30,000 large insert clones. Many of these initial sequence contigs overlap. A program, GigAssembler, was built to merge them and to order and orient the resulting larger sequence contigs based on mRNA, paired plasmid ends, EST, BAC end pairs, and other information. This program produced the first publicly available assembly of the human genome: a working draft containing roughly 2.7 billion base pairs and covering an estimated 88% of the genome. That draft has been used for several recent studies of the genome. Here we describe the algorithm used by GigAssembler.

Figures

Figure 1
Two sequences overlapping end to end. The sequences are represented as dashes. The aligning regions are joined by vertical bars. End-to-end overlap is an extremely strong indication that two sequences should be joined into a contig.
Figure 2
Two sequences with tails. The nonaligning regions on either side can be classified into ‘extensions’ and ‘tails.’ Short tails are fairly common even when two sequences should be joined into a contig because of poor quality sequence near the ends and occasional chimeric reads. Long tails, however, are generally a sign that the alignment is merely due to the sequences sharing a repeating element.
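The extension/tail distinction in Figure 2 can be sketched in a few lines. This is a minimal illustration, not GigAssembler's code: it assumes that on each side of the alignment the shorter nonaligning piece is the tail and the longer one is the extension, that both sequences are in the same orientation, and that coordinates are half-open. The `Alignment` class, the function names, and the `max_tail` value of 200 bases are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Aligned region a[a_start:a_end] vs b[b_start:b_end] (hypothetical layout)."""
    a_len: int
    a_start: int
    a_end: int
    b_len: int
    b_start: int
    b_end: int


def overhangs(aln):
    """Split the nonaligning bases on each side into a tail and an extension.

    Assumption: on each side the shorter unaligned piece is the tail and the
    longer one is the extension.
    """
    left = (aln.a_start, aln.b_start)
    right = (aln.a_len - aln.a_end, aln.b_len - aln.b_end)
    return {
        "left_tail": min(left),
        "left_extension": max(left),
        "right_tail": min(right),
        "right_extension": max(right),
    }


def looks_joinable(aln, max_tail=200):
    """Accept the overlap only if both tails are short.

    max_tail is an illustrative threshold, not a value taken from the paper.
    """
    o = overhangs(aln)
    return o["left_tail"] <= max_tail and o["right_tail"] <= max_tail


# End-to-end overlap as in Figure 1: the last 300 bases of a match the first 300 of b.
end_to_end = Alignment(a_len=1000, a_start=700, a_end=1000, b_len=900, b_start=0, b_end=300)

# A 200-base match in the middle of both sequences, as a shared repeat might produce.
repeat_like = Alignment(a_len=1000, a_start=400, a_end=600, b_len=900, b_start=350, b_end=550)
```

Under these assumptions `end_to_end` has zero-length tails and is accepted, while `repeat_like` has 350-base tails on both sides and is rejected.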
Figure 3
Merging into a raft. A contig (‘raft’) of three sequences, A, B, and C, has already been constructed by GigAssembler. The program now examines an alignment between sequence C and a new sequence, D, to see whether D should also be added to the raft. The parts of D marked with +s are compatible with the raft because of the C/D alignment. The program must also check that the parts of D marked with ?s are compatible with the raft by examining other alignments.
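One way to picture the check in Figure 3 is to place D in raft coordinates using the C/D alignment and then require supporting alignments wherever D lands on other raft members. The sketch below is a simplified assumption, not the published algorithm: it treats "an alignment exists between the two sequences" as sufficient evidence of compatibility, ignores orientation, and uses hypothetical names throughout (`raft_offsets`, `aligned_pairs`, `can_join`).

```python
def implied_offset(raft_offsets, c_name, c_start, d_start):
    """Raft coordinate where D would start, given that C[c_start] aligns to D[d_start]."""
    return raft_offsets[c_name] + c_start - d_start


def spans_overlap(s1, e1, s2, e2):
    """True if half-open intervals [s1, e1) and [s2, e2) share at least one base."""
    return min(e1, e2) - max(s1, s2) > 0


def can_join(raft_offsets, lengths, aligned_pairs, c_name, d_name, c_start, d_start):
    """Check the '?' regions: every other raft member that D would overlap
    must also have a recorded alignment with D (simplifying assumption)."""
    d_off = implied_offset(raft_offsets, c_name, c_start, d_start)
    d_end = d_off + lengths[d_name]
    for name, off in raft_offsets.items():
        if name == c_name:
            continue  # the '+' region is already covered by the C/D alignment
        if spans_overlap(off, off + lengths[name], d_off, d_end):
            if frozenset((name, d_name)) not in aligned_pairs:
                return False
    return True


# Illustrative raft: A, B, and C laid out left to right, each 1000 bases long.
raft_offsets = {"A": 0, "B": 600, "C": 1200}
lengths = {"A": 1000, "B": 1000, "C": 1000, "D": 1000}

# D[800] aligns to C[0], so D would occupy raft positions 400..1400
# and would overlap A and B as well as C.
only_b = {frozenset(("B", "D"))}
a_and_b = {frozenset(("A", "D")), frozenset(("B", "D"))}
```

With `only_b` the join is refused because nothing supports the A/D overlap; with `a_and_b` it is accepted. A fuller version would also compare where each supporting alignment places D, not merely whether one exists.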
Figure 4
Three overlapping draft clones: A, B, and C. Each clone has two initial sequence contigs. Note that initial sequence contigs a1, b1, and a2 overlap as do b2 and c1.
Figure 5
Ordering graph of clone starts and ends. This represents the same clones as in Fig. 4. (As) The start of clone A; (Ae) the end of clone A. Similarly Bs, Be, Cs, and Ce represent the starts and ends of clones B and C.
Figure 6
Ordering graph after adding in rafts. The initial sequence contigs shown in Fig. 4 are merged into rafts where they overlap. This forms three rafts: a1b1a2, b2c1, and c2. These rafts are constrained to lie between the relevant clone ends by adding further ordering edges to the graph shown in Fig. 5.
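The ordering graph of Figures 5 and 6 can be illustrated as a directed graph whose edges mean "comes before", with a topological sort producing one consistent left-to-right order. The sketch below uses Python's standard `graphlib` module (Python 3.9+). The specific clone-end edges are an assumption made for illustration (the captions do not list them), and the raft edges simply place each raft after the start and before the end of every clone that contributes a contig to it.

```python
from graphlib import TopologicalSorter


def order_nodes(edges):
    """Return one order consistent with all (before, after) edges.

    Raises graphlib.CycleError if the constraints contradict each other.
    """
    ts = TopologicalSorter()
    for before, after in edges:
        ts.add(after, before)  # 'before' is a predecessor of 'after'
    return list(ts.static_order())


def raft_edges(raft_members):
    """Constrain each raft to lie between the start and end of its clones."""
    edges = []
    for raft, clones in raft_members.items():
        for clone in clones:
            edges.append((clone + "s", raft))  # raft comes after the clone start
            edges.append((raft, clone + "e"))  # raft comes before the clone end
    return edges


# Assumed clone-end order for overlapping clones A, B, C (illustrative only).
CLONE_EDGES = [("As", "Bs"), ("Bs", "Ae"), ("Ae", "Cs"), ("Cs", "Be"), ("Be", "Ce")]

# Rafts from Figure 6 and the clones whose contigs they contain.
RAFT_MEMBERS = {"a1b1a2": ["A", "B"], "b2c1": ["B", "C"], "c2": ["C"]}

ALL_EDGES = CLONE_EDGES + raft_edges(RAFT_MEMBERS)
ORDER = order_nodes(ALL_EDGES)
```

The resulting order is not necessarily unique: here a1b1a2 must precede b2c1, but the position of c2 relative to b2c1 is not fixed by these edges alone.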

