Comparative Study
Genome Res. 2004 Nov;14(11):2235-44.
doi: 10.1101/gr.2648404. Epub 2004 Oct 12.

An intermediate grade of finished genomic sequence suitable for comparative analyses

Robert W Blakesley et al. Genome Res. 2004 Nov.

Abstract

Although the cost of generating draft-quality genomic sequence continues to decline, refining that sequence by the process of "sequence finishing" remains expensive. Near-perfect finished sequence is an appropriate goal for the human genome and a small set of reference genomes; however, such a high-quality product cannot be cost-justified for large numbers of additional genomes, at least for the foreseeable future. Here we describe the generation and quality of an intermediate grade of finished genomic sequence (termed comparative-grade finished sequence), which is tailored for use in multispecies sequence comparisons. Our analyses indicate that this sequence is of very high quality (with the residual gaps and errors mostly falling within repetitive elements) and reflects 99% of the total sequence. Importantly, comparative-grade sequence finishing requires approximately 40-fold less reagents and approximately 10-fold less personnel effort than the generation of near-perfect finished sequence, such as that produced for the human genome. Although applied here to finishing sequence derived from individual bacterial artificial chromosome (BAC) clones, one could envision establishing routines for refining sequences emanating from whole-genome shotgun sequencing projects to a similar quality level. Our experience to date demonstrates that comparative-grade sequence finishing represents a practical and affordable option for sequence refinement en route to comparative analyses.

Figures

Figure 1.
Predicted gene structures using different grades of genomic sequence. The annotated relative positions of exons in the rat CAPZA2 (A), baboon CAV2 (B), and lemur GASZ (C) genes are indicated. In each case, the exon positions predicted by Genscan (Burge and Karlin 1997) using each of the different types of genomic sequence are shown below (generated for BAC clones RP31-188L2 [GenBank no. AC087041], RP41-479B1 [GenBank no. AC084730], and LB2-246N5 [GenBank no. AC123544], respectively). The positions of gaps in the full-shotgun draft and comparative-grade finished sequence are shown as grey boxes. Note that using full-shotgun draft sequence (with unordered contigs separated by stretches of 50 Ns), Genscan incorrectly predicts the positions of a number of exons whose positions are correctly predicted by using comparative-grade finished sequence (with ordered and oriented contigs separated by stretches of 50 Ns) or human-grade finished sequence. There are also cases in which Genscan incorrectly predicts exons using all three types of sequence.
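
As a minimal sketch of how the gap positions described in this caption could be located, the Python snippet below scans a sequence for runs of at least 50 Ns and splits the sequence into the contigs between them. The function names, the 0-based half-open coordinate convention, and the toy sequence are assumptions made for illustration; the paper does not describe its own tooling for this step.

```python
import re


def find_gap_runs(sequence, min_length=50):
    """Return (start, end) pairs for runs of N at least min_length long.

    Coordinates are 0-based and half-open, an assumption of this sketch.
    """
    pattern = re.compile(r"[Nn]{%d,}" % min_length)
    return [(m.start(), m.end()) for m in pattern.finditer(sequence)]


def split_contigs(sequence, min_length=50):
    """Split a sequence into the contigs that lie between gap runs."""
    contigs = []
    previous_end = 0
    for start, end in find_gap_runs(sequence, min_length):
        if start > previous_end:
            contigs.append(sequence[previous_end:start])
        previous_end = end
    if previous_end < len(sequence):
        contigs.append(sequence[previous_end:])
    return contigs


# Hypothetical toy sequence: two short contigs joined by a 50-N spacer.
toy_sequence = "ACGT" + "N" * 50 + "GGCC"
gap_runs = find_gap_runs(toy_sequence)
contigs = split_contigs(toy_sequence)
```

Note that locating the spacers says nothing about whether the flanking contigs are ordered and oriented; that distinction between draft and comparative-grade sequence comes from the finishing process, not from the sequence string itself.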
Figure 2.
Analysis of gaps and errors in comparative-grade finished sequence by simulation studies. The histogram bar heights reflect the total gaps or errors falling within each class of annotated sequence (total repeats [A,D], simple repeats [B,E], and exons [C,F]) for the simulated data sets. The arrows point to the observed values with the generated comparative-grade finished sequence (for additional details, see Methods). Note that the observed low level of sequence errors falling in exons (F) likely reflects the fact that the generation of simulated data sets assumes a uniform distribution of errors across repetitive and nonrepetitive sequence (which in reality is not seen).
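
The kind of simulation this caption describes can be sketched roughly as follows: place the same number of gaps or errors uniformly at random along a sequence, count how many fall inside an annotated class (for example, repeats or exons), repeat many times, and compare the observed count with the simulated distribution. This is only an illustration under stated assumptions. The function names, the interval representation, the number of trials, and the fixed seed are hypothetical, and the actual procedure is given in the paper's Methods, which are not reproduced here.

```python
import random


def count_hits(positions, intervals):
    """Count positions that fall inside any (start, end) half-open interval."""
    return sum(
        1 for p in positions if any(start <= p < end for start, end in intervals)
    )


def simulate_uniform_counts(seq_length, n_events, intervals, n_trials=1000, seed=0):
    """Simulate n_events placed uniformly at random, n_trials times.

    Returns one count per trial of events landing in the annotated intervals.
    The uniform placement mirrors the assumption stated in the caption.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_trials):
        positions = [rng.randrange(seq_length) for _ in range(n_events)]
        counts.append(count_hits(positions, intervals))
    return counts


def fraction_at_most(simulated_counts, observed):
    """Fraction of simulated trials with a count no larger than the observed one."""
    return sum(1 for c in simulated_counts if c <= observed) / len(simulated_counts)
```

An observed count near the upper tail of the simulated counts would suggest enrichment in that class, while one near the lower tail would suggest depletion, subject to the caveat in the caption that real errors are not uniformly distributed.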
Figure 3.
Costs of generating comparative-grade versus human-grade finished sequence. The estimated average direct time (A; actual "hands-on" time that a finishing technician worked to finish a BAC sequence), elapsed time (B; interval of time from when a BAC was assigned to a finishing technician to when it was finished to a comparative-grade or human-grade stage), and reagent costs (C) required per BAC to perform comparative-grade and human-grade sequence finishing (starting with full-shotgun draft sequence) are indicated (for details, see text).
