Genome Res. 2004 Oct;14(10B):2064-9.
doi: 10.1101/gr.2496804.

C. elegans ORFeome version 3.1: increasing the coverage of ORFeome resources with improved gene predictions

Philippe Lamesch et al. Genome Res. 2004 Oct.

Abstract

The first version of the Caenorhabditis elegans ORFeome cloning project, based on release WS9 of Wormbase (August 1999), provided experimental verification for approximately 55% of predicted protein-encoding open reading frames (ORFs). The remaining 45% of predicted ORFs could not be cloned, possibly because their gene boundaries were mispredicted. Since the release of WS9, gene predictions have improved continuously. To test the accuracy of these evolving predictions, we attempted to PCR-amplify, from a highly representative worm cDNA library, and Gateway-clone approximately 4200 ORFs that were missed earlier and for which new predictions are available in WS100 (May 2003). In this set we successfully cloned 63% of ORFs with supporting experimental data ("touched" ORFs) and 42% of ORFs with no supporting experimental evidence ("untouched" ORFs). Approximately 2000 full-length ORFs were cloned in-frame, 13% of which were corrected in their exon/intron structure relative to WS100 predictions. In total, approximately 12,500 C. elegans ORFs are now available as Gateway Entry clones for various reverse proteomics applications (ORFeome v3.1). This work illustrates why cloning a complete C. elegans ORFeome, and likely the ORFeomes of other multicellular organisms, needs to be an iterative process that requires multiple rounds of experimental validation together with gradually improving gene predictions.
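The touched/untouched distinction partitions the attempted ORFs by whether any transcript evidence supports them, and the reported 63% and 42% are per-class cloning success rates. A minimal Python sketch of that bookkeeping, assuming each cloning attempt is a record with hypothetical boolean fields for EST support, OST support, and cloning outcome; the ORF names and outcomes below are toy values, not data from the study.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class CloningAttempt:
    name: str      # hypothetical ORF identifier
    has_est: bool  # ORF overlaps at least one EST (assumed flag)
    has_ost: bool  # ORF overlaps at least one OST (assumed flag)
    cloned: bool   # PCR product obtained and Gateway-cloned

    @property
    def touched(self) -> bool:
        # "Touched" = supported by any experimental transcript evidence.
        return self.has_est or self.has_ost


def success_rates(attempts: Iterable[CloningAttempt]) -> Dict[str, float]:
    """Fraction of attempted ORFs that were cloned, split by touched/untouched."""
    tallies = {"touched": [0, 0], "untouched": [0, 0]}  # class -> [cloned, attempted]
    for attempt in attempts:
        key = "touched" if attempt.touched else "untouched"
        tallies[key][1] += 1
        if attempt.cloned:
            tallies[key][0] += 1
    return {key: (cloned / tried if tried else 0.0)
            for key, (cloned, tried) in tallies.items()}


# Toy input (invented values, for illustration only).
toy = [
    CloningAttempt("orf_1", has_est=True, has_ost=False, cloned=True),
    CloningAttempt("orf_2", has_est=False, has_ost=True, cloned=False),
    CloningAttempt("orf_3", has_est=False, has_ost=False, cloned=True),
    CloningAttempt("orf_4", has_est=False, has_ost=False, cloned=False),
]
print(success_rates(toy))  # {'touched': 0.5, 'untouched': 0.5}
```

An ORF supported only by an OST still counts as touched here, matching the caption of Figure 1, which treats ESTs and OSTs together as experimental evidence.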


Figures

Figure 1
The C. elegans genome annotation evolved between WS9 and WS100. (A) For all ORFs that are missing in v1.1a, those with repredicted starts and/or stops in WS100 were identified. Between WS9 and WS100, 1052 ORFs have been repredicted at the start, 962 at the stop, and 694 at both ends. A total of 6146 ORFs had exactly the same start and stop in the two Wormbase releases. We also identified 1524 newly predicted ORFs in WS100. (B) Venn diagram summarizing the classification of the 4232 repredicted and new ORFs based on the available experimental data. The blue oval represents all 4232 ORFs that we attempted to clone. The purple circle contains the new predictions, of which 713 are touched by ESTs and 811 are untouched. The large orange oval represents all ORFs touched by ESTs. The smaller oval in light yellow shows ORFs touched by OSTs. As no OST data are available for ORFs that we did not clone in ORFeome Version 1 or for newly predicted ORFs, only ORFs that we cloned out-of-frame earlier are touched by OSTs. A small portion of the latter (64 ORFs) is not touched by any EST. Of all 4232 predicted ORFs that we attempted to clone, 34% (626 repredicted and 811 new ORFs) are not experimentally verified (untouched), whereas 66% are touched by ESTs, OSTs, or both.
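The counts in panels A and B can be cross-checked with simple arithmetic. A minimal Python sketch, assuming the start-only, stop-only, and both-ends counts are mutually exclusive (their sum plus the 1524 new ORFs gives exactly the 4232 attempted ORFs, which supports that reading) and assuming ORF boundaries are given as (start, stop) positions in transcript orientation. `classify_reprediction` is a hypothetical helper, not the authors' pipeline.

```python
from typing import Optional, Tuple

Span = Tuple[int, int]  # (start codon position, stop codon position), assumed


def classify_reprediction(old: Optional[Span], new: Span) -> str:
    """Compare one ORF's earlier (WS9-style) and later (WS100-style) boundaries."""
    if old is None:
        return "new"  # no earlier prediction existed
    start_moved = old[0] != new[0]
    stop_moved = old[1] != new[1]
    if start_moved and stop_moved:
        return "both_ends"
    if start_moved:
        return "start_only"
    if stop_moved:
        return "stop_only"
    return "unchanged"


# Counts from the Figure 1 caption, read as mutually exclusive classes.
repredicted = {"start_only": 1052, "stop_only": 962, "both_ends": 694}
new_touched, new_untouched = 713, 811
repredicted_untouched = 626

total_repredicted = sum(repredicted.values())
total_new = new_touched + new_untouched
total_attempted = total_repredicted + total_new
untouched = repredicted_untouched + new_untouched
touched = total_attempted - untouched

print(total_repredicted, total_new, total_attempted)  # 2708 1524 4232
print(round(100 * untouched / total_attempted))       # 34
print(round(100 * touched / total_attempted))         # 66
```

The recomputed totals (4232 attempted, 1437 untouched, 2795 touched) agree with the 34% and 66% given in panel B.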
Figure 2
Cloning success based on the nature of repredictions. (A) Of ORFs cloned in ORFeome Version 3, 57% were repredicted to be shorter and 31% to be extended at one or both ends, whereas 12% were extended at one end and truncated at the other. (B) Example of an ORF that was successfully cloned in ORFeome Version 3 after having been truncated at the 3′-end. The exon/intron structures in blue represent the old (K12H6.9WS9) and new (K12H6.9WS100) predictions of K12H6.9. Using primers based on WS100 and sequencing the resulting PCR product, we obtained a sequence trace (black arrow) that aligned to the WS100 prediction, showing a full-length OST (pink) with exactly the predicted structure. The translated protein is shown in green, demonstrating that the cloned ORF is indeed in-frame. The primer designed for the 3′-end of the WS9 prediction cannot anneal to the coding sequence of the WS100 prediction, which explains the earlier cloning failure. (C) Example of an ORF that was successfully cloned in ORFeome Version 3 after having been extended at both ends. The 5′-primer based on WS9 anneals in the middle of an intron in the newly predicted gene model, which explains the earlier cloning failure.
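Panel A's three categories depend only on whether each end of the ORF moved outward or inward, and panels B and C come down to whether an old primer's binding site still exists in the new coding sequence. A minimal Python sketch under stated assumptions: boundaries are (start, stop) in transcript orientation, internal exon changes are ignored, primer binding is modeled as an exact sequence match, and the toy sequences are invented. The function names are hypothetical.

```python
from typing import Tuple

Span = Tuple[int, int]  # (start, stop), start < stop, transcript orientation


def reprediction_nature(old: Span, new: Span) -> str:
    """Classify a boundary change as in Figure 2A (ends only; exons ignored)."""
    extended = new[0] < old[0] or new[1] > old[1]
    truncated = new[0] > old[0] or new[1] < old[1]
    if extended and truncated:
        return "extended_and_truncated"
    if extended:
        return "extended"
    if truncated:
        return "shorter"
    return "unchanged"


_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def reverse_primer(cds: str, length: int = 10) -> str:
    # 3' primer = reverse complement of the last `length` bases of the CDS.
    return revcomp(cds[-length:])


def primer_site_present(primer: str, template: str) -> bool:
    # A reverse primer binds the sense strand where revcomp(primer) occurs.
    # Exact match only; real annealing tolerates some mismatches.
    return revcomp(primer) in template


# Toy sequences (invented): the newer model stops earlier than the older one.
shared = "ATGGCTTCAGGA"
old_cds = shared + "TTACCGGAATAA"  # longer, WS9-like model
new_cds = shared + "TAG"           # 3'-truncated, WS100-like model

old_primer = reverse_primer(old_cds)
new_primer = reverse_primer(new_cds)
print(reprediction_nature((1, len(old_cds)), (1, len(new_cds))))  # shorter
print(primer_site_present(old_primer, new_cds))                   # False
print(primer_site_present(new_primer, new_cds))                   # True
```

The same exact-match check with a forward primer would model panel C, where the old 5′ primer falls inside what the new model treats as an intron and so is absent from the spliced coding sequence.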
Figure 3
Internal structure differences observed between WS100 predictions and their aligned OSTs. The structure of 540 ORFs has been corrected, each showing one or more differences compared with the corresponding OSTs. OSTs may have more, fewer, longer, or shorter exons than the prediction as well as additional or missing introns.
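The kinds of difference listed for Figure 3 (more or fewer exons, longer or shorter exons, additional or missing introns) can be read off by comparing exon coordinates. A minimal Python sketch, assuming both the WS100 model and the OST alignment are given as sorted, inclusive exon intervals on the same strand and in the same coordinate system; the toy coordinates are invented and `compare_models` is a hypothetical helper. A longer or shorter exon shows up here as one interval in each of the two exons-only lists.

```python
from typing import Dict, List, Tuple

Exon = Tuple[int, int]  # inclusive (start, end), same strand, sorted


def introns(exons: List[Exon]) -> List[Tuple[int, int]]:
    """Intron intervals implied by consecutive exons."""
    return [(left[1] + 1, right[0] - 1) for left, right in zip(exons, exons[1:])]


def compare_models(predicted: List[Exon], ost: List[Exon]) -> Dict[str, object]:
    """List structural differences between a predicted model and an OST alignment."""
    pred_exons, ost_exons = set(predicted), set(ost)
    pred_introns, ost_introns = set(introns(predicted)), set(introns(ost))
    return {
        "exon_count_change": len(ost) - len(predicted),
        "exons_only_in_ost": sorted(ost_exons - pred_exons),
        "exons_only_in_prediction": sorted(pred_exons - ost_exons),
        "introns_only_in_ost": sorted(ost_introns - pred_introns),
        "introns_only_in_prediction": sorted(pred_introns - ost_introns),
    }


# Toy models: the OST reveals an extra internal exon that the prediction missed.
predicted = [(1, 100), (201, 300)]
ost = [(1, 100), (151, 170), (201, 300)]
diff = compare_models(predicted, ost)
print(diff["exon_count_change"])           # 1
print(diff["exons_only_in_ost"])           # [(151, 170)]
print(diff["introns_only_in_ost"])         # [(101, 150), (171, 200)]
print(diff["introns_only_in_prediction"])  # [(101, 200)]
```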
Figure 4
Merged genes account for a substantial number of repredictions in previously cloned ORFs. (A) Example of two ORFs that have been repredicted and merged into one longer ORF. In ORFeome Version 1 (upper lane), two pairs of primers were generated for the two predicted ORFs. The black arrows represent a primer pair (mv_F09E8.3) that did not amplify the previously predicted ORF F09E8.3. The green arrows represent primers (mv_F09E8.4) that successfully amplified a truncated version of the merged prediction. Using a new primer pair (mv100_F09E8.3), designed on the merged prediction in WS100 (green arrows, lower lane), this longer ORF was successfully cloned in-frame. (B) We have attempted to clone 324 merged ORFs in ORFeome Version 3 and confirmed former mispredictions of 99 pairs and 28 triplets of ORFs, each merged into one longer prediction in WS100.
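Detecting a merge of the kind shown in panel A amounts to asking which earlier ORFs fall inside one new, longer prediction, and panel B's totals follow from counting pairs and triplets. A minimal Python sketch, assuming genomic spans on the same strand and a simple containment criterion (an assumption; an overlap-based rule would also be reasonable). The ORF names and coordinates are hypothetical and do not correspond to real gene positions.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # inclusive genomic (start, end), same strand


def merged_components(old: Dict[str, Span], new_span: Span) -> List[str]:
    """Old ORFs lying entirely within one new, longer prediction."""
    lo, hi = new_span
    return sorted(name for name, (s, e) in old.items() if lo <= s and e <= hi)


def is_merge(old: Dict[str, Span], new_span: Span) -> bool:
    # Two or more earlier ORFs absorbed into one prediction suggests a merge.
    return len(merged_components(old, new_span)) >= 2


# Toy coordinates with hypothetical names (not real gene positions).
old_models = {"orfA": (1000, 1800), "orfB": (2100, 3000), "orfC": (5000, 5600)}
print(merged_components(old_models, (900, 3100)))  # ['orfA', 'orfB']
print(is_merge(old_models, (4900, 5700)))          # False

# Figure 4B totals: 99 pairs and 28 triplets, each merged into one prediction.
pairs, triplets = 99, 28
print(pairs + triplets)          # 127 merged WS100 predictions
print(2 * pairs + 3 * triplets)  # 282 earlier ORF predictions involved
```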
Figure 5
The C. elegans ORFeome is an evolving resource. The cloning of a (nearly) complete ORFeome will be an iterative process. At each step, predicted ORFs that are successfully cloned in-frame (+) are added to the ORFeome resource. New attempts to clone ORFs that we cloned out-of-frame (o.o.f.) or that we did not clone (-) in earlier cloning steps are based on new or updated predictions (red box). The first two rounds of cloning, ORFeome Version 1 and Version 3, were based on two “snapshots” of the C. elegans genome annotation, WS9 and WS100, respectively. Further cloning steps will be based on different approaches to repredict ORFs, such as comparative genomics. Our current ORFeome resource, v3.1, contains ∼12,500 cloned ORFs. At this stage, our ORFeome resource contains pools of clones for each predicted gene. We are in the process of generating a new resource, ORFeome v2 (Reboul et al. 2003), in which we isolate individual wild-type clones for all detected splice variants of ORFs cloned in v1.1a.
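The iterative scheme in Figure 5 can be read as a loop over successive annotation releases in which only genes not yet cloned in-frame, and whose prediction has changed, are attempted again. A minimal Python sketch, assuming each release maps gene names to a predicted ORF (represented here by stand-in labels rather than sequences) and that `try_clone` is a hypothetical callable standing in for the whole PCR, Gateway cloning, and sequencing step. Gene names and outcomes are invented; repredictions from other sources, such as comparative genomics, would simply enter the loop as another release.

```python
from typing import Callable, Dict, List, Tuple

# Outcome codes follow Figure 5: "+" in-frame, "o.o.f." out-of-frame, "-" not cloned.
Release = Tuple[str, Dict[str, str]]  # (release name, gene -> predicted ORF label)


def build_orfeome(releases: List[Release],
                  try_clone: Callable[[str], str]) -> Dict[str, str]:
    """Return gene -> release in which its ORF was first cloned in-frame."""
    resource: Dict[str, str] = {}
    last_attempted: Dict[str, str] = {}
    for release_name, predictions in releases:
        for gene, orf in predictions.items():
            if gene in resource:
                continue  # already in the resource
            if last_attempted.get(gene) == orf:
                continue  # prediction unchanged: no new attempt
            last_attempted[gene] = orf
            if try_clone(orf) == "+":
                resource[gene] = release_name
    return resource


attempted: List[str] = []


def toy_clone(orf: str) -> str:
    # Invented outcomes standing in for the experimental cloning step.
    attempted.append(orf)
    return {"g1_v1": "+", "g2_v1": "o.o.f.", "g2_v2": "+", "g3_v1": "-"}.get(orf, "-")


releases = [
    ("WS9", {"g1": "g1_v1", "g2": "g2_v1", "g3": "g3_v1"}),
    ("WS100", {"g1": "g1_v1", "g2": "g2_v2", "g3": "g3_v1", "g4": "g4_v1"}),
]
print(build_orfeome(releases, toy_clone))  # {'g1': 'WS9', 'g2': 'WS100'}
print(attempted)  # ['g1_v1', 'g2_v1', 'g3_v1', 'g2_v2', 'g4_v1']
```

In the toy run, g2 is retried because its prediction changed between releases, whereas g3 is skipped in the second round because its prediction did not.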
