Genome Res. 2004 Oct;14(10B):2083-92.
doi: 10.1101/gr.2473704.

Systematic recovery and analysis of full-ORF human cDNA clones

Agnes Baross et al. Genome Res. 2004 Oct.

Abstract

The Mammalian Gene Collection (MGC) consortium (http://mgc.nci.nih.gov) seeks to establish publicly available collections of full-ORF cDNAs for several organisms of significance to biomedical research, including human. To date over 15,200 human cDNA clones containing full-length open reading frames (ORFs) have been identified via systematic expressed sequence tag (EST) analysis of a diverse set of cDNA libraries; however, further systematic EST analysis is no longer an efficient method for identifying new cDNAs. As part of our involvement in the MGC program, we have developed a scalable method for targeted recovery of cDNA clones to facilitate recovery of genes absent from the MGC collection. First, cDNA is synthesized from various RNAs, followed by polymerase chain reaction (PCR) amplification of transcripts in 96-well plates using gene-specific primer pairs flanking the ORFs. Amplicons are cloned into a sequencing vector, and full-length sequences are obtained. Sequences are processed and assembled using Phred and Phrap, and analyzed using Consed and a number of bioinformatics methods we have developed. Sequences are compared with the Reference Sequence (RefSeq) database, and validation of sequence discrepancies is attempted using other sequence databases including dbEST and dbSNP. Clones with identical sequence to RefSeq or containing only validated changes will become part of the MGC human gene collection. Clones containing novel splice variants or polymorphisms have also been identified. Our approach to clone recovery, applied at large scale, has the potential to recover many and possibly most of the genes absent from the MGC collection.

Figures

Figure 1
Overview of the targeted clone recovery process. “Wet lab” experimental approaches are shown on white background, and bioinformatics methods are shown on gray background.
Figure 2
Agarose gel electrophoresis of double-stranded cDNA. The sources of RNA are shown at the top of the gels. cDNA was synthesized from 1 μg high-quality mRNA per sample, and 1 μL of the resulting 20 μL cDNA per sample was loaded in the five sample wells of a 1% agarose gel.
Figure 3
Electrophoretic analysis of PCR-amplified ORFs. PCR amplification was performed using lung cDNA template and gene-specific primers for 96 target genes. The results of 48 amplifications are shown here. Ten μL of a 25-μL reaction for each sample was loaded on a 1% agarose gel. Expected-size amplicons of target genes are indicated with arrows.
Figure 4
Agarose gel electrophoresis of EcoRI restriction-digested clones. The gel contains digests of 12 clones from each of eight PCR-amplified ORFs. One μL of plasmid DNA per clone was cut with EcoRI in a 96-well plate and loaded on a 1.2% agarose gel. Due to the difference in spacing between the gel and the multichannel pipetter used for loading, clones for the same gene are located in every fifth well. DNA marker is loaded in every fifth lane.
Figure 5
Estimated numbers of RT-PCR-generated clones required on average to identify at least one acceptable clone of the indicated length (as a function of PCR cycle number). This is based on 1/15,000 error rate of the reverse transcriptase, and 1/50,000 error rate of the high-fidelity DNA polymerase used in the clone acquisition process. n50, n75, n90, and n99 indicate the predicted numbers of clones that need to be sequenced in order to find an acceptable clone with probabilities of 50%, 75%, 90%, and 99%, respectively, based on the above error rates.
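
The kind of estimate described in the Figure 5 caption can be sketched as follows. This is a minimal illustration, not the authors' calculation: it assumes that an "acceptable" clone is one with no polymerase-introduced error, that errors are independent per base, that the reverse transcriptase copies each base once, and that every PCR cycle counts as one copying event for the clone's lineage (a conservative simplification). The function names `p_error_free` and `clones_needed` are hypothetical; only the two error rates come from the caption.

```python
import math

# Error rates quoted in the Figure 5 caption (errors per base copied).
RT_ERROR = 1 / 15_000    # reverse transcriptase
POL_ERROR = 1 / 50_000   # high-fidelity DNA polymerase


def p_error_free(length_bp, cycles, rt_error=RT_ERROR, pol_error=POL_ERROR):
    """Probability that one clone of the given length carries no copying error.

    Assumption (not from the source): one RT pass, then one polymerase pass
    per PCR cycle, with independent per-base errors.
    """
    rt_ok = (1 - rt_error) ** length_bp
    pcr_ok = (1 - pol_error) ** (length_bp * cycles)
    return rt_ok * pcr_ok


def clones_needed(p_ok, confidence):
    """Smallest n such that at least one of n clones is error-free
    with probability >= confidence, i.e. 1 - (1 - p_ok)**n >= confidence."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    if p_ok <= 0.0:
        raise ValueError("p_ok must be positive")
    if p_ok >= 1.0:
        return 1
    return max(1, math.ceil(math.log(1 - confidence) / math.log(1 - p_ok)))


if __name__ == "__main__":
    # Hypothetical ORF lengths and cycle number, chosen only for illustration.
    for length in (1_000, 3_000, 5_000):
        p_ok = p_error_free(length, cycles=30)
        row = [clones_needed(p_ok, c) for c in (0.50, 0.75, 0.90, 0.99)]
        print(f"{length} bp: P(error-free)={p_ok:.3f}  n50/n75/n90/n99={row}")
```

Under these assumptions, the required number of clones grows with both insert length and cycle number, which is the qualitative trend the caption describes; the exact values in the published figure may rest on a different error-accumulation model.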
Figure 6
Bioinformatics sequence analysis pipeline. Databases used for validating clone sequence versus RefSeq discrepancies are shown on grey background.
Figure 7
Summary of failed rescue attempts from RT-PCR-based clone recovery. Of 107 genes nonrescued to date, 38 were declared failures due to the lack of expected-size PCR amplicons. The cloning process failed for two genes. For 67 genes, clones representing expected-size amplicons were generated. Of these, clone insert sequences of 40 matched a RefSeq sequence other than the targeted gene; 32 of these contained sequences for the correct PCR primers used, whereas the remaining eight did not. Of 27 genes where the clone sequences matched the targeted gene, two failed due to various nonvalidated errors. Eight failed due to technical errors, such as primers amplifying within the ORF. For 17 genes, however, at least half of the clones could not be rescued due to a common unvalidated change. We suggest that clones in the last category may not be true failures, but rather novel splice variants or real polymorphisms that should be considered biologically valid.
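
The breakdown in the Figure 7 caption can be checked with a short tally. The counts below are transcribed from the caption; the dictionary keys and the helper `adds_up` are illustrative labels, not terms used by the authors.

```python
# Counts transcribed from the Figure 7 caption; key names are illustrative.
TOTAL_NONRESCUED = 107

top_level = {
    "no_expected_size_amplicon": 38,
    "cloning_failed": 2,
    "expected_size_clones_generated": 67,
}
clones_generated = {
    "matched_other_refseq": 40,
    "matched_target_gene": 27,
}
matched_other_refseq = {
    "with_correct_primers": 32,
    "without_correct_primers": 8,
}
matched_target_gene = {
    "nonvalidated_errors": 2,
    "technical_errors": 8,
    "common_unvalidated_change": 17,
}


def adds_up(parent_count, parts):
    """Return True when the sub-categories sum to their parent count."""
    return sum(parts.values()) == parent_count


if __name__ == "__main__":
    print(adds_up(TOTAL_NONRESCUED, top_level))
    print(adds_up(top_level["expected_size_clones_generated"], clones_generated))
    print(adds_up(clones_generated["matched_other_refseq"], matched_other_refseq))
    print(adds_up(clones_generated["matched_target_gene"], matched_target_gene))
    # Share of nonrescued genes that may be valid variants rather than failures.
    share = matched_target_gene["common_unvalidated_change"] / TOTAL_NONRESCUED
    print(f"{share:.1%}")
```

Each level of the caption's hierarchy sums to its parent count, so the 17 genes with a common unvalidated change are a subset of the 107 nonrescued genes, not an additional group.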
Figure 8
Electrophoretic analysis of PCR-amplified ORFs. PCR amplification was performed using brain cDNA template and gene-specific primers for 96 target genes. The results of 15 amplifications are shown here. Ten μL of a 25-μL reaction was loaded in each well on a 1% agarose gel. Expected-size amplicons of target genes are marked with black arrows. Amplicons different from expected size and isolated as potential splice variants are indicated with gray arrows.
Figure 9
Splice variants found for the aurora kinase C (AURKC) gene. Three PCR amplicons that were isolated and cloned yielded four different splice forms. “2” corresponds to the expected gene structure (from RefSeq) of seven exons. “1” includes an extra sequence previously known as an intron between exons 6 and 7. Clones generated from PCR amplicon “3” yielded two different splice forms of similar size, one without exon 5, and one without exon 4.

References

    1. Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. 1990. Basic local alignment search tool. J. Mol. Biol. 215: 403-410.
    2. Barnes, W.M. 1994. PCR amplification of up to 35-kb DNA with high fidelity and high yield from λ bacteriophage templates. Proc. Natl. Acad. Sci. 91: 2216-2220.
    3. Birney, E., Andrews, D., Bevan, P., Caccamo, M., Cameron, G., Chen, Y., Clarke, L., Coates, G., Cox, T., Cuff, J., et al. 2004. Ensembl 2004. Nucleic Acids Res. 32: D468-470.
    4. Boguski, M.S., Lowe, T.M., and Tolstoshev, C.M. 1993. dbEST—Database for “expressed sequence tags”. Nat. Genet. 4: 332-333.
    5. Butterfield, Y.S., Marra, M.A., Asano, J.K., Chan, S.Y., Guin, R., Krzywinski, M.I., Lee, S.S., MacDonald, K.W., Mathewson, C.A., Olson, T.E., et al. 2002. An efficient strategy for large-scale high-throughput transposon-mediated sequencing of cDNA clones. Nucleic Acids Res. 30: 2460-2468.

WEB SITE REFERENCES

    1. http://genome.ucsc.edu/cgi-bin/hgBlat; Human BLAT Search.
    2. http://mgc.nci.nih.gov; Mammalian Gene Collection.
    3. http://www.broad.mit.edu/cgi-bin/primer/primer3_www.cgi; Primer3.
    4. http://www.ensembl.org; Ensembl.
    5. http://www.ncbi.nlm.nih.gov/dbEST; Expressed Sequence Tags database.
