Genome Res. 2004 Oct;14(10B):2083-92.
doi: 10.1101/gr.2473704.

Systematic recovery and analysis of full-ORF human cDNA clones

Agnes Baross et al. Genome Res. 2004 Oct.

Abstract

The Mammalian Gene Collection (MGC) consortium (http://mgc.nci.nih.gov) seeks to establish publicly available collections of full-ORF cDNAs for several organisms of significance to biomedical research, including human. To date over 15,200 human cDNA clones containing full-length open reading frames (ORFs) have been identified via systematic expressed sequence tag (EST) analysis of a diverse set of cDNA libraries; however, further systematic EST analysis is no longer an efficient method for identifying new cDNAs. As part of our involvement in the MGC program, we have developed a scalable method for targeted recovery of cDNA clones to facilitate recovery of genes absent from the MGC collection. First, cDNA is synthesized from various RNAs, followed by polymerase chain reaction (PCR) amplification of transcripts in 96-well plates using gene-specific primer pairs flanking the ORFs. Amplicons are cloned into a sequencing vector, and full-length sequences are obtained. Sequences are processed and assembled using Phred and Phrap, and analyzed using Consed and a number of bioinformatics methods we have developed. Sequences are compared with the Reference Sequence (RefSeq) database, and validation of sequence discrepancies is attempted using other sequence databases including dbEST and dbSNP. Clones with identical sequence to RefSeq or containing only validated changes will become part of the MGC human gene collection. Clones containing novel splice variants or polymorphisms have also been identified. Our approach to clone recovery, applied at large scale, has the potential to recover many and possibly most of the genes absent from the MGC collection.
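
The acceptance rule stated in the abstract (a clone is kept if it is identical to RefSeq or carries only validated changes) can be illustrated with a minimal sketch. The `Discrepancy` record, the `classify_clone` function, and the label strings are hypothetical names introduced here for illustration; they are not taken from the authors' pipeline, and the "review" outcome for unvalidated changes is an assumption about how such clones might be set aside.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Discrepancy:
    """One difference between a clone sequence and its RefSeq entry (hypothetical record)."""
    position: int    # coordinate of the difference within the ORF
    validated: bool  # True if another database (e.g. dbEST or dbSNP) supports the change


def classify_clone(discrepancies: List[Discrepancy]) -> str:
    """Apply the acceptance rule from the abstract to one clone."""
    if not discrepancies:
        return "accept: identical to RefSeq"
    if all(d.validated for d in discrepancies):
        return "accept: validated changes only"
    # Assumption: clones with unvalidated changes are set aside for review,
    # since they may be errors, novel splice variants, or polymorphisms.
    return "review: unvalidated changes"


if __name__ == "__main__":
    print(classify_clone([]))
    print(classify_clone([Discrepancy(120, True)]))
    print(classify_clone([Discrepancy(120, True), Discrepancy(455, False)]))
```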

Figures

Figure 1
Overview of the targeted clone recovery process. “Wet lab” experimental approaches are shown on white background, and bioinformatics methods are shown on gray background.
Figure 2
Agarose gel electrophoresis of double-stranded cDNA. The sources of RNA are shown at the top of the gels. cDNA was synthesized from 1 μg high-quality mRNA per sample, and 1 μL of the resulting 20 μL cDNA per sample was loaded in the five sample wells of a 1% agarose gel.
Figure 3
Electrophoretic analysis of PCR-amplified ORFs. PCR amplification was performed using lung cDNA template and gene-specific primers for 96 target genes. The results of 48 amplifications are shown here. Ten μL of a 25-μL reaction for each sample was loaded on a 1% agarose gel. Expected-size amplicons of target genes are indicated with arrows.
Figure 4
Agarose gel electrophoresis of EcoRI restriction-digested clones. The gel contains digests of 12 clones from each of eight PCR-amplified ORFs. One μL of plasmid DNA per clone was cut with EcoRI in a 96-well plate and loaded on a 1.2% agarose gel. Due to the difference in spacing between the gel and the multichannel pipetter used for loading, clones for the same gene are located in every fifth well. DNA marker is loaded in every fifth lane.
Figure 5
Estimated numbers of RT-PCR-generated clones required on average to identify at least one acceptable clone of the indicated length (as a function of PCR cycle number). This is based on 1/15,000 error rate of the reverse transcriptase, and 1/50,000 error rate of the high-fidelity DNA polymerase used in the clone acquisition process. n50, n75, n90, and n99 indicate the predicted numbers of clones that need to be sequenced in order to find an acceptable clone with probabilities of 50%, 75%, 90%, and 99%, respectively, based on the above error rates.
Figure 6
Bioinformatics sequence analysis pipeline. Databases used for validating clone sequence versus RefSeq discrepancies are shown on gray background.
Figure 7
Summary of failed rescue attempts from RT-PCR-based clone recovery. Of 107 genes nonrescued to date, 38 were declared failures due to the lack of expected-size PCR amplicons. The cloning process failed for two genes. For 67 genes, clones representing expected-size amplicons were generated. Of these, clone insert sequences of 40 matched a RefSeq sequence other than the targeted gene; 32 of these contained sequences for the correct PCR primers used, whereas the remaining eight did not. Of 27 genes where the clone sequences matched the targeted gene, two failed due to various nonvalidated errors. Eight failed due to technical errors, such as primers amplifying within the ORF. For 17 genes, however, at least half of the clones could not be rescued due to a common unvalidated change. We suggest that clones in the last category may not be true failures, but rather novel splice variants or real polymorphisms that should be considered biologically valid.
Figure 8
Electrophoretic analysis of PCR-amplified ORFs. PCR amplification was performed using brain cDNA template and gene-specific primers for 96 target genes. The results of 15 amplifications are shown here. Ten μL of a 25-μL reaction was loaded in each well on a 1% agarose gel. Expected-size amplicons of target genes are marked with black arrows. Amplicons different from expected size and isolated as potential splice variants are indicated with gray arrows.
Figure 9
Splice variants found for the aurora kinase C (AURKC) gene. Three PCR amplicons that were isolated and cloned yielded four different splice forms. “2” corresponds to the expected gene structure (from RefSeq) of seven exons. “1” includes an extra sequence previously known as an intron between exons 6 and 7. Clones generated from PCR amplicon “3” yielded two different splice forms of similar size, one without exon 5, and one without exon 4.
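
The kind of estimate described in the Figure 5 caption can be sketched as follows. The two error rates (1/15,000 for the reverse transcriptase and 1/50,000 for the high-fidelity polymerase) come from the caption. Everything else is an assumption made for illustration: errors are treated as independent per base, every PCR cycle is assumed to copy every base once, and the function names `p_error_free` and `clones_needed` are hypothetical. The model the authors actually used may differ, so the numbers this sketch produces should not be read as those plotted in the figure.

```python
import math

RT_ERROR_RATE = 1 / 15_000   # reverse transcriptase error rate (Figure 5 caption)
PCR_ERROR_RATE = 1 / 50_000  # high-fidelity polymerase error rate (Figure 5 caption)


def p_error_free(length_bp: int, cycles: int) -> float:
    """Probability that one clone of the given length carries no error.

    Assumption: independent per-base errors, one reverse-transcription pass,
    and one polymerase copy of every base in each PCR cycle.
    """
    rt_ok = (1 - RT_ERROR_RATE) ** length_bp
    pcr_ok = (1 - PCR_ERROR_RATE) ** (length_bp * cycles)
    return rt_ok * pcr_ok


def clones_needed(length_bp: int, cycles: int, confidence: float) -> int:
    """Smallest n such that at least one of n clones is error-free
    with probability >= confidence, i.e. 1 - (1 - p)^n >= confidence."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    p = p_error_free(length_bp, cycles)
    if p >= 1.0:
        return 1  # every clone is acceptable
    if p <= 0.0:
        raise ValueError("error-free probability underflowed to zero")
    return math.ceil(math.log(1 - confidence) / math.log(1 - p))


if __name__ == "__main__":
    # n50, n75, n90, n99 for a few ORF lengths at 30 cycles (illustrative values)
    for length in (1_000, 3_000, 6_000):
        row = [clones_needed(length, 30, c) for c in (0.50, 0.75, 0.90, 0.99)]
        print(length, row)
```

Under these assumptions the required number of clones grows with both ORF length and cycle number, which is the qualitative trend the caption describes.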
