Genome Res. 2003 Jun;13(6B):1273-89.
doi: 10.1101/gr.1119703.

Targeting a complex transcriptome: the construction of the mouse full-length cDNA encyclopedia



Piero Carninci et al. Genome Res. 2003 Jun.

Abstract

We report the construction of the mouse full-length cDNA encyclopedia, the most extensive view of a complex transcriptome, on the basis of preparing and sequencing 246 libraries. Before cloning, cDNAs were enriched for full-length clones by Cap-Trapper and, in most cases, aggressively subtracted/normalized. We have produced 1,442,236 successful 3'-end sequences clustered into 171,144 groups, from which 60,770 clones were fully sequenced cDNAs annotated in the FANTOM-2 annotation. We have also produced 547,149 5'-end reads, which clustered into 124,258 groups. Altogether, these cDNAs were further grouped into 70,000 transcriptional units (TUs), which represent the best coverage of a transcriptome so far. By monitoring the extent of normalization/subtraction, we define the tentative equivalent coverage (TEC), which was estimated to be equivalent to >12,000,000 ESTs derived from standard libraries. High coverage explains the discrepancy between the very large numbers of clusters (and TUs) in this project, which also include non-protein-coding RNAs, and the lower gene-number estimates from genome annotations. In total, 5'-end clusters identify regions that are potential promoters for 8637 known genes and suggest the presence of almost 63,000 transcriptional starting points. An estimate of the frequency of polyadenylation signals suggests that at least half of the singletons in the EST set represent real mRNAs. Clones accounting for about half of the predicted TUs await further sequencing. The continued high discovery rate suggests that the task of transcriptome discovery is not yet complete.
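The abstract does not say how the polyadenylation-signal frequency was estimated. As a hedged illustration of the kind of count involved, the Python sketch below computes the fraction of 3'-end reads that carry a canonical signal hexamer near their 3' end. The hexamers (AATAAA, ATTAAA), the 50-nt window, and the assumption that reads are sense-oriented with the poly(A) tail trimmed are illustrative choices, not parameters taken from the paper.

```python
# Minimal sketch: fraction of 3'-end reads carrying a polyadenylation signal (PAS).
# ASSUMPTIONS (not from the paper): PAS hexamers AATAAA/ATTAAA, a 50-nt search
# window at the 3' end, reads oriented 5'->3' on the sense strand, tails trimmed.

PAS_HEXAMERS = ("AATAAA", "ATTAAA")  # assumed canonical signal variants
WINDOW = 50  # hypothetical search window (nt upstream of the read's 3' end)


def has_pas(seq: str, window: int = WINDOW) -> bool:
    """Return True if an assumed PAS hexamer occurs in the 3'-terminal window."""
    tail = seq.upper()[-window:]
    return any(hexamer in tail for hexamer in PAS_HEXAMERS)


def pas_fraction(seqs: list[str]) -> float:
    """Fraction of sequences with a PAS in their 3'-terminal window."""
    if not seqs:
        return 0.0
    return sum(has_pas(s) for s in seqs) / len(seqs)


if __name__ == "__main__":
    # Toy reads, invented for illustration only.
    singletons = [
        "GCTAGCTAGGCTAATAAAGCTTGCA",  # contains AATAAA
        "CCGGATTAAACCTGA",            # contains ATTAAA
        "GGGGCCCCGGGGCCCC",           # no signal
        "ACGTACGTAC",                 # no signal
    ]
    print(pas_fraction(singletons))  # 0.5
```

Comparing this fraction for singletons with the fraction for large, well-supported clusters is one plausible way to turn a signal frequency into an estimate of how many singletons are genuine transcripts; the paper's actual procedure may differ.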


Figures

Figure 1
Overall strategy for preparing cDNA libraries and routing rearrayed clones to prepare drivers for subtracting new cDNA libraries. From small tissue samples (i.e., <15 μg of total RNA as template), the resulting cDNA could be neither normalized nor subtracted. When at least 15 μg of starting total RNA was available, cDNA was subtracted with a driver derived from a minilibrary set and a nonredundant rearrayed library. For larger tissues, cDNA was prepared from mRNA; in this case, cDNA was also normalized, using an aliquot of the starting mRNA, in the same step as subtraction. Every newly prepared library went through this routine, so libraries prepared later were more strongly subtracted than those prepared earlier.
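The routing rules in the Figure 1 caption amount to a small decision procedure. The Python sketch below encodes them; the 15 μg threshold comes from the caption, while the `mrna_available` flag and the `plan_library`/`PrepPlan` names are hypothetical, with the flag standing in for "larger tissues", for which the caption gives no numeric cutoff.

```python
from dataclasses import dataclass

MIN_TOTAL_RNA_UG = 15.0  # threshold stated in the Figure 1 caption


@dataclass
class PrepPlan:
    template: str     # "total RNA" or "mRNA"
    subtracted: bool  # subtracted against a driver from rearrayed clones
    normalized: bool  # normalized with an aliquot of the starting mRNA


def plan_library(total_rna_ug: float, mrna_available: bool) -> PrepPlan:
    """Route a sample according to the rules summarized in Figure 1.

    `mrna_available` is a hypothetical flag for "larger tissues"; the caption
    gives no numeric cutoff for that case.
    """
    if total_rna_ug < MIN_TOTAL_RNA_UG:
        # Small samples: cDNA could be neither normalized nor subtracted.
        return PrepPlan("total RNA", subtracted=False, normalized=False)
    if mrna_available:
        # Larger tissues: cDNA from mRNA, normalized in the subtraction step.
        return PrepPlan("mRNA", subtracted=True, normalized=True)
    # At least 15 ug total RNA: subtraction only.
    return PrepPlan("total RNA", subtracted=True, normalized=False)


if __name__ == "__main__":
    print(plan_library(10.0, mrna_available=False))
    print(plan_library(40.0, mrna_available=True))
```

Checking the RNA amount first means a sample flagged as large but below 15 μg still falls back to the unsubtracted route, which seems the conservative reading of the caption.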
Figure 2
Overall sequencing growth during the course of the project: the x axis shows the number of sequences and the y axis the number of 3′ clusters. Vertical lines indicate switches between sequenced libraries. Strong subtraction is intended with RoT larger than 2000 and up to 500. Library numbers are displayed only when space allows. The factors that contributed most to gene discovery are highlighted. (Top left inset) The internal redundancy of one library; (bottom left inset) the number of library-specific clusters plus singletons (the gene discovery rate per library). The overall curve is derived by summing many curves like the one in B. (A, B) The B0 library, 2-cell stage.
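The insets of Figure 2 rely on per-library quantities that can be computed once each read has been assigned to a 3' cluster. The Python sketch below assumes a hypothetical input of `(library_id, cluster_id)` pairs in sequencing order and an assumed definition of internal redundancy (1 minus distinct clusters per read); it illustrates the bookkeeping, not the project's actual pipeline.

```python
from collections import defaultdict


def discovery_curve(reads):
    """Cumulative number of distinct 3' clusters after each sequenced read.

    `reads` is a hypothetical list of (library_id, cluster_id) pairs in the
    order the reads were produced; clustering is assumed to be done already.
    """
    seen = set()
    curve = []
    for _library, cluster in reads:
        seen.add(cluster)
        curve.append(len(seen))
    return curve


def library_stats(reads):
    """Per-library internal redundancy and number of library-specific clusters.

    Redundancy is taken as 1 - distinct clusters / reads in the library (an
    assumed definition). A cluster is library-specific when all of its reads
    come from a single library, so singletons are counted automatically.
    """
    reads_per_lib = defaultdict(int)
    clusters_per_lib = defaultdict(set)
    libs_per_cluster = defaultdict(set)
    for library, cluster in reads:
        reads_per_lib[library] += 1
        clusters_per_lib[library].add(cluster)
        libs_per_cluster[cluster].add(library)

    stats = {}
    for library, n_reads in reads_per_lib.items():
        distinct = clusters_per_lib[library]
        specific = sum(1 for c in distinct if libs_per_cluster[c] == {library})
        stats[library] = {
            "redundancy": 1 - len(distinct) / n_reads,
            "library_specific": specific,
        }
    return stats


if __name__ == "__main__":
    # Toy data; "L2" is a hypothetical second library.
    reads = [("B0", "c1"), ("B0", "c1"), ("B0", "c2"), ("L2", "c2"), ("L2", "c3")]
    print(discovery_curve(reads))  # [1, 1, 2, 2, 3]
    print(library_stats(reads))
```

Summing per-library contributions in sequencing order yields a saturation curve of the kind shown in the main panel; a flattening curve indicates that new libraries or stronger subtraction are needed to keep finding clusters.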
Figure 3
Over the course of the gene discovery project (top), we monitored the appearance of singletons vs. clusters of various size classes (middle) and the number of libraries in which they appeared (bottom). Arrows 1, 2, and 3 mark the dates when increasingly large subtraction drivers, consisting of 13,500, 37,500, and 126,000 rearrayed clusters, respectively, were introduced for cDNA library subtraction.
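The tallies monitored in Figure 3 (singletons vs. cluster size classes, and how many libraries each cluster spans) can be sketched in Python as below. The size-class boundaries are hypothetical because the caption does not give them, and the input format (cluster ID mapped to the library of each member read) is assumed.

```python
from collections import Counter

# Hypothetical size-class bins as (inclusive upper bound, label); the paper's
# actual bins are not given in the caption.
SIZE_CLASSES = [(1, "singleton"), (4, "2-4"), (19, "5-19"), (float("inf"), ">=20")]


def size_class(n_reads: int) -> str:
    """Label a cluster by the number of reads it contains."""
    if n_reads < 1:
        raise ValueError("a cluster must contain at least one read")
    for upper, label in SIZE_CLASSES:
        if n_reads <= upper:
            return label
    raise AssertionError("unreachable: last bin is unbounded")


def cluster_profile(clusters):
    """Count clusters by size class and by number of distinct source libraries.

    `clusters` maps a cluster ID to a list with one library ID per member read.
    """
    by_size = Counter(size_class(len(libs)) for libs in clusters.values())
    by_n_libraries = Counter(len(set(libs)) for libs in clusters.values())
    return by_size, by_n_libraries


if __name__ == "__main__":
    clusters = {"c1": ["B0"], "c2": ["B0", "L2", "L2"], "c3": ["L2"] * 6}
    by_size, by_n_libraries = cluster_profile(clusters)
    print(by_size)
    print(by_n_libraries)
```

Recomputing these counters on snapshots taken at successive dates would give time courses comparable to the middle and bottom panels.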

