2002 Jan 1;30(1):328-31.
doi: 10.1093/nar/30.1.328.

DBTSS: DataBase of human Transcriptional Start Sites and full-length cDNAs

Yutaka Suzuki et al. Nucleic Acids Res.

Abstract

Although cDNA information is indispensable for analyzing gene function, most cDNA sequences stored in current databases are imperfect in that they lack precise information on their 5' ends. To overcome this difficulty, we have developed the oligo-capping method to obtain full-length cDNAs, information on which has been partly deposited in public databases. In this study, we further constructed human cDNA libraries enriched in clones containing the cap structure to systematically explore the 5' end structure of expressed genes. Of approximately 217 402 5' end sequences obtained, 111 382 have been matched to cDNA sequences of known genes (7889 genes) and are presented in our new database, DataBase of Transcriptional Start Sites (DBTSS; http://elmo.ims.u-tokyo.ac.jp/dbtss/). Sequence comparison between our entries and those of a reference sequence database, RefSeq, revealed that 4683 (34%) of RefSeq sequences should be extended towards the 5' ends. We also mapped each sequence onto the human draft genome sequence to identify its transcriptional start site, which provides more detailed information on the distribution patterns of transcriptional start sites and adjacent regulatory regions.
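As an illustration of the comparison described above, the following minimal sketch computes how far a clone's 5' end lies upstream of a reference sequence's 5' end once both are placed on the genome. It is not the authors' pipeline: the `FivePrimeEnd` record, the function names, and the coordinates are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class FivePrimeEnd:
    """Hypothetical record: genomic position of a transcript's 5'-most base."""
    chrom: str
    strand: str  # "+" or "-"
    pos: int


def upstream_extension(ref: FivePrimeEnd, clone: FivePrimeEnd) -> int:
    """Bases by which the clone starts upstream of the reference.

    Positive means the clone would extend the reference towards the 5' end;
    negative means the clone starts downstream of the reference start.
    """
    if (ref.chrom, ref.strand) != (clone.chrom, clone.strand):
        raise ValueError("reference and clone must share chromosome and strand")
    if ref.strand == "+":
        # On the plus strand, upstream means a smaller coordinate.
        return ref.pos - clone.pos
    if ref.strand == "-":
        # On the minus strand, upstream means a larger coordinate.
        return clone.pos - ref.pos
    raise ValueError("strand must be '+' or '-'")


def longest_extension(ref: FivePrimeEnd, clones: Iterable[FivePrimeEnd]) -> int:
    """Largest upstream extension offered by any clone (0 if none extends)."""
    return max([0] + [upstream_extension(ref, c) for c in clones])


# Made-up coordinates, for illustration only.
ref_plus = FivePrimeEnd("chr1", "+", 1000)
clones_plus = [FivePrimeEnd("chr1", "+", 950), FivePrimeEnd("chr1", "+", 1010)]
print(longest_extension(ref_plus, clones_plus))
```

Distances here are measured on the genome, so an extension that spans an intron would be longer than the corresponding extension measured on the mRNA.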

Figures

Figure 1
Histogram of the lengths of elongated sequences. There are 4683 RefSeq genes that are elongated with our data. The vertical axis represents the number of clones, while the horizontal axis represents the class of elongated lengths at the mRNA level (A) or at the genomic level (B).
Figure 2
An example of a DBTSS web page for NM_005718. (A) Graphical overview of the multiple TSS and exon–intron structures. The top yellow (ORF regions) and blue (5′- and 3′-UTR regions) boxes show the RefSeq exons, while red boxes represent our clones. Lines connecting boxes indicate introns and arrows indicate the TSS of each clone. (B) Closer look at the TSS flanking region. (C) Ref-Full sequence of NM_005718.
