Genome Res. 2001 Feb;11(2):281-9.
doi: 10.1101/gr.gr-1457r.

Computer-based methods for the mouse full-length cDNA encyclopedia: real-time sequence clustering for construction of a nonredundant cDNA library

H Konno et al. Genome Res. 2001 Feb.

Abstract

We developed computer-based methods for constructing a nonredundant mouse full-length cDNA library. Our cDNA library construction process comprises assessment of library quality, sequencing the 3' ends of inserts and clustering, and completing a re-array to generate a nonredundant library from a redundant one. After the cDNA libraries are generated, we sequence the 5' ends of the inserts to check the quality of the library; then we determine the sequencing priority of each library. Selected libraries undergo large-scale sequencing of the 3' ends of the inserts and clustering of the tag sequences. After clustering, the nonredundant library is constructed from the original libraries, which have redundant clones. All libraries, plates, clones, sequences, and clusters are uniquely identified, and all information is saved in the database according to this identifier. At press time, our system has been in place for the past two years; we have clustered 939,725 3' end sequences into 127,385 groups from 227 cDNA libraries/sublibraries (see http://genome.gse.riken.go.jp/).
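
The "real-time" clustering step described above, in which each newly sequenced 3' end tag either joins an existing cluster or starts a new one, can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not state the similarity search used, so a k-mer Jaccard score stands in for it here, and every name and parameter (`IncrementalClusterer`, `k`, `min_similarity`, the clone IDs) is a hypothetical chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def kmers(seq: str, k: int = 8) -> set[str]:
    """Return the set of overlapping k-mers in a sequence (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


@dataclass
class Cluster:
    cluster_id: int
    representative: set[str]          # k-mers of the first tag placed in the cluster
    members: list[str] = field(default_factory=list)


class IncrementalClusterer:
    """Assigns each incoming tag to a cluster as soon as it arrives.

    ASSUMPTION: k-mer Jaccard similarity is a stand-in for whatever
    similarity search the original system used.
    """

    def __init__(self, k: int = 8, min_similarity: float = 0.5) -> None:
        self.k = k
        self.min_similarity = min_similarity
        self.clusters: list[Cluster] = []

    def add(self, clone_id: str, sequence: str) -> int:
        """Place one tag and return the ID of the cluster it landed in."""
        tags = kmers(sequence, self.k)
        best, best_score = None, 0.0
        for cluster in self.clusters:
            union = len(tags | cluster.representative)
            score = len(tags & cluster.representative) / union if union else 0.0
            if score > best_score:
                best, best_score = cluster, score
        if best is not None and best_score >= self.min_similarity:
            best.members.append(clone_id)
            return best.cluster_id
        # No sufficiently similar cluster: this tag founds a new one.
        new_cluster = Cluster(len(self.clusters) + 1, tags, [clone_id])
        self.clusters.append(new_cluster)
        return new_cluster.cluster_id

    def nonredundant_clones(self) -> list[str]:
        """Pick one clone per cluster (here simply the first one seen)."""
        return [cluster.members[0] for cluster in self.clusters]
```

Calling `add` once per sequenced clone and then `nonredundant_clones` yields one candidate clone per cluster, which mirrors the re-array step in spirit. A linear scan over all clusters would not scale to the roughly 940,000 tags reported; a production system would need an indexed search.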

Figures

Figure 1. Number of clusters for various clustering conditions. P = 10E−20, 10E−25, and 10E−30 stand for P = 10^−20, 10^−25, and 10^−30, respectively.
Figure 2. Distribution of priming sites.
Figure 3. Data flow and identifiers (IDs).
Figure 4. Plate bar code and identifier (ID) rules.
