BMC Bioinformatics. 2004 Nov 4;5:176.
doi: 10.1186/1471-2105-5-176.

ESTIMA, a tool for EST management in a multi-project environment

Charu G Kumar et al. BMC Bioinformatics.

Abstract

Background: Single-pass, partial sequencing of complementary DNA (cDNA) libraries generates thousands of chromatograms that are processed into high quality expressed sequence tags (ESTs), and then assembled into contigs representative of putative genes. Usually, to be of value, ESTs and contigs must be associated with meaningful annotations, and made available to end-users.

Results: A web application, Expressed Sequence Tag Information Management and Annotation (ESTIMA), has been created to meet the EST annotation and data management requirements of multiple high-throughput EST sequencing projects. It is anchored on individual ESTs and organized around different properties of ESTs including chromatograms, base-calling quality scores, structure of assembled transcripts, and multiple sources of comparison to infer functional annotation, Gene Ontology associations, and cDNA library information. ESTIMA consists of a relational database schema and a set of interactive query interfaces. These are integrated with a suite of web-based tools that allow a user to query and retrieve information. Further, query results are interconnected among the various EST properties. ESTIMA has several unique features. Users may run their own EST processing pipeline, search against preferred reference genomes, and use any clustering and assembly algorithm. The ESTIMA database schema is flexible and accepts output from any EST processing and assembly pipeline. ESTIMA has been used to manage EST projects for many species, including honeybee (Apis mellifera), cattle (Bos taurus), songbird (Taeniopygia guttata), corn rootworm (Diabrotica virgifera), catfish (Ictalurus punctatus, Ictalurus furcatus), and apple (Malus x domestica). The entire resource may be downloaded and used as is, or readily adapted to fit the unique needs of other cDNA sequencing projects.

Conclusions: The scripts used to create the ESTIMA interface are freely available to academic users in an archived format from http://titan.biotec.uiuc.edu/ESTIMA/. The entity-relationship (E-R) diagrams and the programs used to generate the Oracle database tables are also available. We have also provided detailed installation instructions and a tutorial at the same website. Presently the chromatograms, EST databases and their annotations have been made available for cattle and honeybee brain EST projects. Non-academic users need to contact the W.M. Keck Center for Functional and Comparative Genomics, University of Illinois at Urbana-Champaign, Urbana, IL, for licensing information.

Figures

Figure 1
ESTIMA is organized around three major components. A single installation of the ESTIMA web application can provide a front-end for any number of projects; in this case, three different projects are shown. The web application connects to a different project database for each project. All projects share the GENOME database, and a common repository for the blastable databases, although project users can only "see" those databases associated with their project.
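
The arrangement in Figure 1 can be illustrated with a minimal Python sketch. This is not ESTIMA's actual code: the `Project` and `ProjectRegistry` names, the blastable database names, and the schema names for honeybee and cattle are hypothetical, chosen only to show one front-end serving several projects, a shared GENOME schema, and per-project visibility of blastable databases.

```python
from dataclasses import dataclass

# Shared by every project, as described in the Figure 1 caption.
SHARED_GENOME_SCHEMA = "GENOME"


@dataclass(frozen=True)
class Project:
    """One EST project served by the single web front-end (hypothetical type)."""
    name: str
    schema: str             # project-specific database schema
    blast_dbs: tuple = ()   # blastable databases associated with this project


class ProjectRegistry:
    """Maps each project to its own schema plus the shared GENOME schema."""

    def __init__(self, projects):
        self._projects = {p.name: p for p in projects}

    def schemas_for(self, project_name):
        # Each project reads its own schema and the shared GENOME schema.
        project = self._projects[project_name]
        return (project.schema, SHARED_GENOME_SCHEMA)

    def visible_blast_dbs(self, project_name):
        # All blastable databases live in one repository, but a project's
        # users only "see" the ones associated with their project.
        return list(self._projects[project_name].blast_dbs)


# Illustrative data only; the database names are assumptions.
registry = ProjectRegistry([
    Project("honeybee", "honeybee", ("honeybee_brain_ests",)),
    Project("cattle", "cattle", ("cattle_ests",)),
    Project("songbird", "songbird", ("songbird_ests",)),
])
```

Under these assumptions, `registry.schemas_for("songbird")` returns the project schema together with the shared GENOME schema, while `registry.visible_blast_dbs("cattle")` lists only the cattle databases.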
Figure 2
This ER diagram shows both the GENOME schema and a single PROJECT schema. In practice, each project schema is given a unique name associated with the organism under study, thus the songbird project information is stored in the "songbird" schema.
Figure 3
The seven elements of the web application (the start screen and six query applications shown as rectangles) interact with each other in a complex manner. A single-headed arrow means that the element at the tail of the arrow creates hyperlinks in its output that automatically call the element at the arrowhead. For example, whenever the contig viewer refers to an EST sequence, it links the ID to information about the EST from the Sequence ID element. The GO Browser and the Sequence ID elements allow users to download the appropriate FASTA files. Additionally, the GO Browser and Gene Association elements provide links to external information about reference sequences.
Figure 4
A screenshot of the custom GO Browser. The left panel is the query page, and the right panel displays the parent-term tree at the top (not visible) and a child-term tree that indicates the number of ESTs associated with each term. Detailed EST annotation reports may be displayed or downloaded, as may the sequences of these annotated ESTs.
Figure 5
An example of the use of ESTIMA in research. The top panel shows the results of a TBLASTX search of a mouse brain mRNA similar to human tubulin alpha-1 protein against honeybee brain assembled ESTs. The resulting hit ID, Contig2466, is linked to the Sequence ID interface in ESTIMA, from which the consensus sequence of the honeybee contig may be retrieved. The chromatogram for a member EST in the contig is displayed.
