BMC Bioinformatics. 2004 Nov 4;5:176.

doi: 10.1186/1471-2105-5-176.

ESTIMA, a tool for EST management in a multi-project environment

Charu G Kumar¹, Richard LeDuc, George Gong, Levan Roinishivili, Harris A Lewin, Lei Liu

PMID: 15527510
PMCID: PMC533868
DOI: 10.1186/1471-2105-5-176

Charu G Kumar et al. BMC Bioinformatics. 2004.

Affiliation

¹ Department of Animal Sciences, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA. cgkumar@uiuc.edu


Abstract

Background: Single-pass, partial sequencing of complementary DNA (cDNA) libraries generates thousands of chromatograms that are processed into high-quality expressed sequence tags (ESTs) and then assembled into contigs representative of putative genes. To be of value, ESTs and contigs usually must be associated with meaningful annotations and made available to end-users.

Results: A web application, Expressed Sequence Tag Information Management and Annotation (ESTIMA), has been created to meet the EST annotation and data management requirements of multiple high-throughput EST sequencing projects. It is anchored on individual ESTs and organized around different properties of ESTs, including chromatograms, base-calling quality scores, structure of assembled transcripts, and multiple sources of comparison to infer functional annotation, Gene Ontology associations, and cDNA library information. ESTIMA consists of a relational database schema and a set of interactive query interfaces. These are integrated with a suite of web-based tools that allow a user to query and retrieve information. Further, query results are interconnected among the various EST properties. ESTIMA has several unique features. Users may run their own EST processing pipeline, search against preferred reference genomes, and use any clustering and assembly algorithm. The ESTIMA database schema is flexible and accepts output from any EST processing and assembly pipeline. ESTIMA has been used to manage EST projects for many species, including honeybee (Apis mellifera), cattle (Bos taurus), songbird (Taeniopygia guttata), corn rootworm (Diabrotica virgifera), catfish (Ictalurus punctatus, Ictalurus furcatus), and apple (Malus x domestica). The entire resource may be downloaded and used as is, or readily adapted to fit the unique needs of other cDNA sequencing projects.

Conclusions: The scripts used to create the ESTIMA interface are freely available to academic users in an archived format from http://titan.biotec.uiuc.edu/ESTIMA/. The entity-relationship (E-R) diagrams and the programs used to generate the Oracle database tables are also available. Detailed installation instructions and a tutorial are provided at the same website. Presently, the chromatograms, EST databases, and their annotations have been made available for the cattle and honeybee brain EST projects. Non-academic users need to contact the W.M. Keck Center for Functional and Comparative Genomics, University of Illinois at Urbana-Champaign, Urbana, IL, for licensing information.

Figures

**Figure 1**
ESTIMA is organized around three major components. A single installation of the ESTIMA web application can provide a front-end for any number of projects; in this case, three different projects are shown. The web application connects to a different project database for each project. All projects share the GENOME database, and a common repository for the blastable databases, although project users can only "see" those databases associated with their project.

**Figure 2**
This ER diagram shows both the GENOME schema and a single PROJECT schema. In practice, each project schema is given a unique name associated with the organism under study, thus the songbird project information is stored in the "songbird" schema.
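The arrangement described in Figures 1 and 2 (one web front-end, one schema per project named after the organism, and a shared GENOME database) can be sketched roughly as follows. This is only an illustrative model, not ESTIMA's actual code: the class names, method names, and blastable database names below are hypothetical and were chosen for the example.

```python
from dataclasses import dataclass, field

# Shared by every project, as described in the Figure 1 caption.
SHARED_SCHEMA = "GENOME"


@dataclass(frozen=True)
class Project:
    """Hypothetical record for one EST project served by the front-end."""
    name: str
    blast_dbs: frozenset = field(default_factory=frozenset)

    @property
    def schema(self) -> str:
        # Assumption: the schema name is the lower-cased organism name,
        # e.g. the songbird project lives in the "songbird" schema.
        return self.name.lower()


class Installation:
    """Hypothetical single installation serving any number of projects."""

    def __init__(self):
        self._projects = {}

    def register(self, name, blast_dbs=()):
        project = Project(name, frozenset(blast_dbs))
        self._projects[project.schema] = project
        return project

    def schemas_for(self, name):
        # Each project reads its own schema plus the shared GENOME schema.
        project = self._projects[name.lower()]
        return (project.schema, SHARED_SCHEMA)

    def visible_blast_dbs(self, name):
        # Users only "see" the blastable databases tied to their project.
        return sorted(self._projects[name.lower()].blast_dbs)


site = Installation()
site.register("songbird", ["songbird_contigs"])
site.register("honeybee", ["honeybee_brain_ests"])
print(site.schemas_for("songbird"))
print(site.visible_blast_dbs("honeybee"))
```

Keeping the shared schema as a single constant and deriving each project schema from the project name mirrors the naming convention in the Figure 2 caption; a real deployment would map these names onto Oracle schemas and access grants.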

**Figure 3**
The seven elements of the web application (the start screen and six query applications, shown as rectangles) interact with each other in a complex manner. A single-headed arrow means that the element at the tail of the arrow creates hyperlinks in its output that automatically call the element at the arrowhead. For example, whenever the contig viewer refers to an EST sequence, it links the ID to information about the EST from the Sequence ID element. The GO Browser and the Sequence ID elements allow users to download the appropriate FASTA files. Additionally, the GO Browser and Gene Association elements provide links to external information about reference sequences.

**Figure 4**
A screenshot of the custom GO Browser. The left panel is the query page, and the right panel displays the parent-term tree at the top (not visible) and a child-term tree that indicates the number of ESTs associated with each term. Detailed EST annotation reports, as well as the sequences of the annotated ESTs, may be displayed or downloaded.

**Figure 5**
An example of the use of ESTIMA in research. The top panel shows the results of a TBLASTX search of a mouse brain mRNA, similar to human tubulin alpha-1 protein, against honeybee brain assembled ESTs. The resulting hit ID, Contig2466, is linked to the Sequence ID interface in ESTIMA, from which the consensus sequence of the honeybee contig may be retrieved. The chromatogram for a member EST in the contig is displayed.

Cited by

Expressed sequences tags of the anther smut fungus, Microbotryum violaceum, identify mating and pathogenicity genes.
Yockteng R, Marthey S, Chiapello H, Gendrault A, Hood ME, Rodolphe F, Devier B, Wincker P, Dossat C, Giraud T. BMC Genomics. 2007 Aug 10;8:272. doi: 10.1186/1471-2164-8-272. PMID: 17692127. Free PMC article.
Molecular epidemiological investigation of porcine reproductive and respiratory syndrome virus in Northwest China from 2007 to 2010.
Shang Y, Wang G, Tian H, Yin S, Du P, Wu J, Chen Y, Yang S, Jin Y, Zhang K, Liu X. Virus Genes. 2012 Aug;45(1):90-7. doi: 10.1007/s11262-012-0747-4. Epub 2012 Jun 23. PMID: 22729801.
Transcriptome analysis of the desert locust central nervous system: production and annotation of a Schistocerca gregaria EST database.
Badisco L, Huybrechts J, Simonet G, Verlinden H, Marchal E, Huybrechts R, Schoofs L, De Loof A, Vanden Broeck J. PLoS One. 2011 Mar 21;6(3):e17274. doi: 10.1371/journal.pone.0017274. PMID: 21445293. Free PMC article.
A Comparative Analysis of the Venom Gland Transcriptomes of the Fishing Spiders Dolomedes mizhoanus and Dolomedes sulfurous.
Xu X, Wang H, Zhang F, Hu Z, Liang S, Liu Z. PLoS One. 2015 Oct 7;10(10):e0139908. doi: 10.1371/journal.pone.0139908. eCollection 2015. PMID: 26445494. Free PMC article.
Design and implementation of a generalized laboratory data model.
Wendl MC, Smith S, Pohl CS, Dooling DJ, Chinwalla AT, Crouse K, Hepler T, Leong S, Carmichael L, Nhan M, Oberkfell BJ, Mardis ER, Hillier LW, Wilson RK. BMC Bioinformatics. 2007 Sep 26;8:362. doi: 10.1186/1471-2105-8-362. PMID: 17897463. Free PMC article.

