Comparative Study
BMC Bioinformatics. 2008 Jan 7:9:5.
doi: 10.1186/1471-2105-9-5.

EST2uni: an open, parallel tool for automated EST analysis and database creation, with a data mining web interface and microarray expression data integration

Javier Forment et al. BMC Bioinformatics.

Abstract

Background: Expressed sequence tag (EST) collections consist of large numbers of single-pass, redundant, partial sequences. These must be processed, clustered, and annotated to remove low-quality and vector regions, eliminate redundancy and sequencing errors, and provide biologically relevant information. Performing these analysis steps requires flexible computational pipelines adapted to the local needs of specific EST projects. Furthermore, EST collections must be stored in highly structured relational databases that researchers can access through user-friendly interfaces supporting efficient, complex data mining, so that the collections can be fully exploited.

Results: We have created EST2uni, an integrated, highly configurable EST analysis pipeline and data mining software package that automates the pre-processing, clustering, annotation, database creation, and data mining of EST collections. The pipeline uses standard EST analysis tools, and the software has a modular design that facilitates the addition and configuration of new analytical methods. Currently implemented analyses include functional and structural annotation, SNP and microsatellite discovery, integration of previously known genetic marker data and gene expression results, and assistance in cDNA microarray design. It can be run in parallel on a PC cluster to reduce analysis time. It also creates a web site linked to the database that shows collection statistics and offers complex query capabilities and tools for data mining and retrieval.
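To make one of the listed analyses concrete, the following is a minimal sketch of microsatellite (simple sequence repeat) discovery on a single nucleotide sequence. It is an illustration only and is not taken from EST2uni: the function name `find_microsatellites`, the motif length range of 2 to 6 nucleotides, and the default threshold of four tandem copies are all assumptions chosen for the example, and the abstract does not say which tool or criteria EST2uni actually uses.

```python
import re


def find_microsatellites(sequence, min_repeats=4):
    """Return (start, motif, repeat_count) for each perfect tandem repeat.

    Hypothetical helper for illustration; not part of EST2uni.
    A hit is a motif of 2-6 nucleotides repeated at least `min_repeats`
    times in a row, with no mismatches allowed.
    """
    # Lazy motif group so the shortest repeating unit is reported,
    # followed by at least (min_repeats - 1) further copies of it.
    pattern = re.compile(r"([ACGT]{2,6}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        # Ignore homopolymer runs such as "AAAA...", which match as "AA".
        if len(set(motif)) == 1:
            continue
        repeat_count = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, repeat_count))
    return hits
```

Under these assumptions, calling `find_microsatellites("GGCTAGAGAGAGAGAGTTCA")` reports a single `AG` repeat of six copies starting at zero-based position 4. A production pipeline would typically also handle imperfect repeats and apply different thresholds for each motif length.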

Conclusion: The software package presented here provides an efficient and complete bioinformatics tool for managing EST collections that is easy to adapt to the local needs of different EST projects. The code is freely available under the GPL and can be obtained at http://bioinf.comav.upv.es/est2uni. This site also provides detailed instructions for installing and configuring the software package. The code is under active development to incorporate new analyses, methods, and algorithms as they are released by the bioinformatics community.

Figures

Figure 1. Diagram of the EST2uni processing pipeline. Pipeline showing the different analyses performed by EST2uni.
Figure 2. Parallelization efficiency using different numbers of computer nodes. Time required to perform the complete analysis of 8,000 ESTs using EST2uni in sequential mode and in parallel mode with 2, 4 or 8 biprocessor computer nodes.
Figure 3. Parallelization efficiency using different numbers of input ESTs. Time required to perform a complete analysis of 10,000, 20,000, 40,000, 80,000, and 160,000 ESTs using EST2uni in parallel mode with 8 biprocessor computer nodes.
Figure 4. A screenshot of the Queries page. Unigenes can be efficiently retrieved by any combination of sequence features and/or annotations.
Figure 5. A screenshot of a page with detailed information about a unigene. (a) Unigene sequence and annotation. (b) Graphical representation of the unigene showing its nucleotide sequence, ESTs assembly, alignment to BLAST hits, predicted ORF-containing region, and functional and structural features. (c) Tables showing detailed information about the BLAST hits and hyperlinks to their respective web sites. (d) Table showing the nucleotide discrepancies among the ESTs in the unigene and their positions.
