Nucleic Acids Res. 2002 Nov 1;30(21):4761-9.
doi: 10.1093/nar/gkf585.

PipeOnline 2.0: automated EST processing and functional data sorting

Patricia Ayoubi et al. Nucleic Acids Res. 2002.

Abstract

Expressed sequence tags (ESTs) are generated and deposited in the public domain, as redundant, unannotated, single-pass reactions, with virtually no biological content. PipeOnline automatically analyses and transforms large collections of raw DNA-sequence data from chromatograms or FASTA files by calling the quality of bases, screening and removing vector sequences, assembling and rewriting consensus sequences of redundant input files into a unigene EST data set and finally through translation, amino acid sequence similarity searches, annotation of public databases and functional data. PipeOnline generates an annotated database, retaining the processed unigene sequence, clone/file history, alignments with similar sequences, and proposed functional classification, if available. Functional annotation is automatic and based on a novel method that relies on homology of amino acid sequence multiplicity within GenBank records. Records are examined through a function ordered browser or keyword queries with automated export of results. PipeOnline offers customization for individual projects (MyPipeOnline), automated updating and alert service. PipeOnline is available at http://stress-genomics.org.
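The translation step mentioned in the abstract can be illustrated with a small sketch. BLASTX-style searches compare the six conceptual translations of a nucleotide sequence (three reading frames on each strand) against a protein database. The code below is only an illustration using the standard genetic code; the function names are hypothetical and are not taken from PipeOnline.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Map frame labels (+1..+3, -1..-3) to conceptual translations."""
    frames = {}
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames
```

Calling `six_frame_translations` on a unigene consensus sequence yields the six candidate protein sequences that a similarity search would score against database proteins.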

Figures

Figure 1
PipeOnline updating schemes. Two types of database updating are currently supported by PipeOnline: (A) addition of new DNA sequence records (ABI, SCF or FASTA) to an existing database by database owners; and (B) update of existing database records with new information from public databases. Automated updating occurs with each monthly update to the protein database from NCBI. New protein sequence entries are first matched with and added to the NCBI protein-function database. Locally maintained databases are then automatically compared with the monthly updates, new matches are added to the database, and an e-mail notification is sent to database owners.
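Updating scheme (B) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not PipeOnline's implementation: the record layout, the function names and the `is_match` predicate are all hypothetical, and the predicate stands in for the sequence comparison that the real system performs. The notification is only constructed here, not sent.

```python
from email.message import EmailMessage


def find_new_matches(local_records, new_proteins, is_match):
    """Return {record_id: [protein_id, ...]} for matches not yet stored.

    local_records: {record_id: {"matches": [protein_id, ...], ...}} (assumed layout)
    new_proteins:  {protein_id: protein_entry} from the monthly update
    is_match:      hypothetical predicate standing in for sequence comparison
    """
    new_matches = {}
    for rec_id, rec in local_records.items():
        hits = [
            pid for pid, prot in new_proteins.items()
            if pid not in rec["matches"] and is_match(rec, prot)
        ]
        if hits:
            new_matches[rec_id] = hits
    return new_matches


def apply_update(local_records, new_matches):
    """Add the new matches to the local database records in place."""
    for rec_id, hits in new_matches.items():
        local_records[rec_id]["matches"].extend(hits)


def build_notification(owner_email, database_name, new_matches):
    """Build (but do not send) an e-mail summarising the new matches."""
    msg = EmailMessage()
    msg["To"] = owner_email
    msg["Subject"] = f"{database_name}: {len(new_matches)} record(s) with new matches"
    lines = [f"{rec_id}: {', '.join(hits)}" for rec_id, hits in sorted(new_matches.items())]
    msg.set_content("\n".join(lines))
    return msg
```

Keeping the matcher as a parameter mirrors the modular design described for the pipeline: the comparison step can be swapped without touching the update or notification logic.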
Figure 2
Schematic overview of modules used in PipeOnline for large-scale sequence processing. Users upload chromatograms or FASTA DNA sequence files. Each chromatogram file is converted to DNA sequence, edited for vector sequences and assembled into contigs. Following local BLASTX comparisons with GenBank using NCBI BLASTALL, each output is parsed and a MySQL relational database containing each sequence record and its BLAST output is assembled. This portable database is linked to Web forms that permit keyword searching and browsing of records, with links to public databases and export features. BLASTX results are sorted by biological function, and the output is generated in a Web-browsable form (GeneBrowser). Each step is an independent component module; the modules are linked together with UNIX scripts and can easily be updated or replaced as needed.
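The parse-and-load step in this schematic can be sketched as below. The sketch makes two assumptions that are not from the source: it reads BLAST's 12-column tabular output (the `-m 8` format of legacy `blastall`) instead of whatever report format PipeOnline parses, and it uses Python's built-in `sqlite3` as a stand-in for MySQL so the example is self-contained. Table and function names are hypothetical.

```python
import sqlite3

# Column order of BLAST tabular output (legacy blastall -m 8).
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def parse_tabular(text):
    """Parse tab-separated BLAST hits into (contig, subject, pident, evalue, bitscore)."""
    hits = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) != len(COLUMNS):
            continue  # skip malformed lines
        row = dict(zip(COLUMNS, fields))
        hits.append((
            row["qseqid"], row["sseqid"], float(row["pident"]),
            float(row["evalue"]), float(row["bitscore"]),
        ))
    return hits


def load_hits(conn, hits):
    """Store parsed hits in a relational table (sqlite3 standing in for MySQL)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS blast_hit "
        "(contig TEXT, subject TEXT, pident REAL, evalue REAL, bitscore REAL)"
    )
    conn.executemany("INSERT INTO blast_hit VALUES (?, ?, ?, ?, ?)", hits)
    conn.commit()


def best_hits(conn):
    """Return {contig: subject} for the highest-scoring hit of each contig."""
    best = {}
    rows = conn.execute(
        "SELECT contig, subject FROM blast_hit ORDER BY contig, bitscore DESC"
    )
    for contig, subject in rows:
        best.setdefault(contig, subject)  # first row per contig has the top score
    return best
```

A typical use would be `conn = sqlite3.connect(":memory:")`, then `load_hits(conn, parse_tabular(report_text))`, after which ordinary SQL queries can drive keyword searches or record browsing.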
Figure 3
Generation of a functional dictionary of the NCBI protein database and functional sorting of PipeOnline records. (A) An MPW functional dictionary was used to correlate standard functional definitions with records from the NCBI protein database using a combination of word-based and protein alignment matching algorithms to generate an NCBI protein-function database. (B) This NCBI protein-function database is used as a lookup table for functional assignment of protein alignment descriptions found in PipeOnline records to provide functional classification of records and a functional overview for each database.
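The two panels can be pictured with a toy sketch. It covers only the word-based half of panel (A); the protein alignment matching is omitted. The dictionary contents, categories, and function names are invented for illustration and do not come from MPW or PipeOnline.

```python
from collections import Counter

# Hypothetical functional dictionary: category -> defining keywords.
FUNCTIONAL_DICTIONARY = {
    "carbohydrate metabolism": {"hexokinase", "glucose", "glycolysis"},
    "transcription": {"transcription", "polymerase"},
}


def build_protein_function_table(protein_descriptions, dictionary):
    """Panel (A), word-based only: map protein accessions to a category.

    A protein receives the first category whose keywords overlap the
    words of its description; proteins with no overlap are left out.
    """
    table = {}
    for accession, description in protein_descriptions.items():
        words = set(description.lower().split())
        for category, keywords in dictionary.items():
            if words & keywords:
                table[accession] = category
                break
    return table


def classify_records(record_best_hits, table, default="unclassified"):
    """Panel (B): use the table as a lookup for each record's matched protein."""
    return {rec: table.get(acc, default) for rec, acc in record_best_hits.items()}


def functional_overview(assignments):
    """Count records per functional category for a database-wide overview."""
    return Counter(assignments.values())
```

Because the lookup table is built once and then reused, classifying a new batch of records reduces to dictionary lookups, which fits the caption's description of the protein-function database as a lookup table.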
Figure 4
Distribution of ORF functional assignments of S. cerevisiae by two independent functional classification methods. Yeast ORFs automatically annotated with PipeOnline (closed bars) were compared with the same ORFs manually annotated by the CYGD at MIPS (open bars).
