Nucleic Acids Res. 2004 Jul 1;32(Web Server issue):W365-71.
doi: 10.1093/nar/gkh485.

Proteome Analyst: custom predictions with explanations in a web-based tool for high-throughput proteome annotations

Duane Szafron et al. Nucleic Acids Res.

Abstract

Proteome Analyst (PA) (http://www.cs.ualberta.ca/~bioinfo/PA/) is a publicly available, high-throughput, web-based system for predicting various properties of each protein in an entire proteome. Using machine-learned classifiers, PA can predict, for example, the GeneQuiz general function and Gene Ontology (GO) molecular function of a protein. In addition, PA is currently the most accurate and most comprehensive system for predicting subcellular localization, the location within a cell where a protein performs its main function. Two other capabilities of PA are notable. First, PA can create a custom classifier to predict a new property, without requiring any programming, based on labeled training data (i.e. a set of examples, each with the correct classification label) provided by a user. PA has been used to create custom classifiers for potassium-ion channel proteins and other general function ontologies. Second, PA provides a sophisticated explanation feature that shows why one prediction is chosen over another. The PA system produces a Naïve Bayes classifier, which is amenable to a graphical and interactive approach to explanations for its predictions; transparent predictions increase the user's confidence in, and understanding of, PA.
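
The abstract notes that a Naïve Bayes classifier lends itself to explanations of why one prediction is chosen over another. The sketch below illustrates that general idea and is not PA's implementation: it trains a small Bernoulli Naïve Bayes model on labeled examples and then ranks each feature by how much it shifts the log-odds between two candidate labels. The feature tokens, the labels, and every function name (`train`, `score`, `predict`, `explain`) are hypothetical and chosen only for illustration; the source does not specify PA's features or smoothing.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Count labels and per-label feature occurrences from (features, label) pairs."""
    label_counts = Counter()
    feature_counts = defaultdict(Counter)
    vocab = set()
    for features, label in examples:
        label_counts[label] += 1
        for f in features:
            feature_counts[label][f] += 1
            vocab.add(f)
    return label_counts, feature_counts, vocab


def p_present(feature, label, model):
    """Laplace-smoothed estimate of P(feature present | label)."""
    label_counts, feature_counts, _ = model
    return (feature_counts[label][feature] + 1) / (label_counts[label] + 2)


def log_prior(label, model):
    """Log of the fraction of training examples carrying this label."""
    label_counts = model[0]
    return math.log(label_counts[label] / sum(label_counts.values()))


def contribution(feature, present, label, model):
    """Log-likelihood term for one feature being present or absent."""
    p = p_present(feature, label, model)
    return math.log(p if present else 1.0 - p)


def score(features, label, model):
    """Log prior plus one log-likelihood term per vocabulary feature."""
    vocab = model[2]
    return log_prior(label, model) + sum(
        contribution(f, f in features, label, model) for f in vocab
    )


def predict(features, model):
    """Return the label with the highest score."""
    return max(model[0], key=lambda label: score(features, label, model))


def explain(features, label_a, label_b, model):
    """Per-feature shift in log-odds of label_a over label_b, largest first."""
    vocab = model[2]
    deltas = {
        f: contribution(f, f in features, label_a, model)
        - contribution(f, f in features, label_b, model)
        for f in vocab
    }
    return sorted(deltas.items(), key=lambda kv: -abs(kv[1]))


# Hypothetical training data: each example is a set of tokens plus a label.
examples = [
    ({"transport", "membrane"}, "membrane"),
    ({"channel", "membrane"}, "membrane"),
    ({"kinase", "atp-binding"}, "cytoplasm"),
    ({"kinase", "transferase"}, "cytoplasm"),
]
model = train(examples)
query = {"channel", "transport"}

print("predicted:", predict(query, model))
for feature, delta in explain(query, "membrane", "cytoplasm", model):
    print(f"{feature:12s} {delta:+.3f}")
```

A positive value means the feature's presence or absence favors the first label, a negative value favors the second. Because the model is additive in log space, the per-feature values plus the difference in log priors sum exactly to the difference between the two labels' scores, which is what makes this kind of classifier straightforward to explain.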


Figures

Figure 1. The top part of a sample PACard.
Figure 2. Proteins by ontological class (partial screenshot).
Figure 3. Full classifier output for ACEA_ECOLI (partial screenshot).
Figure 4. The feature extraction algorithm for a protein sequence in PA.
Figure 5. The training and prediction phases of classification.
Figure 6. The FASTA-based format of a classifier training file.
Figure 7. Information for a trained classifier (partial screenshot).
Figure 8. Part of the general function prediction (GeneQuiz ontology) Explain page for ACEA_ECOLI.
