BMC Bioinformatics. 2004 Nov 18;5:178.
doi: 10.1186/1471-2105-5-178.

GOtcha: a new method for prediction of protein function assessed by the annotation of seven genomes

David M A Martin et al. BMC Bioinformatics.

Abstract

Background: The function of a novel gene product is typically predicted by transitive assignment of annotation from similar sequences. We describe a novel method, GOtcha, for predicting gene product function by annotation with Gene Ontology (GO) terms. GOtcha predicts GO term associations with term-specific probability (P-score) measures of confidence. Term-specific probabilities are a novel feature of GOtcha and allow the identification of conflicts or uncertainty in annotation.

Results: The GOtcha method was applied to the recently sequenced genome for Plasmodium falciparum and six other genomes. GOtcha was compared quantitatively for retrieval of assigned GO terms against direct transitive assignment from the highest scoring annotated BLAST search hit (TOPBLAST). GOtcha exploits information deep into the 'twilight zone' of similarity search matches, making use of much information that is otherwise discarded by simpler approaches. At a P-score cutoff of 50%, GOtcha provided 60% better recovery of annotation terms and 20% higher selectivity than annotation with TOPBLAST at an E-value cutoff of 10⁻⁴.
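
The TOPBLAST baseline described above can be sketched in a few lines. This is only an illustrative reading of "transitive assignment from the highest scoring annotated BLAST search hit", not the authors' code. The hit representation (dictionaries with `e_value` and `go_terms` keys), the function name `topblast_annotate`, and the use of the lowest E-value as a stand-in for "highest scoring" are all assumptions made for the example.

```python
def topblast_annotate(hits, e_value_cutoff=1e-4):
    """Transfer GO terms from the best annotated hit (hypothetical helper).

    hits: list of dicts with keys "e_value" (float) and "go_terms" (list of str).
    Assumption: the lowest E-value stands in for the highest scoring hit.
    """
    # Unannotated hits carry no information for this baseline, so skip them.
    annotated = [h for h in hits if h["go_terms"]]
    if not annotated:
        return set()
    best = min(annotated, key=lambda h: h["e_value"])
    # Assumption: a hit exactly at the cutoff is accepted.
    if best["e_value"] > e_value_cutoff:
        return set()
    return set(best["go_terms"])


# Illustrative input only; the GO identifiers are placeholders for real annotations.
example_hits = [
    {"e_value": 1e-30, "go_terms": []},              # best hit, but unannotated
    {"e_value": 1e-10, "go_terms": ["GO:0003677"]},  # top annotated hit
    {"e_value": 1e-6, "go_terms": ["GO:0005524"]},   # discarded by TOPBLAST
]
predicted = topblast_annotate(example_hits)
```

Every hit after the top annotated one is ignored here, which is the information the abstract says GOtcha makes use of instead.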

Conclusions: The GOtcha method is a useful tool for genome annotators. It has identified both errors and omissions in the original Plasmodium falciparum annotation and is being adopted by many other genome sequencing projects.

Figures

Figure 1
Proportion of original GO annotations recovered versus cutoff for assignment of GO terms. (a) GOtcha (b) top informative BLAST hit (TOPBLAST). For GOtcha the P-score is defined in the text. For TOPBLAST the E-value is the expectancy score for the top annotated sequence match. Key: ○ Arabidopsis thaliana; △ Drosophila melanogaster; □ Homo sapiens; ● Plasmodium falciparum; ■ Vibrio cholerae; ◇ Caenorhabditis elegans; ▽ Saccharomyces cerevisiae.
Figure 2
Annotations and sequences annotated. Number of GO term associations made by (a) GOtcha with a P-score over the cutoff and (b) TOPBLAST with an E-value below the cutoff. Number of sequences with an associated annotation predicted by (c) GOtcha with a P-score over the cutoff and (d) TOPBLAST with an E-value below the cutoff. The P-score is calculated to a resolution of 1 percentage point, which gives the graph its stepped appearance. Mean number of annotations per annotated sequence predicted by (e) GOtcha and (f) TOPBLAST. Key: ○ Arabidopsis thaliana; △ Drosophila melanogaster; □ Homo sapiens; ● Plasmodium falciparum; ■ Vibrio cholerae; ◇ Caenorhabditis elegans; ▽ Saccharomyces cerevisiae.
Figure 3
Selectivity versus cutoff for assignment of GO terms using all evidence codes. (a) GOtcha with P-score cutoff (b) TOPBLAST with E-value cutoff. Key: ○ Arabidopsis thaliana; △ Drosophila melanogaster; □ Homo sapiens; ● Plasmodium falciparum; ■ Vibrio cholerae; ◇ Caenorhabditis elegans; ▽ Saccharomyces cerevisiae.
Figure 4
Coverage vs cutoff for assignment of GO terms excluding IEA evidence codes. (a) GOtcha (b) top informative BLAST hit. Key: ○ Arabidopsis thaliana; △ Drosophila melanogaster; □ Homo sapiens; ● Plasmodium falciparum; ■ Vibrio cholerae; ◇ Caenorhabditis elegans; ▽ Saccharomyces cerevisiae.
Figure 5
Selectivity versus cutoff for assignment of GO terms excluding IEA evidence codes. (a) GOtcha with P-score cutoff (b) TOPBLAST with E-value cutoff. Key: ○ Arabidopsis thaliana; △ Drosophila melanogaster; □ Homo sapiens; ● Plasmodium falciparum; ■ Vibrio cholerae; ◇ Caenorhabditis elegans; ▽ Saccharomyces cerevisiae.
Figure 6
Relative Error Quotient (REQ) vs cutoff for assignment of GO terms. REQ is defined in the text. (a) GOtcha analysis. (b) Top informative BLAST hit analysis. Key: ○ Arabidopsis thaliana; △ Drosophila melanogaster; □ Homo sapiens; ● Plasmodium falciparum; ■ Vibrio cholerae; ◇ Caenorhabditis elegans; ▽ Saccharomyces cerevisiae.
Figure 7
Relative Error Quotient (REQ) vs cutoff for assignment of GO terms. REQ is defined in the text. IEA terms were excluded from this analysis. (a) GOtcha analysis. (b) Top informative BLAST hit analysis. Key: ○ Arabidopsis thaliana; △ Drosophila melanogaster; □ Homo sapiens; ● Plasmodium falciparum; ■ Vibrio cholerae; ◇ Caenorhabditis elegans; ▽ Saccharomyces cerevisiae.
Figure 8
The effect of different weights on REQ. The REQ for GOtcha predictions of GO term associations for the human proteome was calculated with weighting factors of 0.5 (open circle), 1, 2, 3, 4, 5, 7, 10 and 15 (cross).
Figure 9
The GOtcha method. 1. A query sequence is subjected to a database search. The search results are processed to give a list of pairwise matches with associated R-scores. 2. The R-score for the pairwise match is added to the total score for each GO term associated with that match sequence. 3. The C-score is calculated as the natural logarithm of the total score at the root node. The I-score for each node is calculated as the ratio of the total node score to the root node.
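
Steps 2 and 3 of the caption above can be sketched as follows. This is a minimal illustration under stated assumptions, not the published implementation: R-scores are taken as given inputs (their definition is not stated here), each match is assumed to contribute its R-score to every ancestor of its GO terms up to the root, counted once per node, and the names `ancestors_inclusive`, `gotcha_scores`, and the toy ontology are hypothetical.

```python
import math
from collections import defaultdict


def ancestors_inclusive(term, parents):
    """Return the term together with all of its ancestors in the ontology graph."""
    seen = set()
    stack = [term]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen


def gotcha_scores(matches, parents, root):
    """Accumulate R-scores per GO node, then derive the C-score and I-scores.

    matches: list of (r_score, go_terms) pairs, one per pairwise match.
    parents: dict mapping each term to its parent terms (toy ontology).
    """
    totals = defaultdict(float)
    for r_score, terms in matches:
        # Assumption: a match scores each node once, even if several of its
        # terms share that ancestor.
        nodes = set()
        for term in terms:
            nodes |= ancestors_inclusive(term, parents)
        for node in nodes:
            totals[node] += r_score

    root_total = totals[root]
    if root_total <= 0:
        raise ValueError("no positive score reached the root node")

    c_score = math.log(root_total)  # natural logarithm of the root total
    i_scores = {node: total / root_total for node, total in totals.items()}
    return c_score, i_scores


# Toy ontology and matches, invented purely for illustration.
toy_parents = {"binding": ["root"], "dna_binding": ["binding"], "catalysis": ["root"]}
toy_matches = [(10.0, ["dna_binding"]), (5.0, ["catalysis"]), (5.0, ["binding"])]
c_score, i_scores = gotcha_scores(toy_matches, toy_parents, "root")
```

In this toy case the root accumulates 20, so the I-scores are 0.75 for `binding`, 0.5 for `dna_binding`, and 0.25 for `catalysis`. The conversion of these scores into the term-specific P-score is defined in the text of the paper and is not reproduced in this sketch.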
