GOAnnotator: linking protein GO annotations to evidence text
- PMID: 17181854
- PMCID: PMC1769513
- DOI: 10.1186/1747-5333-1-19
Abstract
Background: Annotating proteins with Gene Ontology (GO) terms is an ongoing and complex task. Manual GO annotation is precise and valuable, but it is time-consuming. As a result, most proteins carry only uncurated annotations that were generated automatically rather than curated ones. Text-mining systems that use the literature for automatic annotation have been proposed, but they do not meet the high quality expectations of curators.
Results: In this paper we describe an approach that links uncurated annotations to text extracted from the literature. Text is selected according to its similarity to the term in the uncurated annotation. Besides substantiating the uncurated annotations, the extracted texts also lead to novel annotations. In addition, the approach uses the GO hierarchy to achieve high precision. Our approach is integrated into GOAnnotator, a tool that assists the curation process for GO annotation of UniProt proteins.
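The two ideas in the Results (selecting text by its similarity to a GO term, and using the GO hierarchy to restrict which terms are considered) can be illustrated with a minimal sketch. This is not the GOAnnotator implementation: the word-overlap score, the 0.5 threshold, the ancestor/descendant test, and all function names are assumptions chosen for illustration, and the hierarchy below is a simplified toy fragment rather than the real GO graph.

```python
import re

# Toy data (illustrative only): term names and a simplified child -> parents map.
TERM_NAMES = {
    "GO:0005488": "binding",
    "GO:0003676": "nucleic acid binding",
    "GO:0003677": "DNA binding",
    "GO:0016301": "kinase activity",
}
PARENTS = {
    "GO:0003677": ["GO:0003676"],
    "GO:0003676": ["GO:0005488"],
}

def tokens(text):
    """Return the set of lowercase word tokens in a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def term_similarity(term_name, sentence):
    """Fraction of the term's words that also occur in the sentence."""
    term_words = tokens(term_name)
    if not term_words:
        return 0.0
    return len(term_words & tokens(sentence)) / len(term_words)

def ancestors(term_id, parents):
    """Collect all ancestors of term_id by walking the parents map."""
    seen, stack = set(), [term_id]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def related(term_a, term_b, parents):
    """True if the terms are identical or one is an ancestor of the other."""
    return (term_a == term_b
            or term_a in ancestors(term_b, parents)
            or term_b in ancestors(term_a, parents))

def rank_evidence(uncurated_term, sentences, term_names, parents, threshold=0.5):
    """Score sentences against terms related to the uncurated term.

    Returns (term_id, sentence, score) tuples with score >= threshold,
    sorted by descending score and then by term id.
    """
    hits = []
    for term_id, name in term_names.items():
        if not related(term_id, uncurated_term, parents):
            continue  # hierarchy filter: skip terms unrelated to the annotation
        for sentence in sentences:
            score = term_similarity(name, sentence)
            if score >= threshold:
                hits.append((term_id, sentence, score))
    return sorted(hits, key=lambda hit: (-hit[2], hit[0]))

if __name__ == "__main__":
    sentences = [
        "The protein shows DNA binding in vitro.",
        "It also has kinase activity.",
    ]
    for hit in rank_evidence("GO:0003676", sentences, TERM_NAMES, PARENTS):
        print(hit)
```

In this sketch the sentence mentioning kinase activity is never proposed as evidence, because "kinase activity" is unrelated in the toy hierarchy to the uncurated term. This mirrors, in a very reduced form, how a hierarchy filter can trade recall for precision.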
Conclusion: The GO curators assessed GOAnnotator on a set of 66 distinct UniProt/SwissProt proteins with uncurated annotations. GOAnnotator provided correct evidence text at 93% precision. This high precision results from using the GO hierarchy to select only GO terms that are similar to the GO terms in the uncurated annotations in GOA. Our approach is the first to achieve high precision, which is crucial for the efficient support of GO curators. GOAnnotator was implemented as a web tool that is freely available at http://xldb.di.fc.ul.pt/rebil/tools/goa/.