Automatic extraction of gene ontology annotation and its correlation with clusters in protein networks
- PMID: 17620146
- PMCID: PMC1940026
- DOI: 10.1186/1471-2105-8-243
Abstract
Background: Uncovering the cellular roles of a protein is a task of tremendous importance and complexity that requires dedicated experimental work and, often, sophisticated data mining and processing tools. A protein's functions, often referred to as its annotations, are believed to manifest themselves through the topology of networks of inter-protein interactions. In particular, there is a growing body of evidence that proteins performing the same function are more likely to interact with each other than with proteins that have other functions. However, since functional annotation and protein network topology are often studied separately, the direct relationship between them has not been comprehensively demonstrated. In addition to its general biological significance, such a demonstration would further validate the data extraction and processing methods used to compose protein annotation and protein-protein interaction datasets.
Results: We developed a method for automatic extraction of protein functional annotation from scientific text based on Natural Language Processing (NLP) technology. For the protein annotation extracted from the entire PubMed, we evaluated the precision and recall rates, and compared the performance of the automatic extraction technology to that of the manual curation used in public Gene Ontology (GO) annotation. In the second part of the work, we report a large-scale investigation into the correspondence between communities in the literature-based protein networks and GO annotation groups of functionally related proteins. We found a comprehensive two-way match: proteins within biological annotation groups form significantly more densely linked network clusters than expected by chance and, conversely, densely linked network communities exhibit a pronounced non-random overlap with GO groups. We also expanded the publicly available GO biological process annotation using the relations extracted by our NLP technology. An increase in the number and size of GO groups without any noticeable decrease in the link density within the groups indicated that this expansion significantly broadens the public GO annotation without diluting its quality. We found that functional GO annotation correlates mostly with clustering in a physical interaction protein network, while its overlap with indirect regulatory network communities is two to three times smaller.
Conclusion: Protein functional annotations extracted by the NLP technology expand and enrich the existing GO annotation system. The GO functional modularity correlates mostly with the clustering in the physical interaction network, suggesting an essential role for the structural organization maintained by these interactions. Reciprocally, clustering of proteins in physical interaction networks can serve as evidence for their functional similarity.
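Illustration: The comparison described in the Results, link density within an annotation group versus what would be expected by chance, can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the procedure used in the paper: the abstract does not specify the density measure or the null model, so the sketch assumes density is the fraction of linked protein pairs within a group and that "chance" means randomly drawn groups of the same size. The function names, the toy network, and the toy group are all hypothetical.

```python
import random
from itertools import combinations


def link_density(edges, group):
    """Fraction of all possible pairs inside `group` that are linked.

    `edges` is a set of frozenset({a, b}) pairs (undirected links).
    Assumption: this pair-fraction definition is illustrative only.
    """
    members = sorted(set(group))
    n = len(members)
    if n < 2:
        return 0.0
    linked = sum(
        1 for a, b in combinations(members, 2) if frozenset((a, b)) in edges
    )
    return linked / (n * (n - 1) / 2)


def empirical_p_value(edges, nodes, group, trials=1000, seed=0):
    """Compare a group's density with random same-size groups.

    Returns (observed density, empirical p-value). The null model
    (uniform random node sets) is an assumption of this sketch.
    """
    rng = random.Random(seed)
    observed = link_density(edges, group)
    size = len(set(group))
    hits = sum(
        1
        for _ in range(trials)
        if link_density(edges, rng.sample(nodes, size)) >= observed
    )
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (trials + 1)


# Hypothetical toy network: a triangle A-B-C attached to a chain.
toy_edges = {
    frozenset(pair)
    for pair in [
        ("A", "B"), ("A", "C"), ("B", "C"),
        ("C", "D"), ("D", "E"), ("E", "F"), ("F", "G"), ("G", "H"),
    ]
}
toy_nodes = sorted({node for edge in toy_edges for node in edge})
toy_group = ["A", "B", "C"]  # stands in for one annotation group

observed, p_value = empirical_p_value(toy_edges, toy_nodes, toy_group)
print(f"density={observed:.2f}, empirical p={p_value:.3f}")
```

In this toy case the group is a fully connected triangle, so its density is the maximum possible and few random triples match it. A real analysis would need to account for group size, node degree, and multiple testing, none of which this sketch addresses.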