Predicting protein function from domain content
- PMID: 18591194
- DOI: 10.1093/bioinformatics/btn312
Erratum in
- Bioinformatics. 2009 May 1;25(9):1214
Abstract
Motivation: Computational assignment of protein function may be the single most vital application of bioinformatics in the post-genome era. These assignments are based on various protein features, one of which is the presence of identifiable domains. Investigating the relationship between protein domain content and function is important for understanding how domain combinations encode complex functions.
Results: We present two models of how protein domain combinations yield specific functions: one rule-based and one probabilistic. We demonstrate how these are useful for Gene Ontology annotation transfer. The first is an intuitive generalization of the Pfam2GO mapping, and detects cases where a set of domains strictly implies a function. The second uses a probabilistic model to represent the relationship between domain content and annotation terms, and was found to be better suited for incomplete training sets. We implemented these models as predictors of Gene Ontology functional annotation terms. On a large-scale dataset, both predictors were more accurate than conventional best BLAST-hit annotation transfer and more sensitive than a single-domain model. We present a number of cases where combinations of Pfam-A protein domains predict functional terms that do not follow from the individual domains.
Availability: Scripts and documentation are available for download at http://sonnhammer.sbc.su.se/multipfam2go_source_docs.tar
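The contrast between the two models in the Results can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy training set, the placeholder domain and GO identifiers, and the function names `mine_strict_rules`, `predict`, and `term_probability` are all hypothetical, and the smoothed frequency estimate is only a stand-in for the probabilistic model, whose details the abstract does not give.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy training set: protein -> (domain set, GO term set).
# Domain and GO identifiers are placeholders, not real Pfam or GO entries.
TRAINING = {
    "P1": ({"DomA", "DomB"}, {"GO:1", "GO:3"}),
    "P2": ({"DomA", "DomB", "DomC"}, {"GO:1", "GO:3"}),
    "P3": ({"DomA"}, {"GO:1"}),
    "P4": ({"DomB"}, {"GO:2"}),
}


def mine_strict_rules(training, max_size=2):
    """Map each domain combination (up to max_size domains) to the GO terms
    carried by every training protein that contains that combination."""
    term_sets_by_combo = defaultdict(list)
    for domains, terms in training.values():
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(domains), size):
                term_sets_by_combo[frozenset(combo)].append(terms)
    rules = {}
    for combo, term_sets in term_sets_by_combo.items():
        shared = set.intersection(*term_sets)
        if shared:  # keep only combinations that strictly imply some term
            rules[combo] = shared
    return rules


def predict(rules, domains):
    """Union of terms implied by every rule whose domains are all present."""
    predicted = set()
    for combo, terms in rules.items():
        if combo <= domains:
            predicted |= terms
    return predicted


def term_probability(training, combo, term, pseudocount=1.0):
    """Smoothed fraction of training proteins containing `combo` that carry
    `term` (an assumed stand-in for a probabilistic model)."""
    containing = [terms for domains, terms in training.values() if combo <= domains]
    hits = sum(term in terms for terms in containing)
    return (hits + pseudocount) / (len(containing) + 2 * pseudocount)


if __name__ == "__main__":
    query = {"DomA", "DomB"}
    # Multi-domain rules recover GO:3, which no single domain implies.
    print(sorted(predict(mine_strict_rules(TRAINING, 2), query)))  # ['GO:1', 'GO:3']
    print(sorted(predict(mine_strict_rules(TRAINING, 1), query)))  # ['GO:1']
    # DomB alone never strictly implies GO:1, but still gets a graded score.
    print(term_probability(TRAINING, {"DomB"}, "GO:1"))  # 0.6
```

In this toy data, a single counterexample (P4) removes the strict rule from DomB to GO:1, while the frequency estimate still assigns it a moderate score. That is one plausible reading of why a probabilistic model would cope better with incomplete training sets.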