Analyzing protein function on a genomic scale: the importance of gold-standard positives and negatives for network prediction
- PMID: 15451510
- DOI: 10.1016/j.mib.2004.08.012
Abstract
The concept of 'protein function' is rather 'fuzzy' because it is often based on whimsical terms or contradictory nomenclature. This currently presents a challenge for functional genomics because precise definitions are essential for most computational approaches. Addressing this challenge, the notion of networks between biological entities (including molecular and genetic interaction networks as well as transcriptional regulatory relationships) potentially provides a unifying language suitable for the systematic description of protein function. Predicting the edges in protein networks requires reference sets of examples with known outcome (that is, 'gold standards'). Such reference sets should ideally include positive examples - as is now widely appreciated - but also, equally importantly, negative ones. Moreover, it is necessary to consider the expected relative occurrence of positives and negatives because this affects the misclassification rates of experiments and computational predictions. For instance, a reason why genome-wide, experimental protein-protein interaction networks have high inaccuracies is that the prior probability of finding interactions (positives) rather than non-interacting protein pairs (negatives) in unbiased screens is very small. These problems can be addressed by constructing well-defined sets of non-interacting proteins from subcellular localization data, which allows computing the probability of interactions based on evidence from multiple datasets.
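The effect of a small prior on misclassification, and the idea of combining evidence from several datasets, can be illustrated with a short Bayesian calculation. The sketch below is not the authors' implementation. The function names, the prior of 1 in 600, the assay error rates, and the likelihood ratios are all hypothetical values chosen for illustration. It also assumes the datasets are conditionally independent (a naive Bayes combination), which the abstract does not state.

```python
def posterior_from_prior(prior, sensitivity, false_positive_rate):
    """P(pair truly interacts | assay reports an interaction), by Bayes' rule."""
    true_pos = sensitivity * prior
    false_pos = false_positive_rate * (1.0 - prior)
    return true_pos / (true_pos + false_pos)


def likelihood_ratio(pos_with_evidence, n_pos, neg_with_evidence, n_neg):
    """Likelihood ratio of one kind of evidence, estimated from gold standards.

    pos_with_evidence / n_pos: fraction of gold-standard positives showing the evidence.
    neg_with_evidence / n_neg: fraction of gold-standard negatives showing the evidence.
    """
    return (pos_with_evidence / n_pos) / (neg_with_evidence / n_neg)


def combine_likelihood_ratios(prior, likelihood_ratios):
    """Posterior probability from prior odds times the product of likelihood ratios.

    Assumes the evidence sources are conditionally independent (naive Bayes).
    """
    odds = prior / (1.0 - prior)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1.0 + odds)


# Hypothetical numbers, for illustration only.
ILLUSTRATIVE_PRIOR = 1.0 / 600.0

# Same assay, two different priors: a rare-positive screen vs. a balanced set.
rare = posterior_from_prior(ILLUSTRATIVE_PRIOR, sensitivity=0.8, false_positive_rate=0.01)
balanced = posterior_from_prior(0.5, sensitivity=0.8, false_positive_rate=0.01)
print(f"posterior with rare positives: {rare:.3f}")
print(f"posterior with balanced classes: {balanced:.3f}")

# Two hypothetical datasets, each scored against the gold-standard sets.
lr_a = likelihood_ratio(80, 100, 10, 1000)
lr_b = likelihood_ratio(30, 100, 10, 1000)
print(f"combined posterior: {combine_likelihood_ratios(ILLUSTRATIVE_PRIOR, [lr_a, lr_b]):.3f}")
```

With these made-up numbers, an assay that looks accurate on a balanced set still yields mostly false positives when true interactions are rare. This is why both the negative gold standard and the expected ratio of positives to negatives matter.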