Review

Curr Opin Microbiol. 2004 Oct;7(5):535-45.

doi: 10.1016/j.mib.2004.08.012.

Analyzing protein function on a genomic scale: the importance of gold-standard positives and negatives for network prediction

Ronald Jansen¹, Mark Gerstein

PMID: 15451510
DOI: 10.1016/j.mib.2004.08.012

Ronald Jansen et al. Curr Opin Microbiol. 2004 Oct.

Affiliation

¹ Computational Biology Center, Memorial Sloan-Kettering Cancer Center, 307 East 63rd Street, 2nd floor, New York, New York 10021, USA.

Abstract

The concept of 'protein function' is rather 'fuzzy' because it is often based on whimsical terms or contradictory nomenclature. This currently presents a challenge for functional genomics because precise definitions are essential for most computational approaches. Addressing this challenge, the notion of networks between biological entities (including molecular and genetic interaction networks as well as transcriptional regulatory relationships) potentially provides a unifying language suitable for the systematic description of protein function. Predicting the edges in protein networks requires reference sets of examples with known outcome (that is, 'gold standards'). Such reference sets should ideally include positive examples - as is now widely appreciated - but also, equally importantly, negative ones. Moreover, it is necessary to consider the expected relative occurrence of positives and negatives because this affects the misclassification rates of experiments and computational predictions. For instance, a reason why genome-wide, experimental protein-protein interaction networks have high inaccuracies is that the prior probability of finding interactions (positives) rather than non-interacting protein pairs (negatives) in unbiased screens is very small. These problems can be addressed by constructing well-defined sets of non-interacting proteins from subcellular localization data, which allows computing the probability of interactions based on evidence from multiple datasets.
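
The abstract's point about prior probabilities can be illustrated with a small calculation. The sketch below is not the authors' implementation; it assumes a naive Bayes treatment in which each dataset contributes a likelihood ratio estimated from gold-standard positives and negatives, and the datasets are conditionally independent. All function names and numbers (the prior, the error rates, the counts) are hypothetical and chosen only for illustration.

```python
def posterior_from_screen(prior, true_positive_rate, false_positive_rate):
    """Probability that a pair reported as interacting truly interacts (Bayes' rule)."""
    true_hits = prior * true_positive_rate
    false_hits = (1.0 - prior) * false_positive_rate
    return true_hits / (true_hits + false_hits)


def likelihood_ratio(pos_with_evidence, pos_total, neg_with_evidence, neg_total):
    """Estimate P(evidence | positive) / P(evidence | negative) from gold-standard counts."""
    p_given_pos = pos_with_evidence / pos_total
    p_given_neg = neg_with_evidence / neg_total
    return p_given_pos / p_given_neg


def combine_evidence(prior, likelihood_ratios):
    """Posterior probability from prior odds times the product of likelihood ratios.

    Assumes the evidence sources are conditionally independent (naive Bayes).
    """
    odds = prior / (1.0 - prior)
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds / (1.0 + odds)


if __name__ == "__main__":
    # Hypothetical values: 1 in 600 protein pairs interacts; the screen detects
    # 80% of true interactions and mislabels 1% of non-interacting pairs.
    prior = 1.0 / 600.0
    print(posterior_from_screen(prior, 0.8, 0.01))

    # Hypothetical gold-standard counts for two independent datasets.
    ratios = [
        likelihood_ratio(400, 1000, 50, 100000),
        likelihood_ratio(300, 1000, 2000, 100000),
    ]
    print(combine_evidence(prior, ratios))
```

With these assumed numbers, a screen that looks accurate on paper still yields mostly false positives, because non-interacting pairs vastly outnumber interacting ones. Estimating the likelihood ratios requires both positive and negative gold-standard sets, which is why the negatives matter as much as the positives.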
