STRING: known and predicted protein-protein associations, integrated and transferred across organisms

Christian von Mering¹, Lars J Jensen, Berend Snel, Sean D Hooper, Markus Krupp, Mathilde Foglierini, Nelly Jouffre, Martijn A Huynen, Peer Bork

PMID: 15608232
PMCID: PMC539959
DOI: 10.1093/nar/gki005

Christian von Mering et al. Nucleic Acids Res. 2005 Jan 1;33(Database issue):D433-7. doi: 10.1093/nar/gki005.

Affiliation

¹ European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany. mering@embl-heidelberg.de

Abstract

A full description of a protein's function requires knowledge of all partner proteins with which it specifically associates. From a functional perspective, 'association' can mean direct physical binding, but can also mean indirect interaction such as participation in the same metabolic pathway or cellular process. Currently, information about protein association is scattered over a wide variety of resources and model organisms. STRING aims to simplify access to this information by providing a comprehensive, yet quality-controlled collection of protein-protein associations for a large number of organisms. The associations are derived from high-throughput experimental data, from the mining of databases and literature, and from predictions based on genomic context analysis. STRING integrates and ranks these associations by benchmarking them against a common reference set, and presents evidence in a consistent and intuitive web interface. Importantly, the associations are extended beyond the organism in which they were originally described, by automatic transfer to orthologous protein pairs in other organisms, where applicable. STRING currently holds 730,000 proteins in 180 fully sequenced organisms, and is available at http://string.embl.de/.

Figures

**Figure 1**
Results from a STRING search. Inserts show partial screen shots from evidence pages, which are accessible from the main result page. Two proteins were used as inputs to the query—one is a subunit from the yeast ATP synthase complex, the other a subunit from the ubiquinol–cytochrome C reductase complex. The number of requested partners was limited to 10 (default settings). STRING reports both proteins to be members of functional modules, which are in turn connected as part of a larger unit. The diversity of evidence types supporting the modules is noted.

**Figure 2**
Deriving confidence scores for high-throughput interaction data [exemplified here for a dataset of protein complex purifications (22)]. In this case, the relative confidence depends on how often two proteins are pulled down together (a and b), versus how often they are pulled down alone (c and d). A purification is counted twice when one of the partners is the bait (a and d). Raw quality is: Q = log{(N_together · N_total)/[(N_alone1 + 1) · (N_alone2 + 1)]}.
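
The raw quality formula in the Figure 2 caption can be sketched in a few lines of Python. This is a minimal illustration, not the STRING implementation: the function name `raw_quality` and its parameter names are hypothetical, and the natural logarithm is an assumption, since the caption does not state the base.

```python
import math


def raw_quality(n_together, n_total, n_alone1, n_alone2):
    """Raw quality Q for one protein pair, following the Figure 2 caption.

    n_together: number of purifications in which both proteins appear
    n_total:    total number of purifications in the dataset
    n_alone1:   purifications containing protein 1 but not protein 2
    n_alone2:   purifications containing protein 2 but not protein 1

    The log base is an assumption (natural log); the caption only says "log".
    """
    if n_together <= 0 or n_total <= 0:
        raise ValueError("n_together and n_total must be positive")
    if n_alone1 < 0 or n_alone2 < 0:
        raise ValueError("counts must be non-negative")
    # The +1 terms keep the denominator non-zero when a protein is
    # never observed without its partner.
    return math.log(
        (n_together * n_total) / ((n_alone1 + 1) * (n_alone2 + 1))
    )


# A pair seen together twice and never alone, in 100 purifications,
# scores higher than a pair seen together once and alone several times.
strong = raw_quality(n_together=2, n_total=100, n_alone1=0, n_alone2=0)
weak = raw_quality(n_together=1, n_total=100, n_alone1=4, n_alone2=4)
```

Under these assumptions, Q grows with co-purification counts and shrinks as either protein is pulled down without the other.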

**Figure 3**
Transferring association scores between organisms. Initial situation (top): a scored association between two proteins in a source organism. How confidently can it be transferred to a target organism by a postulated association among homologous proteins? Bottom left: in ‘COG-mode’, all proteins in an orthologous group (COG) are considered equivalent. The highest association score between any two proteins in the two COGs is assumed to be valid for all pairs. Bottom right: in ‘protein-mode’, all sequence similarity relations between the two organisms are considered. Associations are transferred fractionally, such that the pair with the highest similarity receives the bulk of the score. The relation is not linear: empirical analysis (not shown) suggests that competing similarity links should be down-weighted, relative to the best link, as follows: (i) express similarities as values between zero and one, i.e. normalize by self-hit; (ii) transform similarities using s′ = exp(−k₁/s), thereby amplifying their ‘spread’; (iii) re-normalize so that, between the two species, all similarities for a protein family add up to one; (iv) each pair of proteins, A and B, in the target species now receives a share of the association score: S_target = S_source · k₂ · s′_A · s′_B. Optimal values for k₁ and k₂ were empirically found to be 0.7 for both.
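
Steps (i) to (iv) of the ‘protein-mode’ transfer in the Figure 3 caption can be sketched as follows. This is a rough illustration under stated assumptions, not the STRING code: the function names, the dictionary-based inputs and the example protein identifiers are hypothetical, and step (iii) is read here as normalizing, for one source protein, over all of its homologs in the target species.

```python
import math

K1 = 0.7  # value reported in the caption
K2 = 0.7  # value reported in the caption


def transfer_weights(similarities, self_hit, k1=K1):
    """Steps (i)-(iii): turn raw similarity scores into shares summing to one.

    similarities: dict mapping each target-species homolog to its raw
                  similarity score against the source protein (hypothetical
                  input format)
    self_hit:     similarity score of the source protein against itself
    """
    transformed = {}
    for target, score in similarities.items():
        s = score / self_hit  # (i) normalize by self-hit, giving 0..1
        # (ii) s' = exp(-k1 / s); a similarity of zero gets no weight
        transformed[target] = math.exp(-k1 / s) if s > 0 else 0.0
    total = sum(transformed.values())
    if total == 0:
        return transformed
    # (iii) re-normalize so the shares add up to one
    return {t: w / total for t, w in transformed.items()}


def transfer_scores(s_source, weights_a, weights_b, k2=K2):
    """Step (iv): S_target = S_source * k2 * s'_A * s'_B for every pair."""
    return {
        (a, b): s_source * k2 * wa * wb
        for a, wa in weights_a.items()
        for b, wb in weights_b.items()
    }


# Source protein A has two homologs in the target species, B has one.
w_a = transfer_weights({"A1": 200.0, "A2": 100.0}, self_hit=200.0)
w_b = transfer_weights({"B1": 150.0}, self_hit=300.0)
scores = transfer_scores(0.9, w_a, w_b)
```

In this reading, the best-matching pair receives the largest share of the source score, and no transferred score exceeds k₂ times the source score.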

References

1. Salwinski,L., Miller,C.S., Smith,A.J., Pettit,F.K., Bowie,J.U. and Eisenberg,D. (2004) The database of interacting proteins: 2004 update. Nucleic Acids Res., 32, D449–D451.
2. Bader,G.D., Betel,D. and Hogue,C.W. (2003) BIND: the Biomolecular Interaction Network Database. Nucleic Acids Res., 31, 248–250.
3. Hermjakob,H., Montecchi-Palazzi,L., Lewington,C., Mudali,S., Kerrien,S., Orchard,S., Vingron,M., Roechert,B., Roepstorff,P., Valencia,A. et al. (2004) IntAct: an open source molecular interaction database. Nucleic Acids Res., 32, D452–D455.
4. Zanzoni,A., Montecchi-Palazzi,L., Quondam,M., Ausiello,G., Helmer-Citterich,M. and Cesareni,G. (2002) MINT: a Molecular INTeraction database. FEBS Lett., 513, 135–140.
5. Kanehisa,M., Goto,S., Kawashima,S., Okuno,Y. and Hattori,M. (2004) The KEGG resource for deciphering the genome. Nucleic Acids Res., 32, D277–D280.
