Nucleic Acids Res. 2005 Jan 1;33(Database issue):D433-7.
doi: 10.1093/nar/gki005.

STRING: known and predicted protein-protein associations, integrated and transferred across organisms


Christian von Mering et al. Nucleic Acids Res.

Abstract

A full description of a protein's function requires knowledge of all partner proteins with which it specifically associates. From a functional perspective, 'association' can mean direct physical binding, but can also mean indirect interaction such as participation in the same metabolic pathway or cellular process. Currently, information about protein association is scattered over a wide variety of resources and model organisms. STRING aims to simplify access to this information by providing a comprehensive, yet quality-controlled collection of protein-protein associations for a large number of organisms. The associations are derived from high-throughput experimental data, from the mining of databases and literature, and from predictions based on genomic context analysis. STRING integrates and ranks these associations by benchmarking them against a common reference set, and presents evidence in a consistent and intuitive web interface. Importantly, the associations are extended beyond the organism in which they were originally described, by automatic transfer to orthologous protein pairs in other organisms, where applicable. STRING currently holds 730,000 proteins in 180 fully sequenced organisms, and is available at http://string.embl.de/.
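The abstract says STRING ranks associations by benchmarking them against a common reference set and then integrates evidence from several sources. The Python sketch below shows one way to do both. It rests on assumptions that do not come from the text. Raw scores are grouped into fixed-width bins. The confidence of each bin is the fraction of its pairs that appear in a hypothetical reference set of true associations. Calibrated per-channel confidences are then combined as 1 - Π(1 - s_i), which treats the evidence channels as independent. The names `calibrate` and `combine` are illustrative only and are not part of any STRING interface.

```python
from bisect import bisect_right


def calibrate(raw_scores, reference_pairs, bin_edges):
    """Benchmark raw scores against a reference set (illustrative sketch).

    raw_scores: dict mapping frozenset({a, b}) -> raw score from one evidence channel.
    reference_pairs: set of frozenset pairs assumed to be true associations.
    bin_edges: sorted lower bin boundaries, e.g. [0.0, 1.0, 2.0].
    Returns a function mapping a raw score to the fraction of reference hits in its bin.
    """
    hits = [0] * len(bin_edges)
    totals = [0] * len(bin_edges)
    for pair, score in raw_scores.items():
        i = max(bisect_right(bin_edges, score) - 1, 0)  # scores below the first edge go to bin 0
        totals[i] += 1
        if pair in reference_pairs:
            hits[i] += 1
    fractions = [h / t if t else 0.0 for h, t in zip(hits, totals)]

    def to_confidence(score):
        return fractions[max(bisect_right(bin_edges, score) - 1, 0)]

    return to_confidence


def combine(channel_confidences):
    """Combine calibrated confidences, assuming independent channels: 1 - prod(1 - s_i)."""
    prob_all_wrong = 1.0
    for s in channel_confidences:
        prob_all_wrong *= 1.0 - s
    return 1.0 - prob_all_wrong


# Usage with made-up scores for one channel.
scores = {
    frozenset({"A", "B"}): 0.5,
    frozenset({"A", "C"}): 0.7,
    frozenset({"B", "C"}): 1.5,
    frozenset({"C", "D"}): 1.8,
}
reference = {frozenset({"A", "B"}), frozenset({"B", "C"}), frozenset({"C", "D"})}
to_conf = calibrate(scores, reference, [0.0, 1.0])
print(to_conf(0.2), to_conf(1.2))  # 0.5 1.0
print(combine([0.5, 0.5]))         # 0.75
```

Calibrating every channel on the same reference set places scores from very different kinds of evidence on a single probability-like scale before they are combined. The independence assumption behind `combine` is a simplification, and correlated channels would be over-counted under it.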


Figures

Figure 1
Results from a STRING search. Insets show partial screenshots from evidence pages, which are accessible from the main result page. Two proteins were used as inputs to the query: one is a subunit of the yeast ATP synthase complex, the other a subunit of the ubiquinol–cytochrome c reductase complex. The number of requested partners was limited to 10 (the default setting). STRING reports both proteins to be members of functional modules, which are in turn connected as parts of a larger unit. The diversity of evidence types supporting the modules is indicated.
Figure 2
Deriving confidence scores for high-throughput interaction data [exemplified here for a dataset of protein complex purifications (22)]. In this case, the relative confidence depends on how often two proteins are pulled down together (a and b), versus how often they are pulled down alone (c and d). A purification is counted twice when one of the partners is the bait (a and d). Raw quality is: $Q = \log\left[\frac{N_{\text{together}} \cdot N_{\text{total}}}{(N_{\text{alone},1} + 1)\,(N_{\text{alone},2} + 1)}\right]$.
Figure 3
Transferring association scores between organisms. Initial situation (top): a scored association between two proteins in a source organism. How confidently can it be transferred to a target organism by postulating an association among homologous proteins? Bottom left: in ‘COG-mode’, all proteins in an orthologous group (COG) are considered equivalent. The highest association score between any two proteins in the two COGs is assumed to be valid for all pairs. Bottom right: in ‘protein-mode’, all sequence similarity relations between the two organisms are considered. Associations are transferred fractionally, such that the pair with the highest similarity receives the bulk of the score. The relation is not linear: empirical analysis (not shown) suggests that competing similarity links should be down-weighted relative to the best link, as follows: (i) express similarities as values between zero and one, i.e. normalize by the self-hit; (ii) transform similarities using $s' = \exp(-k_1/s)$, thereby amplifying their ‘spread’; (iii) re-normalize so that, between the two species, all similarities for a protein family add up to one; (iv) each pair of proteins A and B in the target species now receives a share of the association score: $S_{\text{target}} = S_{\text{source}} \cdot k_2 \cdot s'_A \cdot s'_B$. Optimal values for $k_1$ and $k_2$ were empirically found to be 0.7 for both.
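The two transfer modes in the Figure 3 caption can be sketched in Python as follows. The sketch follows steps (i) to (iv) with k1 = k2 = 0.7 as stated. Some details are assumptions: the self-hit used for normalization is taken to be that of the source protein, only homologs with a positive raw similarity are passed in, and the function names and input dictionaries are hypothetical.

```python
import math

K1 = 0.7  # spread parameter from the caption
K2 = 0.7  # overall transfer penalty from the caption


def transfer_weights(similarities, self_hit, k1=K1):
    """Steps (i)-(iii) for one source protein.

    similarities: dict target_protein -> raw similarity (> 0) to the source protein.
    self_hit: raw score of the source protein against itself (assumed normalizer).
    """
    amplified = {}
    for target, raw in similarities.items():
        s = raw / self_hit                      # (i) normalize by self-hit: 0 < s <= 1
        amplified[target] = math.exp(-k1 / s)   # (ii) s' = exp(-k1 / s) widens the spread
    total = sum(amplified.values())
    return {t: v / total for t, v in amplified.items()}  # (iii) shares sum to one


def transfer_protein_mode(score, sims_a, self_a, sims_b, self_b, k2=K2):
    """Step (iv): S_target = S_source * k2 * s'_A * s'_B for every target pair."""
    weights_a = transfer_weights(sims_a, self_a)
    weights_b = transfer_weights(sims_b, self_b)
    return {
        (ta, tb): score * k2 * wa * wb
        for ta, wa in weights_a.items()
        for tb, wb in weights_b.items()
    }


def transfer_cog_mode(source_scores, cog_of, target_members, cog_x, cog_y):
    """COG-mode: the best score between members of two COGs applies to all target pairs."""
    best = max(
        (s for (p, q), s in source_scores.items()
         if {cog_of.get(p), cog_of.get(q)} == {cog_x, cog_y}),
        default=0.0,
    )
    return {(a, b): best for a in target_members[cog_x] for b in target_members[cog_y]}


# Usage: homologs a1 (best hit) and a2 of source protein A, single homolog b1 of B.
out = transfer_protein_mode(0.9, {"a1": 100.0, "a2": 50.0}, 100.0, {"b1": 80.0}, 100.0)
print(out)  # a1-b1 gets the larger share; values sum to 0.9 * 0.7 = 0.63 (up to rounding)
```

Because the re-normalized weights for each protein sum to one, protein-mode never inflates the total score. The transferred scores always add up to S_source multiplied by k2, and the similarity structure decides only how that amount is divided among target pairs.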

References

1. Salwinski,L., Miller,C.S., Smith,A.J., Pettit,F.K., Bowie,J.U. and Eisenberg,D. (2004) The database of interacting proteins: 2004 update. Nucleic Acids Res., 32, D449–D451.
2. Bader,G.D., Betel,D. and Hogue,C.W. (2003) BIND: the Biomolecular Interaction Network Database. Nucleic Acids Res., 31, 248–250.
3. Hermjakob,H., Montecchi-Palazzi,L., Lewington,C., Mudali,S., Kerrien,S., Orchard,S., Vingron,M., Roechert,B., Roepstorff,P., Valencia,A. et al. (2004) IntAct: an open source molecular interaction database. Nucleic Acids Res., 32, D452–D455.
4. Zanzoni,A., Montecchi-Palazzi,L., Quondam,M., Ausiello,G., Helmer-Citterich,M. and Cesareni,G. (2002) MINT: a Molecular INTeraction database. FEBS Lett., 513, 135–140.
5. Kanehisa,M., Goto,S., Kawashima,S., Okuno,Y. and Hattori,M. (2004) The KEGG resource for deciphering the genome. Nucleic Acids Res., 32, D277–D280.
