Nucleic Acids Res. 2021 Jan 8;49(D1):D605-D612.
doi: 10.1093/nar/gkaa1074.

The STRING database in 2021: customizable protein-protein networks, and functional characterization of user-uploaded gene/measurement sets

Damian Szklarczyk et al. Nucleic Acids Res.

Abstract

Cellular life depends on a complex web of functional associations between biomolecules. Among these associations, protein-protein interactions are particularly important due to their versatility, specificity and adaptability. The STRING database aims to integrate all known and predicted associations between proteins, including both physical interactions and functional associations. To achieve this, STRING collects and scores evidence from a number of sources: (i) automated text mining of the scientific literature, (ii) databases of interaction experiments and annotated complexes/pathways, (iii) computational interaction predictions from co-expression and from conserved genomic context and (iv) systematic transfers of interaction evidence from one organism to another. STRING aims for wide coverage; the upcoming version 11.5 of the resource will contain more than 14 000 organisms. In this update paper, we describe changes to the text-mining system, a new scoring mode for physical interactions, as well as extensive user interface features for customizing, extending and sharing protein networks. In addition, we describe how to query STRING with genome-wide experimental data, including the automated detection of enriched functionalities and potential biases in the user's query data. The STRING resource is available online at https://string-db.org/.
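
As an illustration of programmatic access, the following minimal sketch builds a request URL for a protein network, choosing between functional and physical associations. It is not taken from the paper: the endpoint layout and the parameter names (`identifiers`, `species`, `required_score`, `network_type`) are assumptions based on the public STRING REST API and should be checked against the current API documentation. The helper name `build_network_url` is hypothetical, and the code only constructs the URL without sending a request.

```python
from urllib.parse import urlencode

# Assumed base URL of the STRING REST API; verify against the official docs.
STRING_API_BASE = "https://string-db.org/api"


def build_network_url(identifiers, species, required_score=400,
                      network_type="functional", output_format="tsv"):
    """Build (but do not send) a STRING network request URL.

    All parameter names are assumptions about the public API.
    """
    params = {
        # Multiple identifiers are joined with a carriage return (%0D when encoded).
        "identifiers": "\r".join(identifiers),
        "species": species,                # NCBI taxonomy ID, e.g. 9606 for human
        "required_score": required_score,  # minimum combined score, scaled 0-1000
        "network_type": network_type,      # "functional" or "physical"
    }
    return f"{STRING_API_BASE}/{output_format}/network?{urlencode(params)}"


url = build_network_url(["TP53", "MDM2"], species=9606, network_type="physical")
print(url)
```

The resulting URL could then be fetched with any HTTP client; keeping URL construction separate from the network call makes the query parameters easy to inspect and adjust.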

Figures

Figure 1.
Example of a user-extended STRING network, adding external information. SARS-CoV-2 proteins, highlighted in blue, have been added to the standard human protein–protein association network in STRING, using the data add-on (‘payload’) mechanism. Virus proteins will automatically appear in the network based on their known associations with host proteins (as imported from the IMEx coronavirus interactome (35)). In addition, host proteins whose expression appears to control SARS-CoV-2 virion entry into cells, as determined in a recent genome-wide CRISPR-screen (36), are highlighted: proteins whose removal causes a drop in virus entry efficiency are highlighted in red; green highlights indicate proteins whose removal enhances virus entry. Proteins without highlights have entered the network based on close associations to the CRISPR screen proteins. The inset describes topological statistics of the network: it is strongly enriched in terms of functional associations, as compared to a random network of similar size.
Figure 2.
Example of a STRING report on quantitative trends in a user input. Genome-scale inputs into STRING can be used to search for functional enrichments, but confounders in the data can potentially complicate interpretation. A new STRING feature allows users to visualize such confounding trends. Here, STRING was queried with a large set of human proteins, whereby each protein was entered together with its approximate likelihood of being targeted toward the mitochondrion (‘Mito Evidence IMPI score’, from the MitoMiner database (59)). As expected, ranking proteins by their mitochondrial localization likelihood reveals no trend in terms of the GC content of their encoding genes, but noticeable trends in some of the other measures tested. Protein abundance is taken from PaxDB (60), expressed in parts-per-million (log-scale). The ‘nr of publications’ refers to the tagged corpus of the STRING text-mining channel, counting how many publications have been tagged for a given protein with at least one of its known names. The protein size corresponds to the amino-acid length of the canonical isoform expressed at a given gene locus (log-scale).
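
One simple way to check a user input for such a trend is a rank correlation between the user-supplied value and a candidate confounder. The sketch below computes Spearman's rank correlation in plain Python. It is an illustration only and is not claimed to be the procedure STRING uses; the function names and the toy numbers are made up for this example.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ValueError if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy, made-up data: a user-supplied score and a covariate for five proteins.
user_score = [0.1, 0.4, 0.5, 0.8, 0.9]
log_abundance = [1.2, 2.0, 1.9, 3.1, 3.5]
rho = spearman(user_score, log_abundance)
print(round(rho, 3))
```

A value of rho near zero suggests no monotone trend between the ranking and the covariate, whereas values near +1 or -1 point to a possible confounder worth inspecting before interpreting enrichment results.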

References

    1. Barabasi A.L., Oltvai Z.N. Network biology: understanding the cell's functional organization. Nat. Rev. Genet. 2004; 5:101–113.
    2. Hu J.X., Thomas C.E., Brunak S. Network biology concepts in complex disease comorbidities. Nat. Rev. Genet. 2016; 17:615–629.
    3. Conte F., Fiscon G., Licursi V., Bizzarri D., D’Anto T., Farina L., Paci P. A paradigm shift in medicine: A comprehensive review of network-based approaches. Biochim. Biophys. Acta Gene Regul. Mech. 2020; 1863:194416.
    4. Cowen L., Ideker T., Raphael B.J., Sharan R. Network propagation: a universal amplifier of genetic associations. Nat. Rev. Genet. 2017; 18:551–562.
    5. Tian W., Zhang L.V., Tasan M., Gibbons F.D., King O.D., Park J., Wunderlich Z., Cherry J.M., Roth F.P. Combining guilt-by-association and guilt-by-profiling to predict Saccharomyces cerevisiae gene function. Genome Biol. 2008; 9(Suppl.1):S7.