ComPPI: a cellular compartment-specific database for protein-protein interaction network analysis

Daniel V Veres¹, Dávid M Gyurkó¹, Benedek Thaler², Kristóf Z Szalay¹, Dávid Fazekas³, Tamás Korcsmáros⁴, Peter Csermely⁵

Affiliations

¹ Department of Medical Chemistry, Semmelweis University, Budapest, Hungary.
² Department of Medical Chemistry, Semmelweis University, Budapest, Hungary; Faculty of Electrical Engineering and Informatics, Budapest University of Technology and Economics, Budapest, Hungary.
³ Department of Genetics, Eötvös Loránd University, Budapest, Hungary.
⁴ Department of Genetics, Eötvös Loránd University, Budapest, Hungary; TGAC, The Genome Analysis Centre, Norwich, UK; Gut Health and Food Safety Programme, Institute of Food Research, Norwich, UK.
⁵ Department of Medical Chemistry, Semmelweis University, Budapest, Hungary. Email: csermely.peter@med.semmelweis-univ.hu

PMID: 25348397
PMCID: PMC4383876
DOI: 10.1093/nar/gku1007

Nucleic Acids Res. 2015 Jan;43(Database issue):D485-93. doi: 10.1093/nar/gku1007. Epub 2014 Oct 27.

Abstract

Here we present ComPPI, a cellular compartment-specific database of proteins and their interactions that enables extensive, compartmentalized protein-protein interaction network analysis (URL: http://ComPPI.LinkGroup.hu). ComPPI enables the user to filter out biologically unlikely interactions, in which the two interacting proteins have no common subcellular localization, and to predict novel properties, such as compartment-specific biological functions. ComPPI is an integrated database covering four species (S. cerevisiae, C. elegans, D. melanogaster and H. sapiens). Nine protein-protein interaction and eight subcellular localization data sets were compiled in four curation steps, which included building a manually curated, comprehensive hierarchical structure of >1600 subcellular localizations. ComPPI provides confidence scores for protein subcellular localizations and protein-protein interactions. Its user-friendly search options return, for an individual protein, its subcellular localizations, its interactions and the likelihood of those interactions given the subcellular localizations of the interacting partners. Search results, whole proteomes, organelle-specific interactomes and subcellular localization data can be downloaded from the website. Owing to these features, ComPPI is useful for the analysis of experimental results in biochemistry and molecular biology, as well as for proteome-wide studies in bioinformatics and network science that support cell biology, medicine and drug design.

Figures

**Figure 1.**
Flowchart of ComPPI construction highlighting the four curation steps. When constructing the ComPPI database, we first checked the data content of 24 possible input databases for false entries, data inconsistency and compatible data structure in order to minimize the bias in ComPPI coming from the input sources **(1)**. As a consequence, we selected nine protein–protein interaction databases (BioGRID (29), CCSB (30), DiP (31), DroID (26), HPRD (27), IntAct (32), MatrixDB (18), MINT (33) and MIPS (28)) and eight subcellular localization databases (eSLDB (37), GO (19), Human Proteinpedia (34), LOCATE (38), MatrixDB (18), OrganelleDB (39), PA-GOSUB (36) and The Human Protein Atlas (35)) for integration into the ComPPI data set. The subcellular localization structure was manually annotated, creating a hierarchical, non-redundant subcellular localization tree from >1600 GO cellular component terms (19) to standardize the differing data resolutions and naming conventions **(2)**. All input databases were connected to the ComPPI core database through newly built interfaces in order to improve data consistency, to allow easy extension with new databases and to incorporate automatic database updates. As part of the curation steps, the filtering efficiency of the newly built interfaces was tested on 200 random proteins for every input database, and an interface was accepted only when all targeted false entries and data content errors were filtered out, in order to establish more reliable content (Supplementary Table S3). During data integration, different protein naming conventions were mapped to the most reliable protein name. In this process we used publicly available mapping tables (UniProt (24) and HPRD (27)). For 30% of protein names we applied manually built mapping tables with the help of online ID cross-reference services (PICR (25) and Synergizer (http://llama.mshri.on.ca/synergizer/translate/)) **(3)**.
After data integration, Localization and Interaction Scores were calculated (for a detailed description see Figure 2). As an illustration, we show the example of Figure 2 with two interacting proteins (nodes A and B, corresponding to HSP 90-alpha A2 and Survivin, respectively) with shared cytosolic and nuclear localizations (light blue and orange). Node B has an additional membrane (yellow) subcellular localization and an extracellular localization (green). Numbers in the circles of nodes A and B refer to their Localization Scores. The Interaction Score of nodes A and B is 0.99 (see Figure 2 for details). The integrated ComPPI data set was manually revised by six independent experts **(4)**. During the revision, two of the six experts each tested the database on 200 random proteins to meet high quality-control requirements, searching for exact matches between the entries in the input sources and the ComPPI data set. All the experts searched for false entries, data inconsistencies and protein name mapping errors in the downloadable data, and also tested the operation of the online services. After the revision we updated our source databases, their interfaces, the subcellular localization tree and the algorithm generating the downloadable data in order to incorporate all the changes proposed during the tests. As the final result, the webpage http://ComPPI.LinkGroup.hu offers search and download options for extracting the biological information in a user-friendly way.

**Figure 2.**
Calculation of the subcellular localization-based ComPPI scores. We illustrate the Localization Score calculation steps using the examples of Heat Shock Protein (HSP) 90-alpha A2 and Survivin. HSP 90-alpha A2 has two major subcellular localizations, while Survivin has four (φ_nucleusA, φ_cytoA and φ_extracellularB, φ_membraneB, φ_nucleusB, φ_cytoB, respectively). Localizations were manually categorized into major localizations before the calculation (see the text in section ‘Subcellular Localization Structure’ for details). **(A)** A Localization Score (such as φ_cytoA) is calculated for every available major subcellular localization of both HSP 90-alpha A2 and Survivin, based on the available localization evidence types and the number of the respective localization data entries (corresponding to p_LocX and V_rec of Equation (1)). The Localization Score calculation uses the optimized localization evidence type weights of 0.8, 0.7 and 0.3 for experimental, predicted and unknown localization evidence types, respectively. (For details of the weight optimization procedure see section ‘Score Optimization’ of the main text and Supplementary Figure S6.) The Localization Score (i.e. the likelihood that the respective protein belongs to a major compartment) is the probabilistic disjunction over the different localization evidence types and the number of ComPPI localization data entries of each evidence type (Equation (1)). **(B)** Calculation of the Interaction Score (φ_Int) is based on the Localization Scores of the interacting proteins. First, Compartment-specific Interaction Scores (such as φ_cytoInt) are calculated as pair-wise products of the relevant Localization Scores of the two interacting proteins (HSP 90-alpha A2 and Survivin).
The final Interaction Score (φ_Int) is calculated as the probabilistic disjunction of the Compartment-specific Interaction Scores of all major localizations available for the interacting pair of proteins (in the example, four major localizations for HSP 90-alpha A2 and Survivin) out of the maximum of six major localizations (Equation (2)).
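
The two-stage calculation described in this caption can be sketched in Python. This is a minimal illustration of one reading of the caption, not the authors' implementation and not a verbatim transcription of Equations (1) and (2), which are not reproduced here. It assumes that "probabilistic disjunction" means 1 − ∏(1 − p), and that each localization data entry contributes one term weighted by its evidence type. The function names and the dictionary layout are hypothetical.

```python
from math import prod

# Evidence-type weights quoted in the Figure 2 caption.
EVIDENCE_WEIGHTS = {"experimental": 0.8, "predicted": 0.7, "unknown": 0.3}


def probabilistic_disjunction(probabilities):
    """Return 1 - product(1 - p), i.e. the chance that at least one holds."""
    return 1.0 - prod(1.0 - p for p in probabilities)


def localization_score(entry_counts):
    """Score one major localization of one protein.

    entry_counts maps an evidence type to the number of data entries of
    that type (hypothetical layout). Assumption: every entry contributes
    one term, weighted by its evidence type.
    """
    terms = []
    for evidence_type, count in entry_counts.items():
        terms.extend([EVIDENCE_WEIGHTS[evidence_type]] * count)
    return probabilistic_disjunction(terms)


def interaction_score(loc_scores_a, loc_scores_b):
    """Combine per-compartment products of two proteins' Localization Scores.

    A compartment where only one protein is found would contribute a
    product of zero, so only shared compartments need to be considered.
    """
    shared = loc_scores_a.keys() & loc_scores_b.keys()
    compartment_scores = [loc_scores_a[c] * loc_scores_b[c] for c in shared]
    return probabilistic_disjunction(compartment_scores)
```

With made-up inputs (not the values shown in the figure), `localization_score({"experimental": 1, "predicted": 1})` gives about 0.94, and `interaction_score({"cytosol": 0.9, "nucleus": 0.8}, {"cytosol": 0.5, "nucleus": 0.5, "membrane": 0.9})` gives about 0.67. Two proteins with no shared compartment score 0.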

**Figure 3.**
Advantages of the ComPPI subcellular localization structure. The subcellular localization structure of ComPPI is based on a manually curated, non-redundant subcellular localization tree extracted from GO data (19), containing more than 1600 GO cellular component terms (Supplementary Figure S2). Figure 3 shows an example of the redundancy in the GO cellular component tree structure: the ‘nuclear pore’ cellular component can be found under several branches of the tree, such as the ‘nucleus’ -> ‘nuclear envelope’ -> ‘nuclear pore’ pathway or the ‘membrane’ -> ‘membrane part’ -> ‘intrinsic component of the membrane’ -> ‘integral component of the membrane’ -> ‘pore complex’ pathway (highlighted in red). Because high-resolution subcellular localization data need to be mapped to major cellular components (Supplementary Table S4), a localization tree with a non-redundant structure was built. In our example, this structure lets the ‘nuclear pore’ derive unequivocally from the ‘nuclear envelope’ term (highlighted in green).
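
The benefit of a non-redundant tree can be illustrated with a short Python sketch. If every term has exactly one parent, a high-resolution term maps to its major localization by walking up a single path. The `PARENT` table and the function name are hypothetical. Only the ‘nuclear pore’ -> ‘nuclear envelope’ -> ‘nucleus’ chain comes from the caption, and treating ‘nucleus’ as the root of that chain is an assumption.

```python
# Hypothetical child -> parent links of a non-redundant localization tree.
# Each term has exactly one parent; None marks a major localization (root).
PARENT = {
    "nuclear pore": "nuclear envelope",
    "nuclear envelope": "nucleus",
    "nucleus": None,
}


def major_localization(term, parent=PARENT):
    """Walk up the single-parent tree until a root term is reached."""
    visited = set()
    while parent[term] is not None:
        if term in visited:
            # A cycle would mean the structure is not a tree.
            raise ValueError(f"cycle detected at {term!r}")
        visited.add(term)
        term = parent[term]
    return term
```

In this toy table, `major_localization("nuclear pore")` resolves to `"nucleus"` along one unambiguous path. In a redundant structure the same term could be reached through several branches and the mapping would not be unique.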

**Figure 4.**
Advantages of the ComPPI data set for filtering biologically unlikely interactions and predicting new compartment-specific properties and functions. The figure shows the interactions of crotonase (enoyl-CoA hydratase, UniProt ID: P30084), a mitochondrial protein involved in fatty acid catabolism, with its first neighbours supported by experimental evidence, before and after filtering to mitochondrial localization. Interactions with an Interaction Score below 0.80 are shown with dashed lines. On the one hand, only 8 of the original 71 neighbours of crotonase remain as mitochondrial interacting partners, with a significantly higher average Interaction Score than the whole first-neighbour network. This highlights the importance of compartment-specific filtering for detecting high-confidence interactors in a subcellular localization-dependent manner. On the other hand, the blue circle in the upper left of the figure marks the cytosolic interacting partners of crotonase that are involved in apoptosis, a recently discovered function of crotonase (45–47). Thus, the same example also reveals a potential new function of crotonase that partially involves its unexpected cytosolic localization, which was recently verified experimentally (46).
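
The filtering step described in this caption can be sketched as follows. This is a hedged illustration only: the `Neighbour` record, the function names and any data passed to them are hypothetical and do not reflect the ComPPI download format or the crotonase network itself. The 0.80 threshold is the one quoted in the caption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Neighbour:
    """Hypothetical record for one first neighbour of a query protein."""
    name: str
    localizations: frozenset
    interaction_score: float


def filter_by_compartment(neighbours, compartment):
    """Keep only neighbours annotated with the given major localization."""
    return [n for n in neighbours if compartment in n.localizations]


def mean_interaction_score(neighbours):
    """Average Interaction Score, or None for an empty list."""
    if not neighbours:
        return None
    return sum(n.interaction_score for n in neighbours) / len(neighbours)


def is_low_confidence(neighbour, threshold=0.80):
    """Flag interactions that the figure would draw with a dashed line."""
    return neighbour.interaction_score < threshold
```

Comparing `mean_interaction_score` before and after `filter_by_compartment(neighbours, "mitochondrion")` mirrors the comparison made in the figure between the whole first-neighbour network and its mitochondrial subset.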

Grants and funding

Biotechnology and Biological Sciences Research Council/United Kingdom
