2008 Sep 30:9:405.
doi: 10.1186/1471-2105-9-405.

iRefIndex: a consolidated protein interaction database with provenance

Sabry Razick et al. BMC Bioinformatics.

Abstract

Background: Interaction data for a given protein may be spread across multiple databases. We set out to create a unifying index that would facilitate searching for these data and that would group together redundant interaction data while recording the methods used to perform this grouping.

Results: We present a method to generate a key for a protein interaction record and a key for each participant protein. These keys may be generated by anyone using only the primary sequence of the proteins, their taxonomy identifiers and the Secure Hash Algorithm. Two interaction records will have identical keys if they refer to the same set of identical protein sequences and taxonomy identifiers. We define records with identical keys as a redundant group. Our method required that we map protein database references found in interaction records to current protein sequence records. Operations performed during this mapping are described by a mapping score that may provide valuable feedback to source interaction databases on problematic references that are malformed, deprecated, ambiguous or unfound. Keys for protein participants allow for retrieval of interaction information independent of the protein references used in the original records.
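The key scheme described above can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the abstract states only that keys are derived from the primary sequence, the taxonomy identifier and the Secure Hash Algorithm. The function names, the whitespace and case normalization, the base64 encoding of the digest, and the sort-then-concatenate step for the interaction key are all assumptions made for this example.

```python
import base64
import hashlib
from collections import defaultdict


def participant_key(sequence: str, taxon_id: int) -> str:
    """Key for one protein: SHA-1 of its sequence, plus the taxonomy id.

    Assumption: whitespace is removed and letters are upper-cased first,
    and the 20-byte digest is base64-encoded without padding.
    """
    normalized = "".join(sequence.split()).upper()
    digest = hashlib.sha1(normalized.encode("ascii")).digest()
    encoded = base64.b64encode(digest).decode("ascii").rstrip("=")
    return encoded + str(taxon_id)


def interaction_key(participant_keys) -> str:
    """Key for one interaction record, independent of participant order.

    Assumption: participant keys are sorted, concatenated and hashed again.
    Duplicates are kept so that a homodimer differs from a single protein.
    """
    joined = "".join(sorted(participant_keys))
    digest = hashlib.sha1(joined.encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


def group_redundant(records):
    """Group record ids whose interaction keys are identical.

    `records` maps a record id to a list of (sequence, taxon_id) pairs.
    """
    groups = defaultdict(list)
    for record_id, participants in records.items():
        keys = [participant_key(seq, tax) for seq, tax in participants]
        groups[interaction_key(keys)].append(record_id)
    return dict(groups)


# Hypothetical records from two source databases describing the same pair.
example_records = {
    "dbA:1": [("MKTAYIAK", 9606), ("MSDNELQ", 9606)],
    "dbB:7": [("MSDNELQ", 9606), ("MKTAYIAK", 9606)],
    "dbA:2": [("MKTAYIAK", 9606), ("MSDNELQ", 10090)],
}
redundant_groups = group_redundant(example_records)
```

In this sketch, "dbA:1" and "dbB:7" fall into one redundant group because they list the same sequences and taxonomy identifiers in a different order, while "dbA:2" is kept separate because one participant has a different taxonomy identifier.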

Conclusion: We have applied our method to protein interaction records from BIND, BioGrid, DIP, HPRD, IntAct, MINT, MPact, MPPI and OPHID. The resulting interaction reference index is provided in PSI-MITAB 2.5 format at http://irefindex.uio.no. This index may form the basis of alternative redundant groupings based on gene identifiers or near sequence identity groupings.

Figures

Figure 1
ROGs and RIGs. The two black circles represent redundant object groups (ROGs). Each ROG contains a set of protein sequence accessions that point to records describing the exact same sequence from the same organism. The dotted circle represents a redundant interaction group (RIG). This RIG contains a set of protein interaction accessions that point to records describing interactions between the same two proteins (ROGs). Unique identifiers for ROGs and RIGs can be calculated independently by anyone using the primary sequences of the proteins, their taxonomy identifiers and the SHA-1 algorithm.
Figure 2
Overview of the ROG assignment process. Logical blocks (a-i) are described in the text. Only the assignment scores shown in this diagram are reachable by the workflow. Underlined scores were obtained for the present build of the database. A key explaining assignment score features is given in Table 2.
Figure 3
Utility of consolidated interaction data. The black node represents a complex record describing the alternative Ctf18-Dcc1-Ctf8-replication factor C complex (see HPRD complexes for EntrezGene 63922 and [42]). Open circles represent proteins. Dashed red lines represent membership of a protein in the complex. Solid lines represent binary protein-protein interactions that are found only in HPRD (thick black lines), only in OPHID (thick grey lines), or in more than two databases (thin black lines). The clustering coefficient of the complex is 0.57.
Figure 4
Distribution of PMID re-use amongst interaction records. Distinct interactions are binned according to "lowest pmid re-use" (lpr). Bars indicate the number of distinct interactions (RIGs) with a given lpr. The body contains a long distribution of interactions with lpr values between 25 and 2050 (omitted and indicated by the vertical line). Not all lpr bins are represented beyond an lpr of 25. The bins of interactions at lpr 11781 and 36744 (arrows) correspond to yeast two-hybrid studies of the Campylobacter jejuni [71] and Drosophila melanogaster [72] interactomes, respectively.

References

    1. Shoemaker BA, Panchenko AR. Deciphering protein-protein interactions. Part I. Experimental techniques and databases. PLoS computational biology. 2007;3:e42. doi: 10.1371/journal.pcbi.0030042.
    2. Shoemaker BA, Panchenko AR. Deciphering protein-protein interactions. Part II. Computational methods to predict protein and domain interaction partners. PLoS computational biology. 2007;3:e43. doi: 10.1371/journal.pcbi.0030043.
    3. IMEx http://imex.sourceforge.net/
    4. Kerrien S, Orchard S, Montecchi-Palazzi L, Aranda B, Quinn AF, Vinod N, Bader GD, Xenarios I, Wojcik J, Sherman D, Tyers M, Salama JJ, Moore S, Ceol A, Chatr-Aryamontri A, Oesterheld M, Stumpflen V, Salwinski L, Nerothin J, Cerami E, Cusick ME, Vidal M, Gilson M, Armstrong J, Woollard P, Hogue C, Eisenberg D, Cesareni G, Apweiler R, Hermjakob H. Broadening the horizon–level 2.5 of the HUPO-PSI format for molecular interactions. BMC biology. 2007;5:44. doi: 10.1186/1741-7007-5-44.
    5. INSDC: International Nucleotide Sequence Database Collaboration http://www.insdc.org