Brief Bioinform. 2023 Nov 22;25(1):bbad496.
doi: 10.1093/bib/bbad496.

Structural coverage of the human interactome

Kayra Kosoglu et al. Brief Bioinform.

Abstract

Complex biological processes in cells are embedded in the interactome, the complete set of protein-protein interactions. Mapping and analyzing protein structures is essential to fully comprehend the molecular details of these processes. Knowing the structural coverage of the interactome is therefore important for revealing current limitations. Structural modeling of protein-protein interactions requires accurate protein structures. In this study, we mapped all experimental structures to the reference human proteome. We then found that structural coverage increased when complementary methods such as homology modeling and deep learning (AlphaFold) were included. Next, we collected interactions from the literature and databases to form the reference human interactome, resulting in 117 897 non-redundant interactions. When we analyzed the structural coverage of the interactome, we found that experimentally determined protein complex structures are scarce, corresponding to 3.95% of all binary interactions. We also analyzed known and modeled structures to assess whether the structural interactome could be constructed with a docking method. Our analysis showed that 12.97% of the interactions from HuRI, and 73.62% and 32.94% from the filtered versions of STRING and HIPPIE, respectively, could potentially be modeled with high structural coverage or accuracy. Overall, this paper provides an overview of the current state of structural coverage of the human proteome and interactome.

Keywords: AlphaFold2; PDB; homology modeling databases; human interactome; human proteome; protein complexes; structural coverage.

Figures

Figure 1
Concept figure. Each database is represented with the same color code in both panels. (A) Reference human proteome. PDB: Protein Data Bank, HM: homology modeling, AF: AlphaFold. The human reference proteome is shown as a long continuous line. Homology models and PDB structures, which might cover overlapping regions, are represented by discrete lines. While AF provides a model for the whole proteome, we focus on regions that are modeled with high accuracy. (B) Sample network representation of the reference interactome. Protein structures are represented by nodes and interactions by edges. Question marks within the nodes mark monomers that have no known 3D structure in any of the databases. Question marks on the edges mark interactions (complexes) between structurally known monomers whose 3D structure is unknown.
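Because PDB structures and homology models can cover overlapping stretches of the same protein, per-protein coverage has to count each residue only once. The following is a minimal sketch of that idea, not the authors' pipeline: the function name `covered_fraction`, the 1-based inclusive residue ranges, and the example segments are all assumptions made for illustration.

```python
from typing import Iterable, Tuple


def covered_fraction(length: int, segments: Iterable[Tuple[int, int]]) -> float:
    """Fraction of a protein's residues covered by at least one segment.

    Assumption: segments are 1-based, inclusive (start, end) residue ranges,
    e.g. the regions resolved in a PDB entry or spanned by a homology model.
    Overlapping segments are merged so no residue is counted twice.
    """
    if length <= 0:
        raise ValueError("protein length must be positive")
    covered = 0
    last_end = 0  # last residue already counted
    for start, end in sorted(segments):
        start = max(start, last_end + 1, 1)  # skip residues already counted
        end = min(end, length)               # clip to the protein length
        if end >= start:
            covered += end - start + 1
            last_end = end
    return covered / length


# Hypothetical 100-residue protein: two overlapping segments and one separate one.
example = covered_fraction(100, [(1, 40), (30, 60), (81, 100)])
print(f"{example:.2f}")
```

Sorting the segments first lets a single pass merge overlaps, which keeps the sketch linear after the sort.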
Figure 2
Structural coverage of proteins in the human reference proteome by databases. (A) The number of proteins modeled by PDB, SWISS-MODEL, ModBase, AF and their intersections are visualized. AF models with 85% of their residues predicted with ≥70% pLDDT score are used. (B) The number of proteins with no structure, partial structure, and complete structure. AF models with 85% (AF-85), 70% (AF-70) and 50% (AF-50) of their residues predicted with ≥70% pLDDT score are used for this demonstration. Partial structure denotes structure coverage <80% for PDB and homology models. For AF, it denotes an average accuracy of <80%. Similarly, complete structure means ≥80% coverage or average accuracy. Although it’s not visible, AF-85 has 13 models with partial structures.
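The AF-85, AF-70 and AF-50 cutoffs and the 80% partial/complete boundary described in this caption can be sketched as follows. This is an illustrative reading of the caption, not the authors' code: the function names, the treatment of the cutoffs as inclusive, and the example pLDDT values are assumptions.

```python
from typing import Sequence


def confident_fraction(plddt: Sequence[float], min_plddt: float = 70.0) -> float:
    """Fraction of residues whose pLDDT score is at least `min_plddt`."""
    if not plddt:
        return 0.0
    return sum(1 for score in plddt if score >= min_plddt) / len(plddt)


def passes_af_cutoff(plddt: Sequence[float], min_fraction: float = 0.85) -> bool:
    """True if at least `min_fraction` of residues have pLDDT >= 70.

    min_fraction = 0.85, 0.70 and 0.50 correspond to AF-85, AF-70 and AF-50.
    Assumption: the cutoff is inclusive.
    """
    return confident_fraction(plddt) >= min_fraction


def coverage_label(value: float, complete_cutoff: float = 0.80) -> str:
    """Label a coverage (PDB, homology models) or average accuracy (AF) value.

    Assumption: `value` is expressed as a fraction between 0 and 1.
    """
    if value <= 0:
        return "no structure"
    return "complete structure" if value >= complete_cutoff else "partial structure"


# Hypothetical per-residue pLDDT scores for a 10-residue model.
scores = [92, 88, 95, 91, 74, 81, 90, 85, 77, 48]
print(passes_af_cutoff(scores, 0.85), coverage_label(0.65))
```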
Figure 3
Overview of PPIs found in HuRI, BioPlex, HIPPIEF, and STRINGF databases. (A) Interactions are filtered according to the reference proteome. (B) Interactions with a PDB structure for both interacting proteins filtered to the reference proteome. (C) Interactions with a high-quality homology model for both interacting proteins filtered to the reference proteome. (D) Interactions with an AF model that have 85% of their residues predicted with ≥70% pLDDT score for both interacting proteins filtered to the reference proteome. HM: homology modeling, AF: AlphaFold.
Figure 4
Total number of proteins found in STRINGF, HIPPIEF and HuRI databases.
Figure 5
Structural coverage of protein–protein interactions in interactome databases. Exp-exp indicates interactions where both interacting partners have experimental structures from PDB. Exp-model represents interactions where only one interacting partner has an experimental structure from PDB and the other partner has a model from ModBase, SWISS-MODEL or AF. Model-model indicates that neither interacting partner has an experimental structure but both have a model. Lastly, rest covers the remaining interactions, for which no structural data are available. The inset plot shows the same categories as percentages. Only AF models that have 85% of their residues predicted with ≥70% pLDDT score are considered.
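The four categories in this caption amount to a small decision rule per interaction. The sketch below is one way to express it, assuming each protein is annotated with the set of sources that provide a structure for it; the identifiers (`classify_pair`, `category_percentages`, the protein names P1 to P5) are hypothetical and the data are made up for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

EXPERIMENTAL = {"PDB"}
MODELED = {"ModBase", "SWISS-MODEL", "AF"}


def classify_pair(sources_a: Set[str], sources_b: Set[str]) -> str:
    """Assign an interaction to exp-exp, exp-model, model-model or rest."""
    a_exp, b_exp = bool(sources_a & EXPERIMENTAL), bool(sources_b & EXPERIMENTAL)
    a_model, b_model = bool(sources_a & MODELED), bool(sources_b & MODELED)
    if a_exp and b_exp:
        return "exp-exp"
    # Exactly one partner is experimental; the other must at least have a model.
    if (a_exp and not b_exp and b_model) or (b_exp and not a_exp and a_model):
        return "exp-model"
    if not a_exp and not b_exp and a_model and b_model:
        return "model-model"
    return "rest"


def category_percentages(
    pairs: Iterable[Tuple[str, str]], structures: Dict[str, Set[str]]
) -> Dict[str, float]:
    """Percentage of interactions per category (as in the inset plot)."""
    counts = Counter(
        classify_pair(structures.get(a, set()), structures.get(b, set()))
        for a, b in pairs
    )
    total = sum(counts.values())
    return {k: 100.0 * v / total for k, v in counts.items()} if total else {}


# Hypothetical annotations and interactions.
structures = {"P1": {"PDB"}, "P2": {"PDB", "AF"}, "P3": {"AF"}, "P4": {"SWISS-MODEL"}}
pairs = [("P1", "P2"), ("P1", "P3"), ("P3", "P4"), ("P1", "P5")]
print(category_percentages(pairs, structures))
```

Note that under this reading a pair in which one partner has a PDB structure and the other has nothing falls into rest, since the caption requires the second partner of an exp-model pair to have a model.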
Figure 6
3D structure modeling assessment of ‘Protein 1–Protein 2’ pairs in the reference interactomes of (A) HuRI, (B) STRINGF and (C) HIPPIEF. The x- and y-axis labels show the 3D coverage or accuracy label of the databases. LC: low coverage, HC: high coverage, HA: high accuracy, LA: low accuracy, MB: ModBase, SM: SWISS-MODEL, AF: AlphaFold. The number in each cell indicates how many ‘Protein 1–Protein 2’ pairs of the reference interactome could potentially be constructed from the indicated sources with the given coverage or accuracy labels.
Figure 7
Investigation of important genes in interactome databases. The fractions of disease-related, COSMIC CGC and drug-targeting genes are investigated for all interactome databases. The term ‘combined’ denotes the merged interactomes of HIPPIEF, STRINGF and HuRI.
