[Preprint]. 2025 Aug 21:2025.07.03.663068.
doi: 10.1101/2025.07.03.663068.

Human protein interactome structure prediction at scale with Boltz-2

Alexander M Ille et al. bioRxiv.

Abstract

In humans, protein-protein interactions mediate numerous biological processes and are central to both normal physiology and disease. Extensive research efforts have aimed to elucidate the human protein interactome, and comprehensive databases now catalog these interactions at scale. However, structural coverage of the human protein interactome is limited and remains challenging to resolve through experimental methodology alone. Recent advances in artificial intelligence/machine learning (AI/ML)-based approaches for protein interaction structure prediction present opportunities for large-scale structural characterization of the human interactome. One such model, Boltz-2, which is capable of predicting the structures of protein complexes, may serve this objective. Here, we present de novo computed models of 1,394 binary human protein interaction structures predicted using Boltz-2 based on biochemically determined interaction data sourced from the IntAct database. We assessed the predicted interaction structures through different confidence metrics, which consider both overall structure and the interaction interface. These analyses indicated that prediction confidence tended to be greater for smaller complexes, while increased multiple sequence alignment (MSA) depth tended to improve prediction confidence. Additionally, we examined annotated protein domains and found that 679 of the predicted structural complexes contained a variety of domains with putative interaction involvement on the basis of interaction interface proximity. Furthermore, our analyses revealed intricate interaction networks within the context of biological function and cancer. This work demonstrates the utility of Boltz-2 for in silico structural modeling of the human protein interactome, highlighting both strengths and limitations, while also providing a novel view of broad functional contextualization. 
Ultimately, such modeling is expected to yield broad structural insights with relevance across multiple domains of biomedical research.

Conflict of interest statement

A.M.I. is a founder and partner of North Horizon, which is engaged in the development of artificial intelligence-based software. R.P. and W.A. are founders and equity shareholders of PhageNova Bio. R.P. is Chief Scientific Officer and a paid consultant of PhageNova Bio. R.P. and W.A. are founders and equity shareholders of MBrace Therapeutics. R.P. and W.A. serve as paid consultants for MBrace Therapeutics. R.P. and W.A. have Sponsored Research Agreements (SRAs) in place with PhageNova Bio, MBrace Therapeutics, and Alnylam Pharmaceuticals; this study falls outside of the scope of these SRAs. These arrangements are managed in accordance with the established institutional conflict-of-interest policies of Rutgers, The State University of New Jersey. C.M. and S.K.B. declare no competing interests.

Figures

Figure 1.
(a) Overview of protein-protein interaction structure prediction workflow, leveraging human biochemical interaction data and amino acid sequence data from the IntAct and UniProt databases, respectively, for binary complex prediction with Boltz-2. (b) Boltz-2 structure interaction prediction with the highest confidence score (0.938) among all interaction structures predicted in the current study. Interactor protein A is depicted in teal, interactor protein B is depicted in red, and dashed lines represent hydrogen bonds. The interaction is that of nuclear cap-binding complex protein subunit 1 and protein subunit 2. Insets show examples of predicted interacting residues between the two proteins. (c) Bivariate distribution of sequence lengths of protein interactor pairs, with an upper limit of 1,000 amino acid residues per protein. Data shown as increasing intervals of 25 amino acids, with the scale bar representing number of protein interactor pairs. (d) Co-occurrence matrix of Pfam domains across protein interactor pairs. Red arrowheads indicate the most abundant domain, PF02991 (autophagy protein Atg8 ubiquitin like), n = 215 co-occurrences. Purple arrowhead indicates co-occurrence of domains PF13923 (zinc finger, C3HC4 type (RING finger)) and PF00385 (chromatin organization modifier domain), n = 15 co-occurrences, while blue arrowhead indicates co-occurrence of domains PF16207 (RAWUL domain RING finger- and WD40-associated ubiquitin-like) and PF00385 (chromatin organization modifier domain), n = 15 co-occurrences. Data shown as bins with a minimum of three co-occurrences per bin, with the scale bar representing number of domain co-occurrences.
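The binning described for panel (c) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the toy length pairs, and the bin-edge convention (lengths 1 to 25 in the first bin) are assumptions, and treating the 1,000-residue limit as a filter is one possible reading of the caption.

```python
from collections import Counter

BIN_WIDTH = 25     # residues per interval, as stated for panel (c)
MAX_LENGTH = 1000  # upper limit per protein, as stated for panel (c)

def bin_length_pairs(pairs):
    """Count interactor pairs per (length A bin, length B bin) cell."""
    counts = Counter()
    for len_a, len_b in pairs:
        if len_a > MAX_LENGTH or len_b > MAX_LENGTH:
            continue  # outside the plotted range
        # Assumed edge convention: lengths 1-25 go to bin 0, 26-50 to bin 1, ...
        cell = ((len_a - 1) // BIN_WIDTH, (len_b - 1) // BIN_WIDTH)
        counts[cell] += 1
    return counts

# Hypothetical sequence lengths for four interactor pairs.
pairs = [(120, 340), (125, 350), (126, 350), (1001, 50)]
histogram = bin_length_pairs(pairs)
```

Each key of `histogram` is a cell of the bivariate distribution, and each value is the number of interactor pairs that the scale bar would encode.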
Figure 2.
(a) Prediction confidence metric distributions across all protein interaction structures predicted (n = 1,394). All metrics were determined by Boltz-2 on a normalized scale of 0 to 1, enabling direct comparison (see Methods). Solid lines represent medians and dashed lines represent first and third quartiles. (b) Relationship between combined sequence length (the geometric mean of interactor sequence A and interactor sequence B) and confidence score of the predicted interaction structures. Spearman correlation r = −0.261, ****p < 0.0001. (c) Relationship between MSA depth and confidence score of the predicted interaction structures. Spearman correlation r = 0.353, ****p < 0.0001. Predicted interaction structures with MSA depth greater than 5000 (n = 25 structures) are not graphically depicted here, but were included in statistical analyses for panels c-e. (d) Relationship between MSA depth and entire interaction complex (overall) pLDDT and pTM. Spearman correlation r = 0.369 for overall pLDDT, Spearman correlation r = 0.266 for overall pTM, ****p < 0.0001. (e) Relationship between MSA depth and interaction interface-aggregated pLDDT and pTM. Spearman correlation r = 0.331 for interface pLDDT, Spearman correlation r = 0.139 for interface pTM, ****p < 0.0001. Lines determined by linear regression are depicted in red in panels b-e. (f-g) Predicted interaction structures which exhibited the greatest overall (f) and interface (g) metrics. Interactor protein A is depicted in teal and interactor protein B is depicted in red. Dashed lines represent hydrogen bonds, dashed lines with an asterisk represent salt bridges, and yellow surfaces represent hydrophobic regions.
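The two quantities behind panel (b), the geometric mean of the interactor lengths and the Spearman rank correlation, can be sketched with the standard library alone. The function names and toy values below are illustrative assumptions; the record does not state which software the authors used for the statistics.

```python
import math

def combined_length(len_a, len_b):
    """Geometric mean of the two interactor sequence lengths."""
    return math.sqrt(len_a * len_b)

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical interactor lengths and confidence scores (not study data).
length_pairs = [(100, 400), (300, 300), (900, 400), (150, 600)]
lengths = [combined_length(a, b) for a, b in length_pairs]
scores = [0.90, 0.70, 0.40, 0.80]
rho = spearman(lengths, scores)
```

In this toy data `rho` is negative, the same direction as the reported r = −0.261: longer combined lengths go with lower confidence scores.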
Figure 3.
(a) Chord diagram depicting the interaction network between proteins with a minimum of two other interactors where the interaction confidence score is > 0.6709 (third quartile, Figure 2a). Individual proteins are plotted along the inner circumference of the circle, with differing colors for each protein. Grey connecting lines (chords) represent interactions between proteins. The four outer layers consisting of scalar bars denote (1) overall pLDDT, (2) overall pTM, (3) interface pLDDT, and (4) interface pTM. (b-c) Predicted interaction structures for the two proteins with the greatest number of interactors from (a). This includes n = 14 binary interactions for O95166 (b) and n = 13 binary interactions for Q9GZQ8 (c).
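One way to read the selection rule in panel (a) is sketched below: keep interactions scoring above the third-quartile threshold, then keep proteins that retain at least two distinct partners. The order of those two steps, the exclusion of self-interactions, and the placeholder identifiers `P1` to `P4` are assumptions rather than details given in the caption.

```python
from collections import defaultdict

THRESHOLD = 0.6709  # third quartile of the confidence score, from the caption

def build_network(interactions, threshold=THRESHOLD, min_partners=2):
    """Map each retained protein to its set of high-confidence partners."""
    partners = defaultdict(set)
    for protein_a, protein_b, score in interactions:
        # Assumption: self-interactions do not count as "other interactors".
        if score > threshold and protein_a != protein_b:
            partners[protein_a].add(protein_b)
            partners[protein_b].add(protein_a)
    return {p: ps for p, ps in partners.items() if len(ps) >= min_partners}

# Hypothetical (protein A, protein B, confidence score) records.
interactions = [
    ("P1", "P2", 0.80),
    ("P1", "P3", 0.70),
    ("P1", "P4", 0.50),
    ("P2", "P3", 0.60),
    ("P4", "P4", 0.90),
]
network = build_network(interactions)
```

Here only `P1` keeps two partners above the threshold, so it is the only protein that would appear on the circle under this reading.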
Figure 4.
(a) Top 10 most prevalent Pfam domains by number of interaction pairs, restricted to only include domains found within 5 Å of the protein-protein interaction interface. These include: autophagy protein Atg8 ubiquitin like domain (PF02991), present in n = 149 interaction structures; protein kinase domain (PF00069), n = 105 interaction structures; Ras family domain (PF00071), n = 54 interaction structures; calcineurin-like phosphoesterase domain (PF00149), n = 53 interaction structures; SH3 domain (PF00018), n = 37 interaction structures; chromatin organization modifier domain (PF00385), n = 35 interaction structures; P53 DNA-binding domain (PF00870), n = 34 interaction structures; ligand-binding domain of nuclear hormone receptor (PF00104), n = 33 interaction structures; ubiquitin family domain (PF00240), n = 33 interaction structures; protein tyrosine and serine/threonine kinase domain (PF07714), n = 33 interaction structures. The quantification of number of interactor pairs containing a given domain is non-redundant, i.e. multiple occurrences of the same domain (either in interactor A or interactor B) do not cumulatively inflate the count. (b) Examples of the top 10 most prevalent domains from (a) as found within predicted interaction structures based on Pfam domain sequence annotation.
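The 5 Å proximity restriction and the non-redundant counting in panel (a) can be illustrated as follows. This is a sketch under stated assumptions: the caption does not say which atoms define the interface, so an any-atom distance test is used here, and the function names and inputs are hypothetical.

```python
import math
from collections import Counter

CUTOFF = 5.0  # angstroms, from the caption

def domain_at_interface(domain_atoms, partner_atoms, cutoff=CUTOFF):
    """True if any domain atom lies within `cutoff` of any partner-chain atom."""
    return any(
        math.dist(p, q) <= cutoff
        for p in domain_atoms
        for q in partner_atoms
    )

def count_structures_per_domain(structures):
    """Count interaction structures per domain, once per structure."""
    counts = Counter()
    for interface_domains in structures:
        # A set removes repeats of the same domain within one structure,
        # whether they come from interactor A or interactor B.
        counts.update(set(interface_domains))
    return counts

# Hypothetical coordinates (x, y, z) in angstroms.
near = domain_at_interface([(0.0, 0.0, 0.0)], [(0.0, 0.0, 4.0)])
far = domain_at_interface([(0.0, 0.0, 0.0)], [(0.0, 0.0, 6.0)])

# Interface domains for two hypothetical structures.
structures = [["PF02991", "PF02991"], ["PF02991", "PF00069"]]
counts = count_structures_per_domain(structures)
```

The first structure lists PF02991 twice but contributes only one count, which is the non-redundancy the caption describes.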
Figure 5.
(a) Co-occurrence matrix of Pfam domains within 5 Å of the interaction interface between protein interactor pairs. Red arrowheads indicate the most abundant domain PF02991 (n = 122 interaction interface co-occurrences), purple arrowhead indicates co-occurrence of domains PF13923 and PF00385 (n = 15 interaction interface co-occurrences), and blue arrowhead indicates co-occurrence of domains PF16207 and PF00385 (n = 15 interaction interface co-occurrences). Data shown as bins with a minimum of three protein interactor pairs per bin, with the scale bar representing number of protein interactor pairs. (b-d) Examples of predicted interaction structures which include co-occurrences of domains PF02991 with PF00027 (b), PF13923 with PF00385 (c), and PF16207 with PF00385 (d). (e) Chord diagrams illustrating domain-based interaction networks grouped by UniProt biological process keyword annotation. For each diagram, individual proteins are plotted along the circumference of the circle and connecting lines (chords) represent interactions with other proteins involving domains < 5 Å of the interaction interface. Differing colors along the circle circumference represent unique proteins while differing colors of the chords represent unique domains. UniProt biological process keywords used for functional grouping include KW-0053 (apoptosis), KW-0132 (cell division), KW-0221 (differentiation), KW-0234 (DNA repair), KW-0391 (immunity), KW-0653 (protein transport), with at least one protein in the interaction pair annotated with the designated keyword required for group inclusion.
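A co-occurrence tally of the kind shown in panel (a) might be built as below. The counting convention (each unordered domain combination counted once per interactor pair) is an assumption, since the caption does not define it; the function name and toy pairs are likewise illustrative.

```python
from collections import Counter
from itertools import product

def domain_cooccurrence(pairs, min_count=3):
    """Count domain combinations across interactor pairs, keeping bins >= min_count."""
    counts = Counter()
    for domains_a, domains_b in pairs:
        # Assumption: one count per unordered domain combination per pair.
        combos = {
            tuple(sorted((dom_a, dom_b)))
            for dom_a, dom_b in product(set(domains_a), set(domains_b))
        }
        counts.update(combos)
    return {combo: n for combo, n in counts.items() if n >= min_count}

# Hypothetical interface domains for interactor A and interactor B.
pairs = (
    [(["PF13923"], ["PF00385"])] * 3
    + [(["PF16207"], ["PF00385"])] * 2
)
matrix = domain_cooccurrence(pairs)
```

With the minimum of three pairs per bin stated in the caption, the PF16207 and PF00385 combination (two toy pairs) is dropped and only the PF13923 and PF00385 bin remains.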
Figure 6.
Chord diagram depicting an oncology-focused interaction network based on domains proximal to the interaction interface. As in Figure 5e, the circumference of the circle contains individual proteins (differing by color) with connecting lines (chords) representing interactions between proteins which involve domains within 5 Å of the interaction interface. Interaction pairs were included on the basis of at least one protein being annotated with the UniProt disease keyword proto-oncogene (KW-0656). The red outer layer consisting of scalar bars indicates the number of genetic variants documented for each protein ranging from 0 to 10 variants, with variant counts > 10 denoted by dark red bars.

