PLoS Comput Biol. 2025 Jun 26;21(6):e1013168.
doi: 10.1371/journal.pcbi.1013168. eCollection 2025 Jun.

AI-first structural identification of pathogenic protein target interfaces

Mihkel Saluri et al. PLoS Comput Biol. 2025.

Abstract

The risk of pandemics is increasing as global population growth and interconnectedness accelerate. Understanding the structural basis of protein-protein interactions between pathogens and hosts is critical for elucidating pathogenic mechanisms and guiding treatment or vaccine development. Despite 21,064 experimentally supported human-pathogen interactions in the HPIDB, only 52 have resolved structures in the PDB, representing just 0.2%. Advances in protein complex structure prediction, such as AlphaFold, now enable highly accurate modelling of heterodimeric complexes, though their application to host-pathogen interactions, which have distinct evolutionary dynamics, remains underexplored. Here, we investigate the structural protein-protein interaction network between humans and ten pathogens, predicting structures for 9,452 interactions, only 10 of which have known structures. We identify 30 interactions with an expected TM-score ≥0.9, tripling the structural coverage in these networks. A detailed analysis of the complex between Francisella tularensis dihydrolipoyl dehydrogenase (IPD) and human immunoglobulin kappa constant (IGKC), using homology modelling and native mass spectrometry, confirms a predicted 2:2 heterotetramer, suggesting potential roles in immune evasion. These findings highlight the transformative potential of structure prediction for rapidly advancing vaccine and drug development against novel pathogenic targets.
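The TM-score used throughout is a length-normalised structural similarity score in (0, 1]. The definition below is the standard one from Zhang and Skolnick, not something specified in this abstract, and this page does not say which tool computed the paper's scores. As a minimal sketch, assuming two residue-matched, already superposed Cα coordinate arrays, the score for one fixed superposition can be computed as follows; the function name `tm_score` is hypothetical, and dedicated programs additionally maximise this value over superpositions.

```python
import numpy as np


def tm_score(model: np.ndarray, native: np.ndarray) -> float:
    """TM-score of one fixed superposition of a model onto a native structure.

    Both inputs are (L, 3) arrays of residue-matched, already superposed C-alpha
    coordinates; normalisation uses the native length L. Real TM-score tools
    also search over superpositions to maximise this value (not done here).
    """
    if model.shape != native.shape:
        raise ValueError("model and native must have the same shape")
    n = native.shape[0]
    # Length-dependent distance scale d0; floored at 0.5 so short chains stay defined
    d0 = max(1.24 * np.cbrt(n - 15) - 1.8, 0.5)
    d = np.linalg.norm(model - native, axis=1)  # per-residue deviation in Å
    return float(np.sum(1.0 / (1.0 + (d / d0) ** 2)) / n)
```

Identical inputs give 1.0, and a uniform per-residue deviation equal to d0 gives exactly 0.5, which makes a quick sanity check for any reimplementation.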

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. TM-score comparison between FoldDock and AFM.
a) Median TM-scores over all 111 nonredundant host-pathogen interactions in the PDB are 0.67 for AFM, 0.64 for FoldDock and 0.68 for FoldDock+templates. b) Restricting the set to host-pathogen interactions not present in the AFM training set (n = 24) reduced the median TM-score to 0.63 for AFM and to 0.67 for FoldDock+templates, and increased it to 0.65 for FoldDock. c) Comparison of pDockQ and TM-score using FoldDock (n = 111). Each point represents one model, the solid blue line shows the running average using a step size of 0.1 in pDockQ, and the dashed grey line marks a pDockQ cutoff of 0.3. High pDockQ scores coincide with high TM-scores.
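Panel c relates pDockQ to TM-score. pDockQ, introduced for FoldDock by Bryant et al. (Nat Commun 2022), passes x = (mean interface plDDT) × log10(number of interface contacts) through a sigmoid. The sketch below uses the sigmoid constants as reported there, quoted from memory, so verify them before reuse. It also computes the binned TM-score average over pDockQ in steps of 0.1; reading "running average" as non-overlapping bins is an assumption, and the function names are hypothetical.

```python
import numpy as np

# pDockQ sigmoid constants as reported by Bryant et al. (2022) for FoldDock.
# Quoted from memory: verify against the original publication before reuse.
PDOCKQ_L, PDOCKQ_X0, PDOCKQ_K, PDOCKQ_B = 0.724, 152.611, 0.052, 0.018


def pdockq(mean_interface_plddt: float, n_interface_contacts: int) -> float:
    """Sigmoid of x = (mean interface plDDT) * log10(number of interface contacts)."""
    if n_interface_contacts < 1:
        # No interface contacts: return 0 (a convention assumed for this sketch)
        return 0.0
    x = mean_interface_plddt * np.log10(n_interface_contacts)
    return float(PDOCKQ_L / (1.0 + np.exp(-PDOCKQ_K * (x - PDOCKQ_X0))) + PDOCKQ_B)


def binned_average(pdockq_vals, tm_vals, step: float = 0.1):
    """Mean TM-score in consecutive, non-overlapping pDockQ bins of width `step`.

    Returns a list of (bin centre, mean TM-score, number of models), sorted by
    bin; empty bins are skipped.
    """
    p = np.asarray(pdockq_vals, dtype=float)
    t = np.asarray(tm_vals, dtype=float)
    bins = np.floor(p / step).astype(int)  # bin index for each model
    result = []
    for idx in np.unique(bins):  # np.unique returns sorted bin indices
        mask = bins == idx
        centre = float((idx + 0.5) * step)
        result.append((centre, float(t[mask].mean()), int(mask.sum())))
    return result
```

Because the constant k is positive, pDockQ increases monotonically with x, so a single cutoff such as the 0.3 line in panel c corresponds to a threshold on the combined interface confidence and contact count.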
Fig 2. Structure prediction for the 10 selected pathogens.
a) Number of interactions with human proteins for each pathogen in the HPIDB. The numbers in parentheses are the numbers of interactions that could be predicted (some complexes are too large to be predicted on GPUs with 40 GB of memory; the limit is approximately 3000 residues). b) Distribution of pDockQ scores for the 8441 HP-PPIs that were successfully predicted. c) Distribution of the average plDDT for each chain in the complexes from the HPIDB (n = 8441) and the PDB (n = 111). The PDB set has a much higher average plDDT. In the PDB set, both chains have an average plDDT above 70 in 93 of 111 complexes (84%), compared with 3472 of 7948 (44%) in the HPIDB set. d) The number of HP-PPIs with a known complex structure (n = 10) and with high-quality predictions (average plDDT above 70 and pDockQ above 0.3, n = 30) for each pathogen. Of the high-quality predictions, five are from HPV, one from EBV, three from F. tularensis, six from B. anthracis and 15 from Y. pestis.
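The two filters in this figure, a size limit applied before prediction and a quality threshold applied afterwards, can be sketched as below. This is an illustrative sketch, not the authors' pipeline: the record type and function names are hypothetical, the size limit is applied to the summed chain lengths, and "high quality" is read as both chains having a mean plDDT above 70 together with pDockQ above 0.3.

```python
from collections import Counter
from dataclasses import dataclass

# Approximate complex-size limit on a 40 GB GPU, as quoted in the caption
MAX_COMPLEX_RESIDUES = 3000


@dataclass
class HPPrediction:
    """One host-pathogen pair (hypothetical record type for this sketch)."""
    pathogen: str
    host_len: int          # residues in the human protein
    pathogen_len: int      # residues in the pathogen protein
    host_plddt: float      # mean plDDT of the predicted human chain
    pathogen_plddt: float  # mean plDDT of the predicted pathogen chain
    pdockq: float


def fits_on_gpu(p: HPPrediction, limit: int = MAX_COMPLEX_RESIDUES) -> bool:
    """Size filter applied before prediction (summed chain lengths)."""
    return p.host_len + p.pathogen_len <= limit


def is_high_quality(p: HPPrediction, plddt_cut: float = 70.0,
                    pdockq_cut: float = 0.3) -> bool:
    """Both chains above the plDDT cutoff and pDockQ above the pDockQ cutoff."""
    return min(p.host_plddt, p.pathogen_plddt) > plddt_cut and p.pdockq > pdockq_cut


def high_quality_per_pathogen(preds) -> Counter:
    """Count high-quality predictions per pathogen among complexes within the size limit."""
    return Counter(p.pathogen for p in preds if fits_on_gpu(p) and is_high_quality(p))
```

Requiring both chains to pass the plDDT cutoff, rather than their average, mirrors the "both chains" wording used in panel c; if the paper used a different reading, only `is_high_quality` needs to change.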
Fig 3. Analysis of the selected high-quality HP-PPIs for HPV (a), EBV (b), F. tularensis (c & d), B. anthracis (e & f) and Y. pestis (g, h & i).
The human proteins are shown in green and the pathogen proteins in cyan. Potential native structures are shown in grey, superposed with the predictions. a) Predicted structure of the interaction between UBA1 (green) and E2 (cyan) from HPV. UBA1 is superposed with the native structure (grey, TM-score = 0.98). In the native structure, UBA1 is in complex with ubiquitin (orange, PDB ID 6DC6, https://www.rcsb.org/structure/6dc6). E2 captures ubiquitin in its activation area and thereby likely prevents its release. b) Predicted structure of the interaction between TTC12 (green) and BBRF2 (cyan) from EBV. BBRF2 is superposed with the native structure (grey, https://www.rcsb.org/3d-view/6LQN/1). c) IGKC (https://www.uniprot.org/uniprot/P01834) interacts with IPD (https://www.uniprot.org/uniprot/Q5NEX4), potentially hindering antibody formation. d) LNPEP (https://www.uniprot.org/uniprot/Q9UIQ6) interacts with argS (https://www.uniprot.org/uniprot/Q5NHI8). This interaction may sterically inhibit LNPEP activity, although the substrate binding site (magenta) is not blocked. e) ANXA2 (Annexin A2, https://www.uniprot.org/uniprot/P07355) interacts with GBAA_3695 (unknown protein, https://www.uniprot.org/uniprot/A0A0F7RE19). This interaction may inhibit the production of reactive oxygen species. f) CTSB (Cathepsin B, https://www.uniprot.org/uniprot/P07858) interacts with GBAA_0078 (UVR domain-containing protein, https://www.uniprot.org/uniprot/A0A6L8P7D1). The structure of chagasin (https://www.rcsb.org/structure/3CBJ) in complex with CTSB is also shown. g) PDCD6 (Programmed cell death protein 6, https://www.uniprot.org/uniprot/O75340) interacts with slt (Peptidoglycan lytic exotransglycosylase, https://www.uniprot.org/uniprot/Q8CZP1). HEBP2 is also shown bound to PDCD6 (https://www.rcsb.org/3d-view/5GQQ), which is thought to promote the inhibition of HIV production. h) B2M (Beta-2-microglobulin, https://www.uniprot.org/uniprot/P61769, superposed with the structure of MHC-I: https://www.rcsb.org/structure/1A1M) interacts with YPMT1.34 (uncharacterized, https://www.uniprot.org/uniprot/O68752). i) CSNK2B (Casein kinase II subunit beta, https://www.uniprot.org/uniprot/P67870) interacts with nifj (Putative pyruvate-flavodoxin oxidoreductase, https://www.uniprot.org/uniprot/A0A3N4BEU0); see also Fig H in S1 Appendix.
Fig 4. Mass spectrometric validation of the IPD-IGKC interaction.
a) Overlay of the predicted structure of F. tularensis IPD (cyan) with the crystal structure of the IPD dimer from N. meningitidis (red/orange). The structures are nearly identical except for a β-hairpin (arrow). b) Two predicted IPD-IGKC heterodimers were combined into a heterotetramer by aligning the IPD protein of each heterodimer with one subunit of the native IPD dimer structure from Fig 4a (PDB ID 1OJT). Accordingly, each IPD protomer can bind one IGKC monomer. c) Native mass spectrometry analysis of Francisella IPD (top) shows exclusively dimeric protein. Minor peaks indicated by asterisks correspond to homo- and heterodimers involving a truncated variant (103.1 kDa). Human IGKC (middle) exists predominantly as a monomer with a smaller dimeric population. Incubation of equimolar amounts of IPD and IGKC (bottom) results in the formation of a 2:2 IPD-IGKC complex, a minor population of 2:1 IPD-IGKC complex, and a shift in the oligomeric state of IGKC towards the dimer.
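Panel b assembles the heterotetramer by rigid-body superposition. A minimal sketch of that step follows. It assumes residue-matched Cα arrays for the predicted IPD chain and for each native subunit; in practice the F. tularensis and N. meningitidis sequences must be aligned first. It uses the standard Kabsch algorithm, which the caption does not name. The transform that maps IPD onto a native subunit is also applied to the IGKC chain, so each copy of the heterodimer moves as one rigid body. All function names are hypothetical.

```python
import numpy as np


def kabsch(mobile: np.ndarray, target: np.ndarray):
    """Rotation R and translation t minimising the RMSD of mobile @ R.T + t to target.

    Both inputs are (N, 3) arrays of corresponding atoms (e.g. residue-matched C-alpha).
    """
    mc, tc = mobile.mean(axis=0), target.mean(axis=0)
    h = (mobile - mc).T @ (target - tc)  # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    # Correct for a possible reflection so that R is a proper rotation
    d = 1.0 if np.linalg.det(vt.T @ u.T) > 0 else -1.0
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = tc - mc @ r.T
    return r, t


def place_heterodimer(ipd: np.ndarray, igkc: np.ndarray, native_subunit: np.ndarray):
    """Superpose the predicted IPD chain onto one native subunit and move IGKC with it."""
    r, t = kabsch(ipd, native_subunit)
    return ipd @ r.T + t, igkc @ r.T + t


def assemble_tetramer(ipd, igkc, native_a, native_b):
    """Place one copy of the IPD-IGKC heterodimer on each subunit of the native dimer.

    Returns [IPD_A, IGKC_A, IPD_B, IGKC_B] coordinate arrays.
    """
    chains = []
    for native in (native_a, native_b):
        chains.extend(place_heterodimer(ipd, igkc, native))
    return chains
```

A natural follow-up check, not shown here, is whether the two placed IGKC copies clash with each other or with the opposite IPD subunit, since a clash-free assembly is what makes the 2:2 stoichiometry structurally plausible.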
Fig 5. Data and modelling workflow.
To extract host-pathogen protein complexes from the PDB, we selected all heteromers with a resolution below 5 Å. We then mapped the proteins to Pfam and selected unique combinations of interacting domains to remove redundant interactions. The structures of the resulting 111 host-pathogen interactions were then predicted from the full-length sequences.
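The two selection steps in this workflow, a resolution cutoff and deduplication on combinations of interacting domains, can be sketched as follows. The record type, field names and placeholder identifiers are hypothetical, and the Pfam mappings are assumed to be precomputed. Keeping the best-resolved entry for each domain combination is also an assumption, since the caption does not say which representative was kept.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PDBComplex:
    """A host-pathogen heteromer from the PDB (hypothetical record type)."""
    pdb_id: str
    resolution: float            # in Å
    host_pfam: frozenset         # Pfam accessions mapped to the host chain
    pathogen_pfam: frozenset     # Pfam accessions mapped to the pathogen chain


def nonredundant(complexes, max_resolution: float = 5.0):
    """Keep one complex per (host domains, pathogen domains) combination.

    Entries at or above `max_resolution` are dropped; among duplicates, the
    best-resolved entry is kept (an assumption for this sketch).
    """
    seen = set()
    kept = []
    for c in sorted(complexes, key=lambda c: c.resolution):
        if c.resolution >= max_resolution:
            continue
        key = (c.host_pfam, c.pathogen_pfam)
        if key in seen:
            continue  # same domain combination already represented
        seen.add(key)
        kept.append(c)
    return kept
```

Using frozensets makes each domain combination hashable and order-independent, so the same pair of domain architectures is recognised regardless of how the Pfam hits were listed.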
Fig 6. Host (species) and pathogen (superkingdoms) distributions for the 111 PPIs.
Eukarya are shown in blue, Bacteria in green, Viruses in magenta and Archaea in orange. Most hosts are Eukarya, with humans being the most represented species. Most pathogens are Bacteria, followed by Viruses. The host distribution is severely biased towards the most studied eukaryotic species (Homo sapiens, Mus musculus and S. cerevisiae) and bacterial species (E. coli).
