Protein interactions in human pathogens revealed through deep learning

Ian R Humphreys^#^{1,2}, Jing Zhang^#^{3,4,5}, Minkyung Baek^#⁶, Yaxi Wang^#⁷, Aditya Krishnakumar^{1,2}, Jimin Pei^{3,4,5}, Ivan Anishchenko^{1,2}, Catherine A Tower⁷, Blake A Jackson⁷, Thulasi Warrier^{8,9,10}, Deborah T Hung^{8,9,10}, S Brook Peterson⁷, Joseph D Mougous^{7,11,12}, Qian Cong^{13,14,15}, David Baker^{16,17,18}

Affiliations

¹ Department of Biochemistry, University of Washington, Seattle, WA, USA.
² Institute for Protein Design, University of Washington, Seattle, WA, USA.
³ Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX, USA.
⁴ Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA.
⁵ Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA.
⁶ Department of Biological Sciences, Seoul National University, Seoul, South Korea. minkbaek@snu.ac.kr.
⁷ Department of Microbiology, University of Washington, Seattle, WA, USA.
⁸ Department of Molecular Biology and Center for Computational and Integrative Biology, Massachusetts General Hospital, Boston, MA, USA.
⁹ Department of Genetics, Harvard Medical School, Boston, MA, USA.
¹⁰ Broad Institute of MIT and Harvard, Cambridge, MA, USA.
¹¹ Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA.
¹² Microbial Interactions and Microbiome Center, University of Washington, Seattle, WA, USA.
¹³ Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX, USA. qian.cong@utsouthwestern.edu.
¹⁴ Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA. qian.cong@utsouthwestern.edu.
¹⁵ Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA. qian.cong@utsouthwestern.edu.
¹⁶ Department of Biochemistry, University of Washington, Seattle, WA, USA. dabaker@uw.edu.
¹⁷ Institute for Protein Design, University of Washington, Seattle, WA, USA. dabaker@uw.edu.
¹⁸ Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA. dabaker@uw.edu.

^# Contributed equally.

PMID: 39294458
PMCID: PMC11445079
DOI: 10.1038/s41564-024-01791-x

Ian R Humphreys et al. Nat Microbiol. 2024 Oct;9(10):2642-2652. doi: 10.1038/s41564-024-01791-x. Epub 2024 Sep 18.

Abstract

Identifying bacterial protein-protein interactions and predicting the structures of these complexes could aid in understanding pathogenicity mechanisms and developing treatments for infectious diseases. Here we developed RoseTTAFold2-Lite, a rapid deep learning model that leverages residue-residue coevolution and protein structure prediction to systematically identify and structurally characterize protein-protein interactions at the proteome-wide scale. Using this pipeline, we searched through 78 million pairs of proteins across 19 human bacterial pathogens and identified 1,923 confidently predicted complexes involving essential genes and 256 involving virulence factors. Many of these complexes were not previously known; we experimentally tested 12 such predictions, and half of them were validated. The predicted interactions span core metabolic and virulence pathways ranging from post-transcriptional modification to acid neutralization to outer-membrane machinery and should contribute to our understanding of the biology of these important pathogens and the design of drugs to combat them.

Conflict of interest statement

The authors declare no competing interests.

Figures

**Fig. 1. PPI identification by coevolution and deep learning methods.**
a, Overview of the RF2-Lite network architecture. FAPE, frame-aligned point error. b, Benchmark performance of PPI prediction methods. Top: precision and recall curves of DCA (grey), RF 2-track (black), RF2-Lite (blue), AF (green) and AF-multimer (purple) in distinguishing true PPIs from random protein pairs. For different methods, we used the pMSAs generated by our bioinformatic pipeline (Supplementary Methods). We applied each method on a benchmark set of 1,000 randomly selected positive control pairs and 10,000 negative control pairs (Supplementary Methods). The precision and recall curve for this benchmark is in Supplementary Fig. 6a. Real signal-to-noise ratio for the PPI screen is on the order of 1:1,000¹; to reflect the impact of a much larger set of non-interacting pairs, we upsampled the negative control set to 1,000,000 by randomly sampling 100 ‘pseudo’ interacting probabilities from the Gaussian distribution around each real interacting probability we obtained for the negative controls with a standard deviation of 0.1. Bottom: runtime comparison of different PPI identification methods. c, Schematic overview of our PPI screen pipeline. d, Precision and recall curves at different stages in the pipeline. Top: DCA on PPI prediction; solid black vertical line represents the recall cut-off in this stage. Middle: RF2-Lite screen procedure on the ‘pilot set’; solid black vertical line indicates the recall cut-off at this stage. Bottom: AF screen procedure on the ‘pilot set’; dashed horizontal line shows the precision cut-off, that is, 0.95. e, Summary of predicted PPIs for the ‘pilot set’ that focuses on essential genes and virulence factors. 
Left: interactions between essential genes in the ‘pilot set’ based on different evidence: blue, green and orange circles represent our predicted pairs, functional interactions according to STRING (total score ≥900 and experimental score ≥400) and interacting pairs according to PDB (BLAST hit to a complex in PDB with e-value ≤ 0.00001, sequence identity ≥50% and coverage ≥50%), respectively. Right: PPIs involving virulence factors in the ‘pilot set’ supported by different evidence: red, purple and yellow circles represent our predictions, pairs according to STRING and pairs according to PDB.
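The negative-set upsampling described for panel b can be illustrated with a minimal sketch. It assumes the per-pair interaction probabilities are available as plain arrays; the function names, the fixed random seed and the clipping of pseudo-probabilities to [0, 1] are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def upsample_negatives(neg_probs, copies=100, sd=0.1, seed=0):
    """Draw `copies` pseudo-probabilities around each real negative-control score.

    Each pseudo score is sampled from a Gaussian centred on the real score
    with standard deviation `sd`, as the caption describes.
    """
    rng = np.random.default_rng(seed)
    neg_probs = np.asarray(neg_probs, dtype=float)
    noise = rng.normal(0.0, sd, size=(neg_probs.size, copies))
    pseudo = neg_probs[:, None] + noise
    # Assumption: keep the scores inside the valid probability range.
    return np.clip(pseudo, 0.0, 1.0).ravel()

def precision_recall(pos_probs, neg_probs, threshold):
    """Precision and recall when pairs scoring >= threshold are called PPIs."""
    pos = np.asarray(pos_probs, dtype=float)
    neg = np.asarray(neg_probs, dtype=float)
    tp = int((pos >= threshold).sum())
    fp = int((neg >= threshold).sum())
    recall = tp / pos.size
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    return precision, recall
```

With 10,000 negative-control scores and `copies=100`, `upsample_negatives` returns 1,000,000 pseudo scores, which together with the 1,000 positives gives the roughly 1:1,000 ratio mentioned in the caption.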

**Fig. 2. Experimental validation of selected PPIs.**
a, Interactions assessed by B2H, which measures β-galactosidase activity resulting from activation of the *lacZ* reporter gene due to the interaction between two tested proteins that are fused to two domains of a transcription activator. *E. coli* expressing T25-zip and T18-zip fusion proteins was used as a positive control (+ control), and *E. coli* harbouring empty T25 and T18 plasmids was used as a negative control (− control). m/m, mix-and-match control. RU, relative unit (luminescence per optical density at 600 nm per h). Error bars indicate ±s.d. (n = 2 biological replicates each with 2 technical replicates). Computed models of experimentally validated PPIs (‘lpg2881 + lpg0371’ and ‘RsfS + YbeZ’) are shown on the right. Top: iron-sulfur cluster binding protein lpg2881 (Q5ZRK0) and uncharacterized protein lpg0371 (Q5ZYK1) from *L. pneumophila*; bottom: ribosomal silencing factor RsfS (Q9HX22) and PhoH-like protein domain-containing protein YbeZ (Q9HX38) from *P. aeruginosa*. b–e, Interactions validated by Co-IP/pull-down. Predicted interacting partners in each PPI pair are heterologously expressed and tagged (–H, hexahistidine; –V, VSV-G epitope). A random bait protein was included as a negative control for each experiment. Control lanes correspond to samples with prey proteins and beads added without any bait proteins. Each positive interaction is supported by two independent Co-IP/pull-down experiments. b, Ubiquinone biosynthesis C-methyltransferase UbiE (P0A887) and protein of unknown function YcaR (P0AAZ7) from *E. coli*. c, Uncharacterized protein PA4106 (Q9HWS2) and a putative transcriptional factor PA4105 (Q9HWS3) from *P. aeruginosa*. d, lpg2881 and lpg0371 from *L. pneumophila*, a pair that also tested positive by B2H. e, Putative imidazole glycerol phosphate synthase subunit hisF2 (P72139) and lipopolysaccharide biosynthesis protein WbpG (Q9HZ78) from *P. aeruginosa*. 
In all the panels, connecting green bars are between representative residue–residue contacts at the interfaces predicted from the summed AF probability for distance bins below 12 Å. Ni-NTA, nickel-nitrilotriacetic acid; VSV-G, vesicular stomatitis virus glycoprotein epitope.
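The contact definition used for the green bars (summed predicted probability over distance bins below 12 Å) might be computed along the following lines. This is a sketch under stated assumptions: the inter-chain distogram is taken to be an array of shape (residues in protein 1, residues in protein 2, number of bins) whose last axis sums to 1, and `bin_upper_edges` gives each bin's upper distance limit in Å. The array layout and both function names are hypothetical and do not correspond to the AlphaFold output format or the authors' code.

```python
import numpy as np

def contact_probability(distogram, bin_upper_edges, cutoff=12.0):
    """Sum the predicted probability mass in all bins at or below `cutoff` Å."""
    distogram = np.asarray(distogram, dtype=float)
    edges = np.asarray(bin_upper_edges, dtype=float)
    close_bins = edges <= cutoff          # boolean mask over the bin axis
    return distogram[..., close_bins].sum(axis=-1)

def top_contacts(contact_prob, k=5):
    """Return the k residue pairs (i, j, probability) with the highest contact probability."""
    contact_prob = np.asarray(contact_prob, dtype=float)
    order = np.argsort(contact_prob, axis=None)[::-1][:k]
    rows, cols = np.unravel_index(order, contact_prob.shape)
    return [(int(i), int(j), float(contact_prob[i, j])) for i, j in zip(rows, cols)]
```

The pairs returned by `top_contacts` play the role of the "representative residue–residue contacts" that the bars connect; how the authors chose which contacts to display is not stated in the caption.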

**Fig. 3. Computed models of binary protein complexes.**
a–j, Interactions involving essential genes. a, Interaction with an enzyme where the enzymatic site is highlighted in light green with an NAD moiety. b–d, Additional interactions involving essential genes. e,j, Interactions involving transport pathways. f–i, Transcription and translation. k–t, Interactions involving virulence factors. u–y, Interactions with uncharacterized proteins. In all models, the first protein is in blue, and the second is in gold. Green bars are between representative residue–residue contacts at the interfaces predicted from the summed AF probability for distance bins below 12 Å. Additional information (organisms and UniProt annotations) is in Supplementary Table 9.

**Fig. 4. Computed models for multi-component protein complexes.**
a, *H. pylori* tRNA 2-thiouridine synthesizing protein complex. Left: a model of the TusE(blue)–TusB(gold)–TusC(green)–TusD(pink) complex overlaid with the TusBCD PDB structure (2D1P, shown in semi-transparent grey). Right: an alternative view of this complex. b, The UreAB–UreFGH complex (coloured in cyan, pink, blue, gold and green, respectively) in *H. pylori* assembled through multiple subcomplexes: UreFGH, UreAB and UreAH. c, Accessory components of the Sec translocon. Top: *P. aeruginosa* SecG(blue)–SecY(gold)–PpiD(green) complex. Bottom: *M. tuberculosis* SecY(blue)–SecG(gold)–SecE(green)–CrgA(pink) complex. d, Accessory components of the *P. aeruginosa* and *S. typhimurium* outer-membrane β-barrel assembly machinery. Left: interaction between SurA (yellow) and Bam proteins (BamA, blue; BamB, gold; BamE, green). Middle: BamA (blue) and PA1005 (gold), a putative BepA orthologue. Right: interaction between TolC (blue) and BamD (gold). In all schematics, green, red, yellow and magenta bars connect representative residue–residue contacts at the interfaces predicted from the summed AF probability for distance bins below 12 Å.

Update of

Essential and virulence-related protein interactions of pathogens revealed through deep learning.
Humphreys IR, Zhang J, Baek M, Wang Y, Krishnakumar A, Pei J, Anishchenko I, Tower CA, Jackson BA, Warrier T, Hung DT, Peterson SB, Mougous JD, Cong Q, Baker D. bioRxiv [Preprint]. 2024 Apr 12:2024.04.12.589144. doi: 10.1101/2024.04.12.589144. Update in: Nat Microbiol. 2024 Oct;9(10):2642-2652. doi: 10.1038/s41564-024-01791-x. PMID: 38645026. Free PMC article. Preprint.

References

1. Rajagopala, S. V. et al. The binary protein–protein interaction landscape of Escherichia coli. Nat. Biotechnol. 32, 285–290 (2014).
2. Uetz, P. et al. A comprehensive analysis of protein–protein interactions in Saccharomyces cerevisiae. Nature 403, 623–627 (2000).
3. Butland, G. et al. Interaction network containing conserved and essential protein complexes in Escherichia coli. Nature 433, 531–537 (2005).
4. Edwards, A. M. et al. Bridging structural biology and genomics: assessing protein interaction data with known complexes. Trends Genet. 18, 529–536 (2002).
5. Mackay, J. P., Sunde, M., Lowry, J. A., Crossley, M. & Matthews, J. M. Protein interactions: is seeing believing? Trends Biochem. Sci. 32, 530–531 (2007).