eLife. 2022 Dec 28:11:e82885.

doi: 10.7554/eLife.82885.

Deep learning-driven insights into super protein complexes for outer membrane protein biogenesis in bacteria

Mu Gao¹, Davi Nakajima An², Jeffrey Skolnick¹

Affiliations

¹ Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, Atlanta, United States.
² School of Computer Science, Georgia Institute of Technology, Atlanta, United States.

PMID: 36576775
PMCID: PMC9797188
DOI: 10.7554/eLife.82885

Mu Gao et al. Elife. 2022.

Abstract

To reach their final destinations, outer membrane proteins (OMPs) of gram-negative bacteria undertake an eventful journey beginning in the cytosol. Multiple molecular machines, chaperones, proteases, and other enzymes facilitate the translocation and assembly of OMPs. These helpers usually associate, often transiently, forming large protein assemblies. They are not well understood due to experimental challenges in capturing and characterizing protein-protein interactions (PPIs), especially transient ones. Using AF2Complex, we introduce a high-throughput, deep learning pipeline to identify PPIs within the Escherichia coli cell envelope and apply it to several proteins from an OMP biogenesis pathway. Among the top confident hits obtained from screening ~1500 envelope proteins, we find not only expected interactions but also unexpected ones with profound implications. Subsequently, we predict atomic structures for these protein complexes. These structures, typically of high confidence, explain experimental observations and lead to mechanistic hypotheses for how a chaperone assists a nascent, precursor OMP emerging from a translocon, how another chaperone prevents it from aggregating and docks to a β-barrel assembly port, and how a protease performs quality control. This work presents a general strategy for investigating biological pathways by using structural insights gained from deep learning-based predictions.

Keywords: E. coli; computational biology; deep learning; molecular biophysics; outer membrane protein biogenesis; protein complex structure prediction; protein-protein interaction; structural biology; systems biology; translocon; virtual screening.
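The screening step summarized in the abstract (score each candidate envelope protein against a query, keep the best interface score per candidate, and rank) can be illustrated with a minimal Python sketch. This is not the AF2Complex code or its API: the function names, the cutoff value, and the candidate labels are hypothetical, and the scores are placeholders chosen only to show the ranking logic, not results from the study.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[int, str, float]]


def rank_candidates(scores_by_candidate: Dict[str, List[float]]) -> Ranked:
    """Rank candidates by the highest interface score among their models."""
    # Keep only the top-scoring model per candidate; skip candidates with no models.
    top = {name: max(s) for name, s in scores_by_candidate.items() if s}
    # Sort by score, highest first; break ties by name for a deterministic order.
    ordered = sorted(top.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, name, score) for rank, (name, score) in enumerate(ordered, start=1)]


def confident_hits(ranked: Ranked, cutoff: float) -> Ranked:
    """Return the ranked entries whose top score meets a (hypothetical) cutoff."""
    return [row for row in ranked if row[2] >= cutoff]


# Placeholder scores for illustration only (not data from the paper).
toy_scores = {
    "candidate_A": [0.12, 0.61, 0.58],
    "candidate_B": [0.05, 0.09],
    "candidate_C": [0.72],
    "candidate_D": [],  # no models produced; excluded from the ranking
}

ranked = rank_candidates(toy_scores)
hits = confident_hits(ranked, cutoff=0.5)  # cutoff is an assumption
for rank, name, score in hits:
    print(f"{rank}\t{name}\t{score:.2f}")
```

Ranking on the single best model per candidate mirrors the "top interface scores" histogram described in Figure 1a; a real screen would also need to handle failed runs and choose its confidence threshold empirically.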

Plain language summary

All living cells are contained within a fatty cell membrane that allows water and only certain other molecules to pass through with ease. Bacteria consist of only a single cell, making their membrane the only interface with the surrounding environment. Gram-negative bacteria – which include Escherichia coli, a bacterium found in the gut of all humans – have an extra layer of protection, the ‘outer membrane’. Proteins in this membrane are called ‘outer membrane proteins’ or OMPs and allow nutrients to enter the cell. But OMPs, which are made inside the cell, need to be transported to the outer membrane and folded correctly before they can perform their role. This multistep process, which involves interactions between many different proteins, is not fully understood.

The journey of an OMP from the center of the cell where it is made to the outer membrane is complicated. First, the OMP needs to pass through the cell’s inner membrane. To do this, it must interact with ‘channel proteins’ in the inner membrane that feed the OMP into the space between the two membranes, known as the bacterial envelope. This step requires the OMP to be unfolded. Once in the bacterial envelope, the OMP interacts with proteins that help it fold correctly and integrate into the outer membrane.

The interactions between proteins in the bacterial envelope are short-lived, making them difficult to study using lab-based experiments. An alternative approach is predicting a protein’s structure from its amino acid sequence, which is a difficult computational problem to solve. However, in 2020 the developers behind AlphaFold2, a deep learning program, were able to use a set of equations organized in a ‘neural network’ that can ‘learn’ from a library of known protein structures to predict unknown structures with high accuracy.

Gao et al. used AF2Complex, a tool based on AlphaFold2 and tailored to predicting interactions between proteins, to investigate what interactions OMPs could be involved in on their way to the outer membrane. With the help of a supercomputer at the Oak Ridge National Laboratory, Gao et al. screened nearly 1,500 E. coli proteins within the bacterial envelope to see how they might interact with OMPs. The screen identified previously unknown interactions between proteins, suggesting that the formation of the bacterial outer membrane and the integration of proteins into it involve protein complexes and molecular mechanisms that have not yet been characterized. The screen also identified interactions that had been previously described, confirming that the deep learning approach can correctly capture real interactions.

Overall, Gao et al.’s work inspires new hypotheses about the mechanisms through which OMPs are transported to the outer membrane, although further experimental work will be needed to confirm the roles of the protein interactions predicted by the computational model. Furthermore, the ability to design experiments based on computational predictions is exciting. If confirmed, the new protein interactions could help scientists better understand OMP transport, which is essential for bacterial biology. In the future, this could lead to the discovery of new targets for antibiotic drugs.

PubMed Disclaimer

Conflict of interest statement

MG, DN, JS: No competing interests declared.

Figures

**Figure 1. *E. coli* super-translocon SecYEG/PpiD/YfgM.**
(a) Computational screening for protein-protein interaction partners of PpiD and YfgM within the *E. coli* envelopome, respectively. A histogram displays the distribution of the top interface scores (iScores) of all envelope proteins screened with each query. Black arrows mark the top hits that were further studied, along with their names and overall ranks. (b) The top AF2Complex model of a supercomplex made of PpiD (blue), YfgM (green), SecY (silver), SecE (cyan), and SecG (tan) in three different views. Proteins are shown in a cartoon representation. Viewpoint transition, from either left to right or top to bottom, is indicated by a rotation axis (dashed line) and the rotation angle in degrees (circled arrow). (c, d, and e) Predicted PPI sites. The corresponding locations in b are indicated by black boxes. For clarity, the viewpoints and representations are adjusted. In the surface representations c and e, the color code is hydrophobic (white), polar (green), positive (blue), and negative (red), except for Phe122_PpiD (yellow) in e. The same color code for the surface representation is employed below unless noted otherwise. PPI residues are shown in a ball-and-stick representation for PpiD in c, SecY in d, and SecG in e; the color scheme of atoms is carbon (cyan), oxygen (red), nitrogen (blue), and sulfur (yellow). The same scheme of atoms is adopted throughout this work.

**Figure 2. Structural model of the SecYEG/PpiD/YfgM/DsbA supercomplex.**
(a) Two views of the predicted structure. DsbA is shown in red, while the other proteins are colored the same as in Figure 1. Two cysteines, Cys49 and Cys52, essential to the enzymatic function of DsbA, are shown as spheres. (b) Protein-protein interaction sites between PpiD and DsbA. For clarity, tertiary structures are transparent. Key interacting residues are shown in the licorice representation for PpiD and in the ball-and-stick representation for DsbA.

**Figure 3. Predicted structure of the PpiD/YfgM/LepB/OmpA supercomplex.**
(a) Two views in the cartoon representation are shown. Colors: PpiD (blue), YfgM (green), LepB (magenta), and OmpA (residues 1–87, yellow). For clarity, representations of PpiD and LepB are transparent. (b) Close-up view of the OmpA signal peptide in the active site of LepB. Essential catalytic residues Ser89, Ser91, Lys146, and Ser279 of LepB are shown in a licorice representation, and the cleavage-site residues Ala21 and Ala19 of OmpA are shown as spheres.

**Figure 4. Structural models of SurA in the absence and presence of an OmpA substrate.**
(a) Open and closed conformations of monomeric SurA, consisting of the core domain (N-terminal region in gray and C-terminal in tan), P1 (purple), and P2 (red). (b and c) Two structures of SurA in the presence of an OmpA substrate from two separate modeling runs. In both, SurA is open as in a. The β-barrel domain of OmpA is completely unfolded and generally does not maintain the same residue-residue contacts with SurA, except that an OmpA aromatic residue consistently makes π-π interactions with Tyr128_SurA located in the crevice of the SurA core domain. Two β-signal residues, Y189 and F191 of OmpA, are also shown as spheres. The folded periplasmic domain of OmpA is bound to the P2 domain of SurA. (d) Envelopome protein-protein interaction screening of SurA identifies itself and BamA among the top hits. (e) Two views of the top predicted structure of a SurA dimer (green and cyan). (f) Superimposition of two open conformations from the monomeric and dimeric SurA. Subscripts indicate the stoichiometry. The color schemes correspond to those used in a and e. Only a single SurA from the dimeric model is shown in the superposition.

**Figure 4—figure supplement 1. Structural models of the OmpA polypeptide in the absence of SurA.**
(a) Predicted model of OmpA obtained with a shallow multiple sequence alignment (MSA) and no structural templates. The N-terminal β-barrel domain is collapsed but not in its native fold; the C-terminal periplasmic domain is native-like. Two cysteines, Cys311 and Cys323, forming a disulfide bond are shown as spheres. (b) The predicted model aligned to a nuclear magnetic resonance (NMR) structure of the periplasmic domain (PDB code: 2MQE, TM-score [Zhang and Skolnick, 2004] ~0.75).
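
For reference, the TM-score quoted in this caption is the structural similarity measure of Zhang and Skolnick (2004). The standard definition is reproduced below in LaTeX as a reader aid; the symbol names are chosen here for illustration. $L_{\text{target}}$ is the length of the reference structure, $L_{\text{ali}}$ the number of aligned residue pairs, $d_i$ the distance between the $i$-th aligned pair after superposition, and the maximum is taken over superpositions.

```latex
\mathrm{TM\text{-}score} \;=\; \max\left[\frac{1}{L_{\text{target}}}\sum_{i=1}^{L_{\text{ali}}}\frac{1}{1+\left(\dfrac{d_i}{d_0(L_{\text{target}})}\right)^{2}}\right],
\qquad
d_0(L) \;=\; 1.24\,\sqrt[3]{L-15}\;-\;1.8
```

Scores fall between 0 and 1, and values above roughly 0.5 are generally taken to indicate the same overall fold, which is consistent with the caption describing the ~0.75 alignment as native-like.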

**Figure 5. Structural model of SurA docked to β-barrel assembly machine (BAM).**
(a and b) Two views of the top ranked supercomplex model in the cartoon representation. The BAM constituents are BamA (green), BamB (pink), BamC (yellow), BamD (blue), and BamE (black). The N-terminal POTRA1 domain of BamA provides the main interaction sites for SurA (violet). (c) Close-up view of the interaction sites at POTRA1 and the core domain of SurA. Interacting residues are shown in the licorice (SurA) and ball-and-stick (BamA) representations. (d) Crystal structure of SurA (magenta, PDB 1M5Y) superimposed onto the computed structure of the supercomplex. The magenta arrow indicates the change of location in P2 between two structures. BamCD are omitted for clarity. (e) Protein-protein interaction between P2 of SurA and POTRA4 of BamA and BamE.

**Figure 6. Structural models of β-barrel assembly machine (BAM) and BepA.**
(a) Computational protein-protein interaction screening identifies BepA as a top hit to BamA. (b) Top structural model of the heterodimeric complex of BamA (green) and BepA (purple) in cartoon representation. The lid of BepA is colored red. The active sites with the protease domain of BepA are shown in a surface representation (orange). The five POTRA domains of BamA are labeled P1−P5. (c) Predicted structure of the BAM/BepA supercomplex. The lid of BepA extends to an open conformation. The image was created from the same viewpoint as b. (d) Specific residue-residue contacts between BamA and the tetratricopeptide repeat (TPR) and protease domains of BepA. (e) Close-up views of the lid and the BamA β-barrel in the surface (top) and cartoon representations (bottom). A hydrophobic contact between Ala180_BepA and Leu780_BamA is shown as spheres, and the lateral gate of BamA is between the β1 and β16 strands (dark blue).

**Figure 6—figure supplement 1. Predicted structures of BepA compared to two experimental structures.**
(a and b) Superposition of structural models of the lid (red/magenta) in closed and open states, respectively, onto a crystal structure (PDB code: 6AIT, cyan). (c) Superposition onto another crystal structure (PDB code: 6ASR, cyan).

**Figure 7. Proposed mechanisms involved in the outer membrane protein (OMP) biogenesis pathway in *E. coli*.**
Complex structures resulting from this study accompany relevant cartoon diagrams. Powered by SecA, a precursor OmpA polypeptide (orange line) first passes through the SecYEG translocon in complex with PpiD and YfgM. PpiD, held in place by YfgM, senses the translocating substrate via its N-terminal α-helix bound to the lateral gate of SecY and temporarily dissociates from the translocon upon receiving the substrate OmpA. Protein disulfide isomerase DsbA is recruited by PpiD and promotes formation of a disulfide bond between two cysteine residues (yellow spheres) of OmpA. Meanwhile, peptidase LepB fills the vacancy left by SecYEG and cleaves the transmembrane signal peptide from OmpA, which is then handed over to chaperone SurA. At this point, the periplasmic domain of OmpA is folded, but the unfolded β-barrel region wraps around SurA, which carries OmpA to BAM. Lastly, SurA docks to POTRA1, the N-terminal domain of BamA, where the β-barrel domain of OmpA is folded and released from the lateral gate of BamA. If this folding and assembly process is stalled for some reason, metalloprotease BepA senses the failure with its flexible lid and cleans up by cleaving a stuck substrate. For clarity, the peptidoglycan layer in the periplasm is not shown, and the schematic drawings are not to scale.

Grants and funding

R35 GM118039/GM/NIGMS NIH HHS/United States
