Protein complexes in cells by AI-assisted structural proteomics
- PMID: 36815589
- PMCID: PMC10090944
- DOI: 10.15252/msb.202311544
Abstract
Accurately modeling the structures of proteins and their complexes using artificial intelligence is revolutionizing molecular biology. Experimental data enable a candidate-based approach to systematically model novel protein assemblies. Here, we use a combination of in-cell crosslinking mass spectrometry and co-fractionation mass spectrometry (CoFrac-MS) to identify protein-protein interactions in the model Gram-positive bacterium Bacillus subtilis. We show that crosslinking interactions prior to cell lysis reveals protein interactions that are often lost upon cell lysis. We predict the structures of these protein interactions and others in the SubtiWiki database with AlphaFold-Multimer and, after controlling for the false-positive rate of the predictions, we propose novel structural models of 153 dimeric and 14 trimeric protein assemblies. Crosslinking MS data independently validates the AlphaFold predictions and scoring. We report and validate novel interactors of central cellular machineries that include the ribosome, RNA polymerase, and pyruvate dehydrogenase, assigning function to several uncharacterized proteins. Our approach uncovers protein-protein interactions inside intact cells, provides structural insight into their interaction interfaces, and is applicable to genetically intractable organisms, including pathogenic bacteria.
Keywords: AlphaFold-Multimer; crosslinking mass spectrometry; protein-protein interactions; pyruvate dehydrogenase; uncharacterized proteins.
© 2023 The Authors. Published under the terms of the CC BY 4.0 license.
Conflict of interest statement
The authors declare no competing interests.
Figures

PPIs identified by crosslinking MS at 2% PPI‐level FDR (interactions to seven abundant and highly crosslinked proteins and within the ribosome are removed for clarity). Previously uncharacterized proteins are shown in blue. Selected complexes are highlighted. Thin lines represent a single DSSO crosslink between two proteins, medium thickness 2–4 crosslinks and thick lines 5 or more crosslinks.
The accessible interaction space of YugI and YabR to the 30S ribosome calculated by DisVis (van Zundert & Bonvin, 2015). The volumes represent the YugI and YabR center of mass positions consistent with 10 of 14 detected crosslinks for YugI and 6 of 8 crosslinks for YabR, indicating the location of their binding sites on the 30S ribosome.
Sucrose gradient (10–40% w/v) of B. subtilis lysate separating the 70S, 50S, and 30S ribosomes from smaller proteins and their complexes. Western blots show that His‐tagged YabR and YugI (both ~ 15 kDa) co‐migrate in the sucrose gradient with the 30S ribosome, whereas the control, wild‐type B. subtilis 168, does not.
Smoothed elution profiles from the CoFrac‐MS analysis of the RNA polymerase (RNAP), the known binders GreA and NusA, and the uncharacterized protein YkuJ. The mass spec intensity is normalized to the maximum across fractions and averaged across replicates and subunits. Top: untreated cells; Bottom: crosslinked cells. One standard deviation from the mean per fraction is shaded (N = 3 biological replicates). The interaction with YkuJ is stabilized upon crosslinking prior to fractionation.
PPIs detected by crosslinking MS analyzed by CoFrac‐MS. Co‐fractionation measured by PCprophet co‐elution scores in the crosslinked (y‐axis) and untreated (x‐axis) condition. PPIs within the dashed lines were considered equally predicted in both conditions (data in Dataset EV4).
Annotation of co‐fractionation behavior of uncharacterized proteins and their binding partners for protein pairs identified by crosslinking MS. Ribosomal proteins and proteins with missing CoFrac‐MS data in either the crosslinked or the untreated CoFrac‐MS condition were removed.
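The normalization described for the CoFrac‐MS elution profiles can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name, the array layout (replicates × subunits × fractions), and the order of operations (scale each profile to its own maximum, average over subunits, then take mean and standard deviation over replicates) are assumptions, and the smoothing step is omitted.

```python
import numpy as np

def normalize_profiles(intensity):
    """Hypothetical helper.

    intensity: array of shape (replicates, subunits, fractions)
    holding raw MS intensities per fraction.
    Returns (mean, sd) per fraction across replicates.
    """
    intensity = np.asarray(intensity, dtype=float)
    # Scale every profile to its own maximum across fractions.
    peak = intensity.max(axis=-1, keepdims=True)
    scaled = np.divide(intensity, peak,
                       out=np.zeros_like(intensity), where=peak > 0)
    # Average over subunits of the complex (assumed order of operations).
    per_replicate = scaled.mean(axis=1)
    # Mean and sample standard deviation across biological replicates.
    mean = per_replicate.mean(axis=0)
    sd = per_replicate.std(axis=0, ddof=1)
    return mean, sd
```

The shaded band in such a plot would then span `mean - sd` to `mean + sd` for each fraction.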

The 1,977 predicted PPIs for AlphaFold‐Multimer interface prediction from crosslinking MS, CoFrac‐MS, and SubtiWiki.
Breakdown of AlphaFold ipTM score distributions by PPI origin. Annotation of score distributions for PPIs annotated by being present in the PDB (seq. identity > 30% and E‐value < 10⁻³) or by their presence in STRING (combined score > 0.4). “Novel interaction” refers to a previously unknown PPI, while “novel interface” refers to the lack of homologous structures for the PPI in the PDB. The central band represents the median, the edges of the box the 25th and 75th quantiles, and the whiskers are 1.5 times the interquartile range.
Noise model evaluation of ipTM distribution of AlphaFold PPIs. Subsamples of 300 PPIs from our datasets (target distribution) are compared to 300 PPIs made up of random B. subtilis proteins from the PPI candidate list combined with random proteins from the E. coli or B. subtilis genome (noise distributions, hollow and filled red bars). While targets show a bimodal distribution, indicating the high confidence of models with ipTM > 0.85, the noise distribution is one‐tailed, approximating the likelihood of random interface prediction in the various ipTM ranges. The histogram shows the median. For the experimental PPIs, the error bars show the standard deviation of 10 subsamples of the 1,977 predicted models.
The 1,977 protein–protein interactions (PPIs) modeled by AlphaFold‐Multimer distribute over the full pTM and ipTM range, with a subpopulation of highly confident predictions with ipTM > 0.85. Insets show high‐ranking models colored by dataset of origin, and the top‐ranking PPIs not previously annotated in SubtiWiki.
A novel PPI from the co‐elution dataset showing the alanine‐tRNA synthetase subunit AlaS interacting with the uncharacterized protein YozC. The high ipTM value is reflected in the predicted aligned error plot, which also shows that the C‐terminal region of AlaS, not involved in the interaction, is flexible with respect to the YozC‐AlaS module.
Bacterial two‐hybrid assay to validate the interaction between YozC and AlaS. N‐ or C‐terminal fusions of YozC and AlaS to the T18 and T25 domains of the adenylate cyclase CyaA were created and tested for interaction in the E. coli strain BTH101. Colonies turn dark as a result of protein interaction, which leads to the restoration of the adenylate cyclase activity and therefore expression of the β‐galactosidase. A leucine zipper domain was used as a positive control.
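The subsampling behind the noise‐model histogram (10 subsamples of 300 PPIs, median per bin, standard deviation as error bars) might look like the sketch below. The function name, the bin edges, the random seed, and sampling without replacement are assumptions made for illustration; the legend does not specify them.

```python
import numpy as np

def subsampled_histogram(iptm, n_subsamples=10, size=300,
                         bins=np.linspace(0.0, 1.0, 11), seed=0):
    """Hypothetical helper.

    iptm: 1-D array of ipTM scores, one per modeled PPI.
    Returns (median, sd) of the per-bin fraction across subsamples.
    """
    rng = np.random.default_rng(seed)
    iptm = np.asarray(iptm, dtype=float)
    fractions = []
    for _ in range(n_subsamples):
        # Draw one subsample of PPIs without replacement (assumption).
        sample = rng.choice(iptm, size=size, replace=False)
        counts, _ = np.histogram(sample, bins=bins)
        fractions.append(counts / size)
    fractions = np.array(fractions)
    # Bar height: median across subsamples; error bar: sample SD.
    return np.median(fractions, axis=0), fractions.std(axis=0, ddof=1)
```

Running the same function on scores from randomly paired proteins would give the noise distribution to compare against, bin by bin.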

Percentage of heteromeric crosslink restraint violation per range of ipTM. The central band represents the median, the edges of the box the 25th and 75th quantiles, and the whiskers are 1.5 times the interquartile range.
Bubble plot showing numbers of heteromeric crosslinks violated for each PPI identified by crosslinking MS against the ipTM and pTM distribution.
Successful predictions consistent with crosslinking MS, including predictions of paralogs (YtoP‐YsdC and RocA‐PutC). Self‐crosslinks in gray and heteromeric crosslinks in orange.
Crosslinks highlighting flexibility within the OpuAA‐OpuAB dimer. The OpuAA N‐terminal domain is predicted with a high pAE to the C‐terminal region. The crosslinks corresponding to these interdomain distances are also violated, indicating flexibility between these two domains. Left: Self crosslinks in gray and heteromeric crosslinks in orange; center: satisfied crosslinks (< 30 Å Cα–Cα) in blue and violated crosslinks in red; right: predicted aligned error plot.
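The crosslink check used throughout these panels (a crosslink is satisfied when the Cα–Cα distance in the model is below 30 Å) can be illustrated with a short sketch. The data layout and function names are hypothetical, and treating "different chain ID" as "heteromeric" is a simplifying assumption that would miscount links within homo‐oligomers.

```python
import numpy as np

def classify_crosslinks(ca_coords, crosslinks, cutoff=30.0):
    """Hypothetical helper.

    ca_coords: dict mapping (chain, residue_number) -> (x, y, z) in Å.
    crosslinks: list of ((chain, res), (chain, res)) residue pairs.
    Returns a list of (site_a, site_b, distance, satisfied).
    """
    results = []
    for a, b in crosslinks:
        delta = np.asarray(ca_coords[a], float) - np.asarray(ca_coords[b], float)
        dist = float(np.linalg.norm(delta))
        results.append((a, b, dist, dist < cutoff))
    return results

def percent_violated(results, heteromeric_only=True):
    """Percentage of crosslinks at or above the distance cutoff."""
    # Assumption: links between different chains count as heteromeric.
    selected = [r for r in results
                if not heteromeric_only or r[0][0] != r[1][0]]
    if not selected:
        return 0.0
    violated = sum(1 for r in selected if not r[3])
    return 100.0 * violated / len(selected)
```

In practice the coordinates would be read from the predicted model file; computing the percentage per ipTM range yields the kind of summary shown in the box plot.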

All dimeric PPIs with predicted ipTM > 0.65 that form connected groups of only three proteins are shown.
The 33 candidate 1:1:1 trimer PPIs modeled by AlphaFold‐Multimer (version 2.2.1) distribute over the full pTM and ipTM range (inset). Trimers with an ipTM > 0.80 are labeled.
Selected predicted structures of trimeric complexes with ipTM > 0.80 and their associated PAE plots. Crosslinks are visualized on LutA‐LutB‐LutC, with satisfied crosslinks (< 30 Å Cα–Cα) in blue and violated crosslinks in red.
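Selecting trimer candidates from dimeric predictions, as described in the first panel (dimers with ipTM > 0.65 that form connected groups of exactly three proteins), amounts to finding connected components of size three in the PPI graph. The following is one possible sketch; the function name and input format are assumptions, and it is not claimed to be the procedure the authors used.

```python
from collections import defaultdict

def connected_triplets(ppis, threshold=0.65):
    """Hypothetical helper.

    ppis: iterable of (protein_a, protein_b, iptm) dimer predictions.
    Returns sorted tuples of proteins forming connected groups of
    exactly three members among dimers with iptm > threshold.
    """
    graph = defaultdict(set)
    for a, b, iptm in ppis:
        if iptm > threshold:
            graph[a].add(b)
            graph[b].add(a)

    seen, triplets = set(), []
    for start in list(graph):
        if start in seen:
            continue
        # Depth-first search to collect one connected component.
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        if len(component) == 3:
            triplets.append(tuple(sorted(component)))
    return sorted(triplets)
```

Each returned triplet would then be submitted as a 1:1:1 trimer for structure prediction.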

Homology model of B. subtilis E1 pyruvate dehydrogenase (PDH) based on the Geobacillus stearothermophilus E1p structure (PDB id 3dv0; Pei et al, 2008) in surface representation. The space‐fill model of pyruvate is located in the active site based on the template structure. The E1 PDH is a dimer of dimers of the PdhA and PdhB subunits, with the active site formed at the interface between a PdhA and a PdhB copy.
Mapping of crosslinks onto the E1 PDH model derived from combining AlphaFold‐Multimer models. Satisfied crosslinks (< 30 Å Cα–Cα) in blue and violated crosslinks in red.
AlphaFold‐Multimer predictions for PdhA‐PdhB‐PdhI/YneR. The top‐ranked solution by ipTM (0.89) describes the PdhA‐PdhB subcomplex that does not make up the active site, while the ninth‐ranked solution (0.81) identifies the active site interface. Crosslinking data clarify the interactions between PdhI/YneR and PdhA/B. Pyruvate and 3‐deaza‐TdHP are shown as space‐fill models. Crosslink coloring as in B.
Circle view of crosslinking MS data mapped onto the E1 PDH‐PdhI/YneR model derived by combining AlphaFold solutions onto the known stoichiometry. Satisfied crosslinks (< 30 Å Cα–Cα) in blue and violated crosslinks in red.
PDH‐PdhI/YneR model constructed from AlphaFold‐Multimer models of the PdhA‐PdhB‐PdhI/YneR trimer. PdhI/YneR binds at the pocket opening onto the active site.
Visualization of the active site in the AlphaFold‐Multimer model (solid cartoon) with ligand positions derived from PDB id 3dv0 (transparent cartoon and sticks). PdhI/YneR occludes the entrance to the active site by inserting Y31 into the pocket used for entrance of the lipoate co‐factor that comes to reduce the thiamine ring in the enamine‐ThDP intermediate. The original structure was solved in the presence of the enamine‐ThDP analog 3‐deaza‐TdHP (Pei et al, 2008). Key residues for ligand coordination are predicted in the same conformation by AlphaFold‐Multimer.
CoFrac‐MS data showing co‐elution of PdhA, PdhB, and PdhI/YneR. The shaded area corresponds to the standard deviation between replicas.
Growth curves on glucose and pyruvate. Growth experiment of wild‐type (blue) B. subtilis, PdhI/YneR overexpression (green), and PdhI/YneR knockout ΔyneR (red) in MSSM minimal medium with 5 mM KCl comparing growth on either glucose or pyruvate as a sole carbon source. Empty vector control in orange. Mutations in residues involved in PdhI binding to PdhA/PdhB lead to phenotypic recovery. Lines represent the mean. The shaded area corresponds to 95% confidence intervals (N = 3 biological replicas).
Comment in
- Cracking the code of cellular protein-protein interactions: Alphafold and whole-cell crosslinking to the rescue. Mol Syst Biol. 2023 Apr 12;19(4):e11587. doi: 10.15252/msb.202311587. Epub 2023 Mar 10. PMID: 36896624. Free PMC article.