Mol Syst Biol. 2023 Apr 12;19(4):e11544.
doi: 10.15252/msb.202311544. Epub 2023 Feb 23.

Protein complexes in cells by AI-assisted structural proteomics

Francis J O'Reilly et al. Mol Syst Biol.

Abstract

Accurately modeling the structures of proteins and their complexes using artificial intelligence is revolutionizing molecular biology. Experimental data enable a candidate-based approach to systematically model novel protein assemblies. Here, we use a combination of in-cell crosslinking mass spectrometry and co-fractionation mass spectrometry (CoFrac-MS) to identify protein-protein interactions in the model Gram-positive bacterium Bacillus subtilis. We show that crosslinking interactions prior to cell lysis reveals protein interactions that are often lost upon cell lysis. We predict the structures of these protein interactions and others in the SubtiWiki database with AlphaFold-Multimer and, after controlling for the false-positive rate of the predictions, we propose novel structural models of 153 dimeric and 14 trimeric protein assemblies. Crosslinking MS data independently validates the AlphaFold predictions and scoring. We report and validate novel interactors of central cellular machineries that include the ribosome, RNA polymerase, and pyruvate dehydrogenase, assigning function to several uncharacterized proteins. Our approach uncovers protein-protein interactions inside intact cells, provides structural insight into their interaction interfaces, and is applicable to genetically intractable organisms, including pathogenic bacteria.

Keywords: AlphaFold-Multimer; crosslinking mass spectrometry; protein-protein interactions; pyruvate dehydrogenase; uncharacterized proteins.

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1. Whole‐cell crosslinking reveals protein–protein interactions (PPIs) otherwise lost upon cell lysis
  1. PPIs identified by crosslinking MS at 2% PPI‐level FDR (interactions to seven abundant and highly crosslinked proteins and within the ribosome are removed for clarity). Previously uncharacterized proteins are shown in blue. Selected complexes are highlighted. Thin lines represent a single DSSO crosslink between two proteins, medium‐thickness lines 2–4 crosslinks, and thick lines 5 or more crosslinks.

  2. The accessible interaction space of YugI and YabR to the 30S ribosome calculated by DisVis (van Zundert & Bonvin, 2015). The volumes represent the YugI and YabR center of mass positions consistent with 10 of 14 detected crosslinks for YugI and 6 of 8 crosslinks for YabR, indicating the location of their binding sites on the 30S ribosome.

  3. Sucrose gradient (10–40% w/v) of B. subtilis lysate separating the 70S, 50S, and 30S ribosomes from smaller proteins and their complexes. Western blots show that His‐tagged YabR and YugI (both ~ 15 kDa) co‐migrate in the sucrose gradient with the 30S ribosome, whereas the control, wild‐type B. subtilis 168, shows no such signal.

  4. Smoothed elution profiles from the CoFrac‐MS analysis of the RNA polymerase (RNAP), the known binders GreA and NusA, and the uncharacterized protein YkuJ. The mass spec intensity is normalized to the maximum across fractions and averaged across replicates and subunits. Top: untreated cells; Bottom: crosslinked cells. One standard deviation from the mean per fraction is shaded (N = 3 biological replicates). The interaction with YkuJ is stabilized upon crosslinking prior to fractionation.

  5. PPIs detected by crosslinking MS analyzed by CoFrac‐MS. Co‐fractionation measured by PCprophet co‐elution scores in the crosslinked (y‐axis) and untreated (x‐axis) condition. PPIs within the dashed lines were considered equally predicted in both conditions (data in Dataset EV4).

  6. Annotation of co‐fractionation behavior of uncharacterized proteins and their binding partners for protein pairs identified by crosslinking MS. Ribosomal proteins and proteins with missing CoFrac‐MS data in either the crosslinked or the untreated CoFrac‐MS condition were removed.
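
The profile processing described for the elution panel (normalize to the maximum across fractions, average, and shade one standard deviation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names and the toy intensities are hypothetical, and each profile is assumed to be scaled to its own maximum before averaging. Smoothing is omitted.

```python
from statistics import mean, stdev


def normalize_to_max(profile):
    """Scale one elution profile so that its highest fraction equals 1.0."""
    peak = max(profile)
    if peak == 0:
        # An all-zero profile has no peak to normalize against.
        return [0.0 for _ in profile]
    return [value / peak for value in profile]


def mean_and_sd_per_fraction(profiles):
    """Normalize each profile, then return per-fraction mean and sample SD."""
    normalized = [normalize_to_max(p) for p in profiles]
    per_fraction = list(zip(*normalized))  # one tuple of values per fraction
    means = [mean(values) for values in per_fraction]
    sds = [stdev(values) if len(values) > 1 else 0.0 for values in per_fraction]
    return means, sds


# Hypothetical intensities: three replicates measured over five fractions.
replicates = [
    [0.0, 2.0, 4.0, 2.0, 0.0],
    [0.0, 1.0, 2.0, 1.0, 0.0],
    [0.0, 3.0, 6.0, 3.0, 0.0],
]
means, sds = mean_and_sd_per_fraction(replicates)
```

The shaded band in such a plot would then span `means[i] - sds[i]` to `means[i] + sds[i]` for each fraction.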

Figure 2. Structure prediction of binary complexes with AlphaFold‐Multimer
  1. The 1,977 predicted PPIs for AlphaFold‐Multimer interface prediction from crosslinking MS, CoFrac‐MS, and SubtiWiki.

  2. Breakdown of AlphaFold ipTM score distributions by PPI origin. Score distributions are annotated by whether the PPI is present in the PDB (seq. identity > 30% and E‐value < 10⁻³) or in STRING (combined score > 0.4). “Novel interaction” refers to a previously unknown PPI, while “novel interface” refers to the lack of homologous structures for the PPI in the PDB. The central band represents the median, the edges of the box the 25th and 75th percentiles, and the whiskers are 1.5 times the interquartile range.

  3. Noise model evaluation of ipTM distribution of AlphaFold PPIs. Subsamples of 300 PPIs from our datasets (target distribution) are compared to 300 PPIs made up of random B. subtilis proteins from the PPI candidate list combined with random proteins from the E. coli or B. subtilis genome (noise distributions, hollow and filled red bars). While targets show a bimodal distribution, indicating the high confidence of models with ipTM > 0.85, the noise distribution is one‐tailed, approximating the likelihood of random interface prediction in the various ipTM ranges. The histogram shows the median. For the experimental PPIs, the error bars show the standard deviation of 10 subsamples of the 1,977 predicted models.

  4. The 1,977 protein–protein interactions (PPIs) modeled by AlphaFold‐Multimer distribute over the full pTM and ipTM range, with a subpopulation of highly confident predictions with ipTM > 0.85. Insets show high‐ranking models colored by dataset of origin, and the top‐ranking PPIs not previously annotated in SubtiWiki.

  5. A novel PPI from the co‐elution dataset showing the alanine tRNA synthetase subunit AlaS interacting with the uncharacterized protein YozC. The high ipTM value is reflected in the predicted aligned error plot, which also shows that the C‐terminal region of AlaS, not involved in the interaction, is flexible with respect to the YozC‐AlaS module.

  6. Bacterial two‐hybrid assay to validate the interaction between YozC and AlaS. N‐ or C‐terminal fusions of YozC and AlaS to the T18 and T25 domains of the adenylate cyclase CyaA were created and tested for interaction in the E. coli strain BTH101. Colonies turn dark as a result of protein interaction, which leads to the restoration of the adenylate cyclase activity and therefore expression of the β‐galactosidase. A leucine zipper domain was used as a positive control.
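
The box plot elements named in the score-distribution panel (median, 25th and 75th percentiles, whiskers tied to 1.5 times the interquartile range) can be computed as in the sketch below. It assumes the common Tukey convention, in which whiskers extend to the most extreme data points within 1.5 times the interquartile range of the box; the legend does not state which convention the plotting software used. The function name and the input values are hypothetical.

```python
from statistics import quantiles


def box_stats(values):
    """Return median, quartiles, and Tukey-style whisker ends for a sample."""
    # "inclusive" treats the data as the full population of interest.
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < lower_fence or v > upper_fence),
    }


# Hypothetical scores, chosen only to make the arithmetic easy to follow.
stats = box_stats([1, 2, 3, 4, 5, 20])
```

With these toy values the value 20 falls outside the upper fence and would be drawn as an outlier rather than included in the whisker.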

Figure 3. Crosslinking MS validation of AlphaFold‐Multimer models
  1. Percentage of heteromeric crosslink restraint violation per range of ipTM. The central band represents the median, the edges of the box the 25th and 75th percentiles, and the whiskers are 1.5 times the interquartile range.

  2. Bubble plot showing numbers of heteromeric crosslinks violated for each PPI identified by crosslinking MS against the ipTM and pTM distribution.

  3. Successful predictions consistent with crosslinking MS, including predictions of paralogs (YtoP‐YsdC and RocA‐PutC). Self‐crosslinks in gray and heteromeric crosslinks in orange.

  4. Crosslinks highlighting flexibility within the OpuAA‐OpuAB dimer. The OpuAA N‐terminal domain is predicted with a high PAE to the C‐terminal region. The crosslinks corresponding to these interdomain distances are also violated, indicating flexibility between these two domains. Left: self‐crosslinks in gray and heteromeric crosslinks in orange; center: satisfied crosslinks (< 30 Å Cα–Cα) in blue and violated crosslinks in red; right: predicted aligned error plot.
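
The satisfied/violated distinction used in these legends (a crosslink counts as satisfied when the Cα–Cα distance is below 30 Å) can be sketched as follows. This is an illustration only: the data layout, function names, and coordinates are hypothetical, and a real analysis would read Cα positions from the predicted model file.

```python
import math

MAX_CA_DISTANCE = 30.0  # Å, the Cα–Cα cutoff stated in the figure legends


def classify_crosslinks(ca_coords, crosslinks, cutoff=MAX_CA_DISTANCE):
    """Split crosslinks into satisfied (< cutoff) and violated (>= cutoff).

    ca_coords maps (chain, residue_number) to an (x, y, z) Cα position in Å.
    crosslinks is a list of pairs of such (chain, residue_number) keys.
    """
    satisfied, violated = [], []
    for site_a, site_b in crosslinks:
        distance = math.dist(ca_coords[site_a], ca_coords[site_b])
        if distance < cutoff:
            satisfied.append((site_a, site_b))
        else:
            violated.append((site_a, site_b))
    return satisfied, violated


def percent_violated(satisfied, violated):
    """Percentage of crosslinks that exceed the distance cutoff."""
    total = len(satisfied) + len(violated)
    return 100.0 * len(violated) / total if total else 0.0


# Hypothetical Cα positions for three residues on two chains.
coords = {
    ("A", 10): (0.0, 0.0, 0.0),
    ("B", 25): (10.0, 0.0, 0.0),
    ("B", 90): (0.0, 40.0, 0.0),
}
links = [(("A", 10), ("B", 25)), (("A", 10), ("B", 90))]
ok, bad = classify_crosslinks(coords, links)
violation_rate = percent_violated(ok, bad)
```

Computing this percentage per model and grouping models by ipTM range would give the kind of summary shown in the first panel of this figure.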

Figure 4. Building complexes from binary interaction predictions
  1. All dimeric PPIs with predicted ipTM > 0.65 that form connected groups of only three proteins are shown.

  2. The 33 candidate 1:1:1 trimer PPIs modeled by AlphaFold‐Multimer (version 2.2.1) distribute over the full pTM and ipTM range (inset). Trimers with an ipTM > 0.80 are labeled.

  3. Selected predicted structures of trimeric complexes with ipTM > 0.80 and their associated PAE plots. Crosslinks are visualized on LutA‐LutB‐LutC, with satisfied crosslinks (< 30 Å Cα–Cα) in blue and violated crosslinks in red.
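
The selection of trimer candidates from binary predictions (dimeric PPIs with ipTM > 0.65 that form connected groups of only three proteins) can be sketched as a connected-component search. This is one plausible reading of the legend, not the authors' code: the function name, the protein labels P1 to P9, and the scores are all hypothetical.

```python
from collections import defaultdict


def three_protein_groups(scored_pairs, min_iptm=0.65):
    """Return connected groups of exactly three proteins.

    scored_pairs is an iterable of (protein_a, protein_b, iptm) tuples.
    Only pairs with iptm above min_iptm contribute an edge.
    """
    adjacency = defaultdict(set)
    for protein_a, protein_b, iptm in scored_pairs:
        if iptm > min_iptm:
            adjacency[protein_a].add(protein_b)
            adjacency[protein_b].add(protein_a)

    seen = set()
    groups = []
    for start in adjacency:
        if start in seen:
            continue
        # Depth-first traversal collects the whole connected component.
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        if len(component) == 3:
            groups.append(sorted(component))
    return sorted(groups)


# Hypothetical scored dimer predictions.
pairs = [
    ("P1", "P2", 0.90), ("P2", "P3", 0.70), ("P1", "P3", 0.50),  # group of three
    ("P4", "P5", 0.80),                                          # isolated dimer
    ("P6", "P7", 0.90), ("P7", "P8", 0.90), ("P8", "P9", 0.90),  # group of four
]
candidates = three_protein_groups(pairs)
```

Each resulting group would then be a candidate for a 1:1:1 trimer prediction run.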

Figure 5. PdhI/YneR is an inhibitor of the E1 subunit of the pyruvate dehydrogenase
  1. Homology model of B. subtilis E1 pyruvate dehydrogenase (PDH) based on the Geobacillus stearothermophilus E1p structure (PDB id 3dv0; Pei et al, 2008) in surface representation. The space‐fill model of pyruvate is located in the active site based on the template structure. The E1 PDH is a dimer of dimers of the PdhA and PdhB subunits, with the active site formed at the interface between a PdhA and a PdhB copy.

  2. Mapping of crosslinks onto the E1 PDH model derived from combining AlphaFold‐Multimer models. Satisfied crosslinks (< 30 Å Cα–Cα) in blue and violated crosslinks in red.

  3. AlphaFold‐Multimer predictions for PdhA‐PdhB‐PdhI/YneR. The top‐ranked solution by ipTM (0.89) describes the PdhA‐PdhB subcomplex that does not make up the active site, while the ninth‐ranked solution (0.81) identifies the active site interface. Crosslinking data clarify the interactions between PdhI/YneR and PdhA/B. Pyruvate and 3‐deaza‐ThDP are shown as space‐fill models. Crosslink coloring as in panel 2.

  4. Circle view of crosslinking MS data mapped onto the E1 PDH‐PdhI/YneR model derived by combining AlphaFold solutions onto the known stoichiometry. Satisfied crosslinks (< 30 Å Cα–Cα) in blue and violated crosslinks in red.

  5. PDH‐PdhI/YneR model constructed from AlphaFold‐Multimer models of the PdhA‐PdhB‐PdhI/YneR trimer. PdhI/YneR binds at the pocket opening onto the active site.

  6. Visualization of the active site in the AlphaFold‐Multimer model (solid cartoon) with ligand positions derived from PDB id 3dv0 (transparent cartoon and sticks). PdhI/YneR occludes the entrance to the active site by inserting Y31 into the pocket used for entrance of the lipoate co‐factor that comes to reduce the thiamine ring in the enamine‐ThDP intermediate. The original structure was solved in the presence of the enamine‐ThDP analog 3‐deaza‐ThDP (Pei et al, 2008). Key residues for ligand coordination are predicted in the same conformation by AlphaFold‐Multimer.

  7. CoFrac‐MS data showing co‐elution of PdhA, PdhB, and PdhI/YneR. The shaded area corresponds to the standard deviation between replicates.

  8. Growth curves on glucose and pyruvate. Growth experiment of wild‐type (blue) B. subtilis, PdhI/YneR overexpression (green), and PdhI/YneR knockout ΔyneR (red) in MSSM minimal medium with 5 mM KCl comparing growth on either glucose or pyruvate as a sole carbon source. Empty vector control in orange. Mutations in residues involved in PdhI binding to PdhA/PdhB lead to phenotypic recovery. Lines represent the mean. The shaded area corresponds to 95% confidence intervals (N = 3 biological replicates).
