Nat Commun. 2024 Oct 9;15(1):8724.
doi: 10.1038/s41467-024-52951-w.

Unmasking AlphaFold to integrate experiments and predictions in multimeric complexes

Claudio Mirabello et al. Nat Commun.

Abstract

Since the release of AlphaFold, researchers have actively refined its predictions and attempted to integrate it into existing pipelines for determining protein structures. These efforts have introduced a number of functionalities and optimisations at the latest Critical Assessment of protein Structure Prediction edition (CASP15), resulting in a marked improvement in the prediction of multimeric protein structures. However, AlphaFold's capability of predicting large protein complexes is still limited, and integrating experimental data in the prediction pipeline is not straightforward. In this study, we introduce AF_unmasked to overcome these limitations. Our results demonstrate that AF_unmasked can integrate experimental information to build larger or hard-to-predict protein assemblies with high confidence. The resulting predictions can help interpret and augment experimental data. This approach generates high-quality (DockQ score > 0.8) structures even when little to no evolutionary information is available and imperfect experimental structures are used as a starting point. AF_unmasked is developed and optimised to fill incomplete experimental structures (structural inpainting), which may provide insights into protein dynamics. In summary, AF_unmasked provides an easy-to-use method that efficiently integrates experiments to predict large protein complexes more confidently.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Fig. 1. Flowchart of AF_unmasked.
Structural data from an experiment are converted to the correct format (mmCIF), then the template is aligned, either by sequence against the target sequences or structurally against a set of target monomeric structures. This ensures that the template coordinates are applied to the right target amino acids, even when templates are remote homologs. At this stage, MSAs may be clipped to reduce their size and increase the influence of templates on the output. The templates are fed into AlphaFold along with the evolutionary inputs (MSAs). The “Template masking” close-up schema shows the changes made to the neural network in AlphaFold. Monomeric templates from unrelated structures have incorrect cross-chain distances, so cross-chain masking is enforced by default in AlphaFold-Multimer (Template masking block, Masked track). Coordinates from different templates are merged into single distograms (dg). Distograms contain information about distances within each monomer (dgA, dgB) as well as cross-chain distances (dgAB, dgBA). Cross-chain distances are filtered with a masking layer so that, in the final distogram, they are ignored by the neural network. When multimeric templates from experimental datasets are used, the distances across chains are correct and informative, so we disable the template masking (Unmasked track). The neural network is then informed by distances across chains as well as within chains.
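The difference between the Masked and Unmasked tracks can be illustrated with a toy pairwise mask. This is a minimal sketch and not the AF_unmasked code: the function names, the three-residue example, and the use of raw distances in place of binned distograms are all assumptions made for illustration.

```python
def template_pair_mask(chain_ids, unmask_cross_chain):
    """Return an L x L 0/1 mask over residue pairs of a multi-chain template.

    chain_ids: one chain label per residue, e.g. ["A", "A", "B"].
    unmask_cross_chain: False mimics the Masked track (cross-chain pairs dropped),
    True mimics the Unmasked track (all pairs kept).
    """
    n = len(chain_ids)
    return [
        [1 if (chain_ids[i] == chain_ids[j] or unmask_cross_chain) else 0
         for j in range(n)]
        for i in range(n)
    ]


def apply_mask(distances, mask):
    """Zero out masked pairs (toy stand-in for filtering the distogram)."""
    return [[d * m for d, m in zip(d_row, m_row)]
            for d_row, m_row in zip(distances, mask)]


# Hypothetical template: two residues in chain A, one in chain B (distances in angstroms).
chains = ["A", "A", "B"]
distances = [[0.0, 3.8, 9.5],
             [3.8, 0.0, 7.1],
             [9.5, 7.1, 0.0]]

masked = apply_mask(distances, template_pair_mask(chains, unmask_cross_chain=False))
unmasked = apply_mask(distances, template_pair_mask(chains, unmask_cross_chain=True))
```

In the masked result only the A-A block (dgA) and the B-B block (dgB) survive; in the unmasked result the off-diagonal blocks (dgAB, dgBA) are passed through as well.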
Fig. 2
Fig. 2. Box plot comparison of various template strategies when predicting a subset of the PDB set of heterodimeric complexes.
Each box represents the inter-quartile range (IQR), with the median represented as a horizontal line. Whiskers extend up to 1.5 × IQR beyond the box. Diamonds represent outlier samples. The subset in this test is made of heterodimers (n = 28) where good homologous templates could be found in the PDB and the predictions by AlphaFold-Multimer (Standard) are incorrect. We evaluate AF_unmasked on ideal, native templates without and with cross-chain restraints (Masked and Unmasked, respectively). Then we switch from ideal to homologous templates (Unmasked-Homologs). Only the top-ranked prediction by ranking confidence, out of 25, is evaluated for each heterodimer. Though results are slightly worse than when providing an ideal template, the cross-chain information from homologous templates helps make better predictions than the Standard and Masked predictors.
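The box-plot convention described in this caption can be reproduced with the standard library. This is a sketch under assumptions: the quartile interpolation method ("inclusive") is a choice made here and may differ from the plotting library used for the figure, and the input numbers are arbitrary illustrative values, not data from the paper.

```python
import statistics


def box_plot_stats(values):
    """Compute box, whiskers and outliers using the 1.5 x IQR rule."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers stop at the most extreme samples still inside the fences.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        # Samples beyond the fences are the ones drawn as diamonds.
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Arbitrary illustrative values (not taken from the study).
stats = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 30])
```

With these values the box spans 3 to 7, the upper whisker stops at 8, and 30 is reported as an outlier.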
Fig. 3
Fig. 3. CASP15 target H1142 is an antibody-antigen complex.
The template was obtained by superimposing unbound structures from CASP15 predictions onto the native to simulate an imperfect template. a In this case, some of the residues at the interface are clashing in the template. We test AF_unmasked either by feeding this imperfect template (a) or by deleting the clashing interfacial residues to let AF_unmasked inpaint them (b). Results show that (c) using both cross- and intra-chain restraints (Unmasked) from the imperfect template does not perform as well as using cross-chain restraints alone (Unmasked, cross-chain). The best overall strategy is to delete the clashes and perform inpainting (Unmasked, inpainting), which results in more extensive sampling of the space of conformations. Regardless of the strategy, the best model by ranking confidence was also the best model by DockQ.
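The model selection mentioned in this caption can be sketched as follows. The weighting 0.8 × ipTM + 0.2 × pTM is the ranking confidence defined for AlphaFold-Multimer; the model names and all scores below are hypothetical values invented for illustration, not results from the paper.

```python
def ranking_confidence(iptm, ptm):
    """AlphaFold-Multimer ranking confidence: interface score weighted over global score."""
    return 0.8 * iptm + 0.2 * ptm


# Hypothetical predictions with made-up confidence and DockQ values.
models = [
    {"name": "model_1", "iptm": 0.62, "ptm": 0.70, "dockq": 0.45},
    {"name": "model_2", "iptm": 0.85, "ptm": 0.80, "dockq": 0.81},
    {"name": "model_3", "iptm": 0.40, "ptm": 0.65, "dockq": 0.12},
]

# Top model by predicted confidence (available without the native structure).
best_by_conf = max(models, key=lambda m: ranking_confidence(m["iptm"], m["ptm"]))
# Top model by DockQ (requires the native structure, so only usable in benchmarks).
best_by_dockq = max(models, key=lambda m: m["dockq"])

same_model = best_by_conf["name"] == best_by_dockq["name"]
```

When `same_model` is true, the confidence-based ranking picked the same model that the DockQ evaluation would have picked, which is the situation the caption reports for H1142.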
Fig. 4
Fig. 4. CASP15 target H1111 is a very large complex (27 chains, 8460 amino acids) of a secretion export gate from Yersinia enterocolitica.
We use the CASP15 native structure (PDB ID: 7QIJ) as a partial template (bottom ring) to guide the assembly and let AlphaFold inpaint the trans-membrane region. The top three models by ranking confidence are all near-identical to the template in the area covered by it, while the trans-membrane region shows diverse and potentially biologically relevant conformations: closed (a), intermediate (b), open (c).
Fig. 5
Fig. 5. CASP15 target T1110o is a homodimer of the isocyanide hydratase.
a Target T1109o is a mutant of T1110o in which a single point mutation causes a rearrangement of the C-termini (b). The template was obtained by homology against the PDB, and among a set of candidates we selected a template where the C-terminal loops were missing entirely (a, b). We use this template as is; the mapping between target and template amino acid sequences was performed by structural superposition between unbound models and the template with TM-align. Using this incomplete template allows AF_unmasked to sample a number of different loop conformations through inpainting. The top-ranking structures by confidence score show the correct loop arrangement both in T1109o (c, Unmasked) and T1110o (d, Unmasked) for mutant and wildtype sequences, while the default template strategy (Masked) tends to assign to the mutant the same arrangement as in the wildtype.
Fig. 6
Fig. 6. Comparison of AF_unmasked and standard AlphaFold-Multimer predictions of chimeric rubisco protein.
Flexible loops in the smaller subunit at the center of the complex have been inpainted with AF_unmasked. a Global superposition of best standard AlphaFold-Multimer (v2.3) prediction by ranking confidence on the experimental cryo-EM structure (left) and comparison with the best AF_unmasked prediction (right). The circled area highlights how the inner loops are predicted in a tighter and symmetrical conformation compared to the experimental model. The AF_unmasked model, where the same inner loops were inpainted, shows better agreement with the experimental model. b Comparison of predictions against the density obtained from cryo-EM data after optimisation of the superposition between predicted loops and one loop from the deposited model. The circled area shows a cross-section of one of the inner loops of interest. The resulting inpainted loop fits better within the density and is a closer match to the final refined model when compared to the standard prediction.
Fig. 7
Fig. 7. Analysis of ClpB hexamer using AF_unmasked.
a Given template and inpainted N-termini. N-termini are shown as surfaces, while other domains are shown as cartoons. Each subunit of the hexamer is coloured differently. b Inpainting on the M-domains, shown as surfaces. The arrow shows the possible motion of the M-domain. c Inpainting of the interaction between ClpB and casein. The asterisk shows the newly predicted interaction area. d View of the hydrophobic regions of ClpB termini interacting with casein. e Models of ClpB and casein and relative confidence scores.
Fig. 8
Fig. 8. Analysis of the Neurofibromin (NF1) dimer using AF_unmasked.
a Inpainting of the GRD and Sec14-PH domain of NF1 in an intermediate conformation between the experimentally observed closed and open states. b Comparison of the closed experimental conformation of NF1 isoform 2 (on the left) with an AF_unmasked conformation. Important regions are highlighted. c Superimposition of several AF_unmasked predictions where the GRD and Sec14-PH domain were modelled in intermediate positions, suggesting a possible motion path for these domains. d Three different AF_unmasked predictions showing a bending of the helical NF1 platform, also represented with differently coloured curved lines.
