ACS Synth Biol. 2024 Mar 15;13(3):862-875.
doi: 10.1021/acssynbio.3c00674. Epub 2024 Feb 15.

Carving out a Glycoside Hydrolase Active Site for Incorporation into a New Protein Scaffold Using Deep Network Hallucination


Anders Lønstrup Hansen et al. ACS Synth Biol.

Abstract

Enzymes are indispensable biocatalysts for numerous industrial applications, yet stability, selectivity, and restricted substrate recognition present limitations for their use. Despite the importance of enzyme engineering in overcoming these limitations, success is often challenged by the intricate architecture of enzymes derived from natural sources. Recent advances in computational methods have enabled the de novo design of simplified scaffolds with specific functional sites. Such scaffolds may be advantageous as platforms for enzyme engineering. Here, we present a strategy for the de novo design of a simplified scaffold of an endo-α-N-acetylgalactosaminidase active site, a glycoside hydrolase from the GH101 enzyme family. Using a combination of trRosetta hallucination, iterative cycles of deep-learning-based structure prediction, and ProteinMPNN sequence design, we designed proteins with 290 amino acids incorporating the active site while reducing the molecular weight by over 100 kDa compared to the initial endo-α-N-acetylgalactosaminidase. Of 11 tested designs, six were expressed as soluble monomers, displaying similar or increased thermostabilities compared to the natural enzyme. Despite lacking detectable enzymatic activity, the experimentally determined crystal structures of a representative design closely matched the design with a root-mean-square deviation of 1.0 Å, with most catalytically important side chains within 2.0 Å. The results highlight the potential of scaffold hallucination in designing proteins that may serve as a foundation for subsequent enzyme engineering.

Keywords: de novo design; deep network hallucination; enzyme design; glycoside hydrolase.


Conflict of interest statement

The authors declare no competing financial interest.

Figures

Figure 1
GH101 scaffold design using trRosetta hallucination and iterative sequence clustering. (A) Active-site-containing structure (red) of the EngSP (PDB 5A59) enzyme. The cutout shows the substrate bound in the active site with residues receiving a high restraint weight shown using atom spheres. (B) Scheme illustrating construction of the restraint map used to guide the trRosetta hallucination, the MCMC optimization, and the clustering step used to explore sequence and motif space. (C) Comparison of target (top) and design (bottom) topologies and restraint map. Yellow arrows indicate the β-strand, and blue boxes indicate the α-helix structure. Loss function weight of structural restraints is indicated by color with light, medium, and dark red representing low, medium, and high weight, respectively. Triangles indicate that the residue type was fixed to the WT type. (D) Principal component analysis of MSAs of all hallucinated sequences. The initial 136 sequences are labeled gray with the derived design generation following a color gradient. Yellow arrows illustrate the “evolutionary” path of the two selected clusters. (E) Example hallucinated design illustrating alternative solutions (red) to unrestrained regions, superimposed on the EngSP (PDB 5A59) catalytic domain.
Figure 2
RMSF (blue) of the Cα atoms of selected hEngBF hallucinated designs, sampled by molecular dynamics simulation, and the pLDDT (black) from the OmegaFold predicted structural models, both plotted as a function of the residue number.
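The per-residue RMSF referred to in this caption is the root-mean-square fluctuation of each Cα atom around its time-averaged position. The following is a minimal sketch of that calculation, not the authors' analysis code. It assumes the trajectory frames have already been aligned to a common reference, and the function name `ca_rmsf` and the toy coordinates are hypothetical.

```python
import math


def ca_rmsf(frames):
    """Per-atom RMSF from aligned frames.

    frames: list of frames, each a list of (x, y, z) tuples in the same
    atom order. Alignment to a reference is assumed to be done already.
    """
    n_frames = len(frames)
    n_atoms = len(frames[0])
    result = []
    for i in range(n_atoms):
        # Time-averaged position of atom i.
        mean = [sum(f[i][k] for f in frames) / n_frames for k in range(3)]
        # Mean squared displacement of atom i from that average.
        msd = sum(
            sum((f[i][k] - mean[k]) ** 2 for k in range(3)) for f in frames
        ) / n_frames
        result.append(math.sqrt(msd))
    return result


# Toy trajectory with made-up coordinates (Å): atom 0 is rigid, atom 1 moves.
toy_frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
toy_rmsf = ca_rmsf(toy_frames)
```

In this toy case the rigid atom has an RMSF of zero and the moving atom has a nonzero value, which is the pattern a flexible loop would show against a stable core.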
Figure 3
ProteinMPNN sequence redesign. (A) Schematic representation of the iterative sequence redesign using the ProteinMPNN algorithm. Starting from the hEngBF2 hallucinated design model, successive rounds of redesign and evaluation of the ProteinMPNN output were performed. (B) Histograms of pLDDT scores from OmegaFold structure predictions of hallucinated and ProteinMPNN redesigned sequences. Each histogram represents the distribution of pLDDT scores for a given round with mean values of 60 for the hallucinated sequence and 72, 80, and 84 for iterations 1–3 of the ProteinMPNN sequence redesign, respectively. (C) Principal component analysis of MSAs of all hallucinated and ProteinMPNN redesigned sequences. Each point on the plot represents a sequence, and the distance between points indicates how similar the sequences are. The color of each point indicates the iteration of sequence redesign (light gray: hallucinated sequences, dark gray: iteration 1, blue: iteration 2, yellow: iteration 3). (D) OmegaFold structure prediction of representative sequences from iteration 1 (left panel), 2 (middle panel), and 3 (right panel) of the ProteinMPNN sequence redesign, superimposed with the OmegaFold structural model of hEngBF2 (gray). The pLDDT prediction confidence scale is indicated on the structure, ranging from 50 (yellow) to 100 (blue). The RMSD is for the predicted structure of the representative sequence with respect to the hEngBF2 model.
Figure 4
Experimental validation of hallucinated and ProteinMPNN redesigned sequences. (A) Schematic of the sequence design process, starting with the hallucinated sequence and followed by a ProteinMPNN sequence redesign. The 11 designed sequences are labeled 1–11 and color-coded based on the round of redesign, from iteration 1 to iteration 3. The edges of the circles are color-coded based on protein expression, with green indicating detectable protein expression and red indicating no detectable protein expression. (B) Quantification of total soluble yield per liter of culture for purified protein designs from size-exclusion chromatography elution profiles. (C) Size-exclusion chromatography profiles of purified dEngBF4, dEngBF5, dEngBF8, and dEngBF9, eluted in 10 mM sodium phosphate pH 7.4, 10 mM NaCl using a Superdex 200 26/60 column. Gray shading represents the elution profile and the fractions collected.
Figure 5
Biophysical and in silico characterizations of dEngBF designs. The name of the designed protein for which data are presented in panels A–D is listed on top. (A) RMSF of the Cα atoms of dEngBF models, sampled by molecular dynamics simulation and plotted on the OmegaFold predicted structural models. (B) Far-UV CD spectroscopy of dEngBF designs at 20 °C (dark gray), 85 °C (blue), and cooled back to 20 °C (yellow). MRE is the mean residue ellipticity (eq 1). (C) Differential scanning calorimetry thermograms of dEngBF design unfolding. The thermograms show the unfolding of the designed proteins as a function of temperature. (D) Size-exclusion chromatography profiles of dEngBF designs at 20 °C before (dark gray) and after (blue) thermal unfolding.
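The caption cites eq 1 for the mean residue ellipticity, but that equation is not reproduced on this page. For orientation only, a commonly used definition is given below; it is an assumption that the article's eq 1 takes this form, and the symbols are introduced here for illustration.

```latex
% Commonly used definition of mean residue ellipticity (assumed form;
% the article's own eq 1 is not shown on this page and may differ).
%   \theta_{\mathrm{obs}} : observed ellipticity in millidegrees
%   c : protein concentration in mol/L
%   l : optical path length in cm
%   n : number of peptide bonds (some authors use the number of residues)
\[
  [\theta]_{\mathrm{MRE}}
  = \frac{\theta_{\mathrm{obs}}}{10 \, c \, l \, n}
  \qquad \left(\mathrm{deg\,cm^{2}\,dmol^{-1}}\right)
\]
```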
Figure 6
(A) OmegaFold pLDDT prediction confidence plotted on the structures of dEngBF designs. (B) CD spectroscopy thermal denaturation of dEngBF designs followed at 280 nm. (C) The denaturant concentration was plotted as a function of the design melting temperature. The Tm at [GuHCl] = 0 is extrapolated from a linear fit.
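Panel C describes a linear extrapolation of the melting temperature to zero denaturant. The sketch below shows one way such an extrapolation could be done with an ordinary least-squares line, where the intercept is the estimated Tm at [GuHCl] = 0. It is not the authors' procedure; the function name `extrapolate_tm` and all data values are made up for illustration.

```python
def extrapolate_tm(guhcl_molar, tm_celsius):
    """Fit Tm = slope * [GuHCl] + intercept by least squares.

    Returns (slope, intercept); the intercept is the extrapolated Tm
    at zero denaturant.
    """
    n = len(guhcl_molar)
    if n < 2 or n != len(tm_celsius):
        raise ValueError("need at least two paired measurements")
    mean_x = sum(guhcl_molar) / n
    mean_y = sum(tm_celsius) / n
    sxx = sum((x - mean_x) ** 2 for x in guhcl_molar)
    sxy = sum(
        (x - mean_x) * (y - mean_y) for x, y in zip(guhcl_molar, tm_celsius)
    )
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical measurements, NOT data from the article.
example_guhcl = [1.0, 2.0, 3.0]     # mol/L
example_tm = [80.0, 70.0, 60.0]     # °C
example_slope, example_tm0 = extrapolate_tm(example_guhcl, example_tm)
```

An extrapolation like this is useful when a protein does not unfold fully within the accessible temperature range without denaturant, although it relies on the Tm dependence staying linear outside the measured concentrations.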
Figure 7
(A) Structure of dEngBF4 solved by X-ray crystallography (molecule A of PDB: 8QYE). Unrestrained segments during the design phase are depicted in red. Zoomed insets emphasize areas that either displayed limited electron density in the crystal structure or exhibited pronounced dynamics in molecular dynamics simulations. (B) Plots depicting dEngBF4 per-residue Cα RMSF sampled from molecular dynamics simulations (top panel), OmegaFold pLDDT prediction confidence (middle panel), and distances between experimental and structural model Cα atoms (bottom panel).
Figure 8
(A) Superposition of the dEngBF4 OmegaFold model (gray) and the crystal structure (green). The backbone Cα RMSD was 1.0 Å over 191 amino acids. The zoomed inset showcases catalytically essential amino acids, subject to both structural and sequence restraint in the design process. The side chain RMSD between the OmegaFold model and the crystal structure for the catalytically relevant residues defined in the crystal structure (Glu188, Asp156, His92, and Asn94) was 1.2 Å over 31 atoms. (B) Superposition of EngBF (PDB ID: 2ZXQ) (gray), with the restrained regions colored red, and the dEngBF4 crystal structure (green). The backbone Cα RMSD was 2.6 Å for 115 amino acids. The zoomed inset depicts catalytically essential residues for which both structural and amino acid type restraint were applied in the design process. The side chain RMSD for catalytically relevant residues defined in the dEngBF4 crystal structure (Glu188, Asp156, His92, and Asn94) was 2.0 Å over 34 atoms. The EngBF numbering is indicated in parentheses.
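The RMSD values quoted in this caption are root-mean-square distances between matched atoms of two superimposed structures. The sketch below computes that quantity for coordinates that are assumed to be superimposed already and listed in the same atom order; it does not perform the superposition itself. The function name `paired_rmsd` and the coordinates are hypothetical.

```python
import math


def paired_rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) tuples.

    Assumes the two structures are already superimposed and that atom i
    in coords_a corresponds to atom i in coords_b.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (a[k] - b[k]) ** 2
        for a, b in zip(coords_a, coords_b)
        for k in range(3)
    )
    return math.sqrt(squared / len(coords_a))


# Made-up coordinates (Å): every atom is displaced by 1 Å along z.
model_atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
crystal_atoms = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
example_rmsd = paired_rmsd(model_atoms, crystal_atoms)
```

Because the value depends on which atoms are paired, an RMSD is only meaningful together with the atom count it was computed over, which is why the caption reports both.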

