Brief Bioinform. 2024 Jul 25;25(5):bbae381.
doi: 10.1093/bib/bbae381.

Machine learning-assisted substrate binding pocket engineering based on structural information

Xinglong Wang et al. Brief Bioinform. 2024.

Abstract

Engineering enzyme-substrate binding pockets is the most efficient approach for modifying catalytic activity, but it is limited when the substrate binding sites are indistinct. Here, we developed a 3D convolutional neural network, DUnet, for predicting protein-ligand binding sites. The network integrates DenseNet, UNet, and self-attention for extracting features and recovering the sample size. We attempted to enlarge the dataset by data augmentation, and the model achieved success rates of 48.4%, 35.5%, and 43.6% at a precision of ≥50%, and of 52%, 47.6%, and 58.1% at a distance between the predicted and real centers of ≤4 Å, on the SC6K, COACH420, and BU48 validation datasets, respectively. The substrate binding sites of Klebsiella variicola acid phosphatase (KvAP) and Bacillus anthracis proline 4-hydroxylase (BaP4H) were predicted using DUnet, which performed highly competitively: 53.8% and 56% of the predicted binding sites critically affected the catalysis of KvAP and BaP4H, respectively. Virtual saturation mutagenesis was applied based on the predicted binding sites of KvAP, and the 10 top-ranked single mutations that contributed to stronger enzyme-substrate binding varied when the predicted sites differed. The advantage of DUnet in predicting key residues responsible for enzyme activity further improved the success rate of virtual mutagenesis. This study highlights the significance of correctly predicting key binding sites for enzyme engineering.
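
The success rate at a center-to-center distance of ≤4 Å reported above can be illustrated with a minimal Python sketch. The function names and the toy coordinates are hypothetical and are not taken from the paper; the sketch assumes one predicted and one real pocket center per protein, which may differ from how the authors matched pockets.

```python
import numpy as np

def center_distance(pred_center, true_center):
    """Euclidean distance (in Å) between a predicted and a real pocket center."""
    diff = np.asarray(pred_center, dtype=float) - np.asarray(true_center, dtype=float)
    return float(np.linalg.norm(diff))

def success_rate(pred_centers, true_centers, threshold=4.0):
    """Fraction of proteins whose predicted center lies within `threshold` Å."""
    hits = [center_distance(p, t) <= threshold
            for p, t in zip(pred_centers, true_centers)]
    return sum(hits) / len(hits)

# Hypothetical toy data: three proteins, one pocket each (coordinates in Å).
pred = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (1.0, 2.0, 2.0)]
true = [(0.0, 3.0, 0.0), (0.0, 0.0, 0.0), (1.0, 2.0, 7.0)]
rate = success_rate(pred, true)  # distances are 3, 10 and 5 Å, so one hit in three
```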

Keywords: acid phosphatase; deep learning; proline 4-hydroxylase; substrate binding sites.

Figures

Graphical Abstract
Figure 1
Architecture of DUnet for protein–ligand binding site prediction. The protein structure was prepared as a 3D image of size 36 × 36 × 36, with 18 features describing the atomic characteristics of the protein. The network integrated DenseNet, UNet, and self-attention for feature extraction and image size recovery. The output image size was also 36 × 36 × 36, with a single feature indicating whether or not the atom belonged to a binding site. The convolution block comprised Conv3d and BatchNorm3d, and the ConvTranspose block comprised ConvTranspose3d. SA denotes self-attention, which learned features from the two given objects (shown by dashed lines); concatenation was used to combine the features from the upper block. A steric representation of the protein and the predicted binding sites (shown as density) is shown on the right.
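
To make the input representation concrete, the following Python sketch places atoms on a 36 × 36 × 36 grid with 18 feature channels. Only the grid size and the channel count come from the caption; the voxel size of 2 Å, the function name, and the way features are accumulated are assumptions made for illustration and may not match the authors' preprocessing.

```python
import numpy as np

GRID = 36         # voxels per axis (from the caption)
N_FEATURES = 18   # atomic feature channels (from the caption)
VOXEL_SIZE = 2.0  # Å per voxel: an assumption, not stated in the source

def voxelize(coords, features, center):
    """Map atoms onto an (18, 36, 36, 36) grid centered on `center`.

    coords:   (n_atoms, 3) atomic coordinates in Å
    features: (n_atoms, 18) per-atom feature vectors
    Atoms falling outside the box are ignored.
    """
    coords = np.asarray(coords, dtype=float)
    features = np.asarray(features, dtype=float)
    grid = np.zeros((N_FEATURES, GRID, GRID, GRID), dtype=np.float32)
    # Shift so that `center` maps to the middle of the grid, then bin by voxel.
    shifted = (coords - np.asarray(center, dtype=float)) / VOXEL_SIZE + GRID / 2
    idx = np.floor(shifted).astype(int)
    inside = np.all((idx >= 0) & (idx < GRID), axis=1)
    for (i, j, k), feat in zip(idx[inside], features[inside]):
        grid[:, i, j, k] += feat  # sum features of atoms sharing a voxel
    return grid

# Hypothetical toy input: one atom at the box center, one far outside the box.
toy_coords = [[0.0, 0.0, 0.0], [100.0, 0.0, 0.0]]
toy_features = np.ones((2, N_FEATURES))
toy_grid = voxelize(toy_coords, toy_features, center=(0.0, 0.0, 0.0))
```

Under the same assumptions, the network output would be a single-channel array of shape (1, 36, 36, 36) marking binding-site voxels.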
Figure 2
Comparison of DL-based methods. Success rate (SR) at precision ≥50% (A) and at DCC ≤ 4 Å (B), and (C) the average precision, sensitivity, and specificity for cases in which these DL-based methods predicted at least one pocket. Note that the outputs of PointSite and BiRDs are binding sites within proteins, which differ from the calculated density of where the ligand exists; we therefore did not calculate the SR-DCC of these two methods.
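
The three metrics compared in panel (C) follow from the standard confusion-matrix definitions, sketched below in Python. Whether the authors evaluated them per residue, per atom, or per voxel is not stated here, so the flat binary masks and the function name are assumptions for illustration.

```python
import numpy as np

def binary_metrics(pred_mask, true_mask):
    """Precision, sensitivity and specificity for two binary masks."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    tp = int(np.sum(pred & true))    # predicted site, real site
    fp = int(np.sum(pred & ~true))   # predicted site, not a real site
    fn = int(np.sum(~pred & true))   # missed real site
    tn = int(np.sum(~pred & ~true))  # correctly rejected
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return precision, sensitivity, specificity

# Hypothetical toy labels for eight residues (1 = binding site).
pred_labels = [1, 1, 0, 0, 1, 0, 0, 0]
true_labels = [1, 0, 0, 0, 1, 1, 0, 0]
precision, sensitivity, specificity = binary_metrics(pred_labels, true_labels)

# A prediction counts as a success in panel (A) when precision is at least 50%.
is_success = precision >= 0.5
```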
Figure 3
Validation of the effect of the predicted binding sites on enzyme activity. The predicted binding sites of KvAP (A) and BaP4H (C); the sites predicted using PUResNet, PointSite, and DUnet are indicated in the figure (detailed information is provided in Tables S2 and S3). Changes in specific activity when the predicted binding sites were mutated to Ala, for KvAP (B) and BaP4H (D).
Figure 4
Validation of virtual mutagenesis. (A) Docking of p-NPP into KvAP using Rosetta dock. The binding sites commonly predicted by PUResNet, PointSite, and DUnet are shown. The 10 top-ranked ddG values obtained by virtual mutagenesis based on the binding sites predicted by DUnet (B) and PUResNet (C), together with the corresponding activity changes upon single mutation.
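
The bookkeeping behind virtual saturation mutagenesis can be sketched in Python as follows: every predicted site is mutated to each of the 19 other amino acids, and the single mutations are ranked by ddG. The site positions, the `toy_ddg` scoring function, and the convention that a more negative ddG means stronger binding are all assumptions for illustration; in the study the scores came from Rosetta, which is not called here.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def enumerate_single_mutations(sites):
    """List (wild_type, position, mutant) for all 19 substitutions per site.

    sites: dict mapping residue position -> wild-type one-letter code
    """
    return [(wt, pos, mut)
            for pos, wt in sorted(sites.items())
            for mut in AMINO_ACIDS if mut != wt]

def top_ranked(mutations, ddg_fn, n=10):
    """Return the n mutations with the lowest (most favorable) ddG."""
    scored = [(ddg_fn(m), m) for m in mutations]
    scored.sort(key=lambda item: item[0])
    return scored[:n]

def toy_ddg(mutation):
    """Placeholder score; a real workflow would call a docking/ddG tool."""
    _, pos, mut = mutation
    return ((pos * 31 + ord(mut)) % 17 - 8) / 2.0

# Hypothetical predicted sites (positions and residues are made up).
predicted_sites = {17: "H", 52: "D", 133: "R"}
mutations = enumerate_single_mutations(predicted_sites)
best = top_ranked(mutations, toy_ddg, n=10)
labels = [f"{wt}{pos}{mut}" for _, (wt, pos, mut) in best]
```

Because the candidate list is built from the predicted sites, a different set of predicted sites yields a different top-10 list.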
Figure 5
The protocol of DUnet-assisted molecular docking. The 3D structure of the target enzyme can be obtained from the PDB or the AlphaFold database, followed by binding site prediction using DUnet. The resulting center of the binding sites is used to position the substrate with the GROMACS editconf module, and the positioned substrate is combined with the structure file for molecular docking using Rosetta dock. The resulting binding score can be used to evaluate the potential interaction between novel enzymes and the desired substrate.
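
The step from a predicted density to a substrate placement can be sketched in Python as follows. The density threshold, grid origin, voxel size, and file names are hypothetical; the sketch only builds a `gmx editconf` command line (using its `-f`, `-o`, and `-center` options) and does not run GROMACS. It assumes that `-center` expects coordinates in nm, hence the conversion from Å.

```python
import numpy as np

def pocket_center(density, origin, voxel_size, threshold=0.5):
    """Density-weighted center (in Å) of voxels at or above `threshold`."""
    density = np.asarray(density, dtype=float)
    idx = np.argwhere(density >= threshold)
    if idx.size == 0:
        raise ValueError("no voxel reaches the density threshold")
    weights = density[tuple(idx.T)]
    # Use voxel midpoints (index + 0.5) when converting to coordinates.
    center_vox = np.average(idx + 0.5, axis=0, weights=weights)
    return np.asarray(origin, dtype=float) + center_vox * voxel_size

def editconf_command(center_angstrom, ligand_in, ligand_out):
    """Build (but do not run) a gmx editconf call that recenters the ligand."""
    x, y, z = (c / 10.0 for c in center_angstrom)  # Å -> nm (assumed unit)
    return ["gmx", "editconf", "-f", ligand_in, "-o", ligand_out,
            "-center", f"{x:.3f}", f"{y:.3f}", f"{z:.3f}"]

# Hypothetical toy density with a single high-confidence voxel.
toy_density = np.zeros((36, 36, 36))
toy_density[10, 10, 10] = 1.0
center = pocket_center(toy_density, origin=(0.0, 0.0, 0.0), voxel_size=2.0)
cmd = editconf_command(center, "substrate.pdb", "substrate_centered.pdb")
```

The recentered substrate file would then be merged with the enzyme structure before docking, as described in the caption.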
