Sci Data. 2024 Feb 9;11(1):180.
doi: 10.1038/s41597-023-02872-y.

PLAS-20k: Extended Dataset of Protein-Ligand Affinities from MD Simulations for Machine Learning Applications


Divya B Korlepara et al. Sci Data.

Erratum in

Abstract

Computing binding affinities is of great importance in the drug discovery pipeline, and predicting them with advanced machine learning methods remains a major challenge because existing datasets and models do not consider the dynamic features of protein-ligand interactions. To this end, we have developed the PLAS-20k dataset, an extension of the previously developed PLAS-5k, with 97,500 independent simulations on a total of 19,500 different protein-ligand complexes. Our results show good correlation with the available experimental values and perform better than docking scores. This holds true even for a subset of ligands that follow Lipinski's rule and for diverse clusters of complex structures, highlighting the importance of the PLAS-20k dataset in developing new ML models. The dataset is also more useful than docking for classifying strong and weak binders. Further, the OnionNet model has been retrained on the PLAS-20k dataset and is provided as a baseline for the prediction of binding affinities. We believe that large-scale MD-based datasets, along with their trajectories, will form a new synergy, paving the way for accelerating drug discovery.
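The correlation with experimental values mentioned above is typically quantified with the Pearson correlation coefficient. The following is a minimal sketch of that calculation in plain Python; the function name `pearson_r` and the numbers in the example lists are illustrative assumptions and are not taken from PLAS-20k or from the authors' code.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Made-up illustrative values in kcal/mol (assumption, NOT PLAS-20k data).
experimental = [-9.1, -7.4, -6.2, -10.3, -5.8]
calculated = [-30.5, -22.1, -18.4, -35.0, -20.2]

r = pearson_r(experimental, calculated)
print(f"Pearson r = {r:.3f}")
```

Because the coefficient is invariant to linear rescaling, it can compare calculated and experimental affinities even when the two sets of values lie on different absolute scales.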


Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Flowchart of the system setup and simulation protocol.
Fig. 2
Correlation plots between experimental and calculated binding affinities for a subset of 6622 PDB IDs (including 2000 data points from the PLAS-5k dataset). Binding affinities were calculated using (a) AutoDock Vina and (b) MMPBSA.
Fig. 3
Confusion matrices distinguishing strong and weak binders: (a) experimental vs MMPBSA, (b) experimental vs docking.
Fig. 4
Correlation plots for the set of PDB IDs from PLAS-20k whose ligands follow Lipinski's rule of five (molecular weight, number of donors, and number of acceptors of the ligand) and for which experimental binding affinities are known: (a) experimental vs docking, (b) experimental vs MMPBSA.
Fig. 5
Pearson correlation coefficient of OnionNet trained on the PLAS-20k dataset.
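
A strong/weak binder comparison of the kind shown in Fig. 3 can be tabulated as a two-by-two confusion matrix. The sketch below is one possible way to do this. The cutoffs, the function name `binder_confusion_matrix`, and the example values are all hypothetical, since this page does not state the thresholds the authors used. It assumes that a more negative value means stronger binding and that the experimental and predicted scales each need their own cutoff.

```python
def binder_confusion_matrix(experimental, predicted, exp_cutoff, pred_cutoff):
    """Count agreement between experimental and predicted binder classes.

    A complex is labelled "strong" when its value is at or below the cutoff
    (more negative = stronger binding). Both cutoffs are assumptions.
    """
    if len(experimental) != len(predicted):
        raise ValueError("inputs must have the same length")
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for exp_value, pred_value in zip(experimental, predicted):
        truly_strong = exp_value <= exp_cutoff
        predicted_strong = pred_value <= pred_cutoff
        if truly_strong and predicted_strong:
            counts["TP"] += 1  # strong binder correctly identified
        elif predicted_strong:
            counts["FP"] += 1  # weak binder predicted as strong
        elif truly_strong:
            counts["FN"] += 1  # strong binder predicted as weak
        else:
            counts["TN"] += 1  # weak binder correctly identified
    return counts


# Made-up illustrative values in kcal/mol (assumption, NOT PLAS-20k data).
experimental = [-9.1, -7.4, -6.2, -10.3, -5.8]
predicted = [-30.5, -22.1, -18.4, -35.0, -26.0]

matrix = binder_confusion_matrix(
    experimental, predicted, exp_cutoff=-8.0, pred_cutoff=-25.0
)
print(matrix)
```

Running the same function once with MMPBSA values and once with docking scores as `predicted` would give the two panels to compare.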

