Sci Data. 2022 Sep 7;9(1):548.
doi: 10.1038/s41597-022-01631-9.

PLAS-5k: Dataset of Protein-Ligand Affinities from Molecular Dynamics for Machine Learning Applications

Divya B Korlepara et al. Sci Data.

Abstract

Computational methods and, more recently, machine learning methods have played a key role in structure-based drug design. Although several benchmarking datasets are available for machine learning applications in virtual screening, accurate prediction of the binding affinity of a protein-ligand complex remains a major challenge. New datasets that enable the development of models that predict binding affinities better than state-of-the-art scoring functions are therefore important. For the first time, we have developed a dataset, PLAS-5k, comprising 5000 protein-ligand complexes chosen from the PDB database. The dataset consists of binding affinities along with energy components such as electrostatic, van der Waals, and polar and non-polar solvation energies, calculated from molecular dynamics simulations using the MMPBSA (Molecular Mechanics Poisson-Boltzmann Surface Area) method. The calculated binding affinities outperformed docking scores and showed a good correlation with the available experimental values. The availability of energy components may enable optimization of desired components during machine learning-based drug design. Further, the OnionNet model has been retrained on the PLAS-5k dataset and is provided as a baseline for the prediction of binding affinities.
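
As a rough illustration of how per-complex energy components relate to a binding affinity, the following minimal Python sketch sums the four terms named in the abstract. It assumes that the reported affinity is the plain sum of these terms with no entropy correction, which the abstract does not state. The class name, field names, units, and numeric values are hypothetical and are not taken from PLAS-5k.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnergyComponents:
    """Per-complex MM-PBSA terms in kcal/mol (hypothetical field names)."""

    electrostatic: float
    van_der_waals: float
    polar_solvation: float
    nonpolar_solvation: float

    def binding_affinity(self) -> float:
        # Assumption: the affinity is the plain sum of the four terms,
        # with no entropy correction applied.
        return (
            self.electrostatic
            + self.van_der_waals
            + self.polar_solvation
            + self.nonpolar_solvation
        )


# Illustrative numbers only; they are not values from the dataset.
example = EnergyComponents(
    electrostatic=-20.0,
    van_der_waals=-35.0,
    polar_solvation=30.0,
    nonpolar_solvation=-5.0,
)
print(example.binding_affinity())  # -30.0
```

Keeping the components separate, rather than storing only their sum, is what would let a model target one term (for example, van der Waals energy) during optimization.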

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Protocol for input preparation and simulations.
Fig. 2
Correlation plots between the experimental and calculated binding affinities for a subset of 2000 PDB IDs. The binding affinities are calculated using (a) AutoDock Vina and (b) MM-PBSA.
Fig. 3
Prediction of binding affinity based on correlation with experimental data. FDA-approved drugs for HIV-1 protease targets: (a) experimental vs. docking, (b) experimental vs. MM-PBSA. Tuberculosis targets: (c) experimental vs. docking, (d) experimental vs. MM-PBSA.
Fig. 4
Pearson correlation coefficient after training OnionNet on PLAS-5k database.
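
The correlation reported in these figures can be reproduced in principle with a standard Pearson coefficient between experimental and calculated (or predicted) affinities. The sketch below is a generic, dependency-free implementation; the function name `pearson_r` and the sample values are hypothetical and do not come from the paper or from PLAS-5k.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Illustrative affinities in kcal/mol; not taken from the dataset.
experimental = [-9.1, -7.4, -10.2, -6.3]
calculated = [-28.5, -21.0, -33.7, -19.2]
print(f"Pearson r = {pearson_r(experimental, calculated):.3f}")
```

Because Pearson correlation is invariant to linear rescaling, it can compare calculated and experimental values even when their absolute magnitudes differ, as is common for MM-PBSA energies.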
