Molecules. 2025 Jul 16;30(14):2985.
doi: 10.3390/molecules30142985.

Integrating Molecular Dynamics, Molecular Docking, and Machine Learning for Predicting SARS-CoV-2 Papain-like Protease Binders

Ann Varghese et al. Molecules.

Abstract

Coronavirus disease 2019 (COVID-19) produced devastating health and economic impacts worldwide. While progress has been made in vaccine development, effective antiviral treatments remain limited, particularly those targeting the papain-like protease (PLpro) of SARS-CoV-2. PLpro plays a key role in viral replication and immune evasion, making it an attractive yet underexplored target for drug repurposing. In this study, we combined machine learning, molecular dynamics, and molecular docking to identify potential PLpro inhibitors in existing drugs. We performed long-timescale molecular dynamics simulations on PLpro-ligand complexes at two known binding sites, followed by structural clustering to capture representative structures. These were used for molecular docking, including a training set of 127 compounds and a library of 1107 FDA-approved drugs. A random forest model, trained on the docking scores of the representative conformations, yielded 76.4% accuracy via leave-one-out cross-validation. Applying the model to the drug library and filtering results based on prediction confidence and the applicability domain, we identified five drugs as promising candidates for repurposing for COVID-19 treatment. Our findings demonstrate the power of integrating computational modeling with machine learning to accelerate drug repurposing against emerging viral targets.

Keywords: SARS-CoV-2; drug repurposing; machine learning; molecular docking; molecular dynamics; papain-like protease.

Conflict of interest statement

The authors declare no competing financial interests.

Figures

Figure 1
The RMSDs of the structures in the trajectory files from the simulation systems for the PLpro complexes with PDB ID 7LBR (A) and 7QCI (B). The x-axis indicates the time in the simulation. The y-axis gives the RMSD between the structure at the time indicated on the x-axis and the initial structure. The RMSDs calculated for PLpro, the ligand, and the ligand-binding pocket of PLpro are color-coded in blue, red, and green, respectively.
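
The RMSD curves described in this caption can be illustrated with a minimal sketch. It assumes each frame is a list of (x, y, z) coordinates for the same atoms in the same order, and that the frames have already been superimposed on the initial structure; no alignment is performed here. The function names and the toy trajectory are hypothetical and are not taken from the study.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized coordinate sets."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))

def rmsd_series(trajectory):
    """RMSD of every frame against the first (initial) frame."""
    reference = trajectory[0]
    return [rmsd(frame, reference) for frame in trajectory]

# Toy two-atom trajectory (illustrative values only, not simulation data).
toy_trajectory = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],  # initial structure
    [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)],  # both atoms shifted by 1 along z
    [(0.0, 2.0, 0.0), (1.0, 2.0, 0.0)],  # both atoms shifted by 2 along y
]
series = rmsd_series(toy_trajectory)
print(series)
```

Running the same function on protein atoms, ligand atoms, and binding-pocket atoms separately would give three curves analogous to the blue, red, and green traces.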
Figure 2
The structure clusters and corresponding representative structures from the MD simulations based on the PLpro complexes with PDB ID 7LBR (A) and 7QCI (B). The y-axis of the bar graph denotes the fraction of structures, and the x-axis indicates the cluster number. The representative structures, marked as R1, R2, and R3, are shown on top of the corresponding bars.
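
The bar heights and representative structures in this figure can be sketched as follows, assuming cluster labels are already available for every frame. Picking the member closest to the cluster centroid as the representative is an assumption made for illustration; the caption does not state how representatives were chosen. The function name, feature vectors, and labels are hypothetical.

```python
import math
from collections import defaultdict

def cluster_summary(features, labels):
    """Return, per cluster, its fraction of frames and a representative frame index."""
    members = defaultdict(list)
    for index, label in enumerate(labels):
        members[label].append(index)

    summary = {}
    for label, indices in members.items():
        dim = len(features[indices[0]])
        # Centroid of the cluster in feature space.
        centroid = [
            sum(features[i][d] for i in indices) / len(indices) for d in range(dim)
        ]
        # Assumption: the representative is the member nearest the centroid.
        representative = min(indices, key=lambda i: math.dist(features[i], centroid))
        summary[label] = {
            "fraction": len(indices) / len(labels),
            "representative": representative,
        }
    return summary

# Toy 2-D features for five frames (illustrative values only).
toy_features = [(0.0, 0.0), (0.2, 0.0), (0.1, 0.0), (0.1, 0.1), (5.0, 5.0)]
toy_labels = [0, 0, 0, 0, 1]
summary = cluster_summary(toy_features, toy_labels)
print(summary)
```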
Figure 3
The distribution of RMSD values between the docked poses and the corresponding co-crystallized conformations of the 33 ligands obtained from PDB.
Figure 4
The docking scores of 127 training compounds from the docking to site S4 (A) and site SUb2 (B). The average docking scores of 58 binders and 69 non-binders are plotted as red and blue bars, respectively, for the top five docking poses for each of the six representative structures labelled on the x-axes. The corresponding standard deviations are depicted by the attached sticks. The x-axis labels are given in a combination of representative conformations (C1 to C3) and top poses (P1 to P5).
Figure 5
The matrices of the RMSD values between the 30 docking poses for each of the 33 ligands. Each subfigure corresponds to one ligand, with the ligand name displayed above. The x-axis and y-axis represent the pose indices. The first 15 poses correspond to the S4 binding site and the remaining 15 to the SUb2 site. Within each site, the poses are grouped by the representative protein conformation used for docking: five poses from the first representative structure, followed by five from the second, and five from the third.
Figure 6
The results of LOOCV and 10-fold cross-validations. The x-axis indicates the performance metrics, and the y-axis depicts the metric values. The LOOCV results are presented as the blue bars. The average performance metrics values from the 100 iterations of 10-fold cross-validations are given as the red bars, and their corresponding standard deviations are indicated by the sticks atop. Abbreviations: Sens—sensitivity, Spec—specificity, BA—balanced accuracy, Acc—accuracy, MCC—Matthews’ correlation coefficient, PPV—positive predictive value, NPV—negative predictive value, AUC—area under the receiver operating characteristic curve, and F1—F1 score.
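
Most of the metrics abbreviated in this caption follow directly from the counts of a binary confusion matrix, as the sketch below shows using the standard textbook definitions. AUC is omitted because it needs ranked prediction scores rather than counts. The function name and the counts are hypothetical toy values, not results from the study.

```python
import math

def classification_metrics(tp, fn, tn, fp):
    """Standard binary classification metrics from confusion-matrix counts."""
    sens = tp / (tp + fn)            # sensitivity (recall on binders)
    spec = tn / (tn + fp)            # specificity (recall on non-binders)
    ppv = tp / (tp + fp)             # positive predictive value
    npv = tn / (tn + fn)             # negative predictive value
    acc = (tp + tn) / (tp + fn + tn + fp)
    ba = (sens + spec) / 2           # balanced accuracy
    f1 = 2 * tp / (2 * tp + fp + fn)
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denominator if mcc_denominator else 0.0
    return {
        "Sens": sens, "Spec": spec, "BA": ba, "Acc": acc,
        "MCC": mcc, "PPV": ppv, "NPV": npv, "F1": f1,
    }

# Toy counts chosen only to exercise the formulas.
metrics = classification_metrics(tp=40, fn=10, tn=45, fp=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```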
Figure 7
The relationship between prediction confidence and performance in LOOCV. The low-confidence and high-confidence predictions are plotted in the cyan and yellow bars, respectively. The x-axis shows the performance metrics, and the y-axis gives the metric values. Abbreviations: Sens—sensitivity, Spec—specificity, BA—balanced accuracy, Acc—accuracy, and MCC—Matthews’ correlation coefficient.
Figure 8
FDA-approved drugs predicted as PLpro binders. The x-axis represents the prediction confidence value. The y-axis shows the distance to the centroid of the training dataset. The drugs are plotted as circles. The vertical line indicates the prediction confidence value of 0.5. Points to the right of the vertical line represent high-confidence predictions, while those to the left indicate low-confidence predictions. The horizontal line separates the drugs inside and outside the applicability domain: points below the line fall within the applicability domain, whereas points above the line are outside the domain.
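
The two-way filter shown in this plot can be sketched as below. Two points are assumptions rather than details given in the caption: prediction confidence is taken to be |p - 0.5| / 0.5, where p is the predicted binder probability, and the applicability-domain threshold is passed in as a hypothetical parameter (`ad_threshold`) because its value is not stated here. The drug names and numbers are toy placeholders.

```python
import math

def prediction_confidence(prob_binder):
    """Assumed definition: distance of the probability from 0.5, scaled to [0, 1]."""
    return abs(prob_binder - 0.5) / 0.5

def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [sum(column) / len(rows) for column in zip(*rows)]

def select_candidates(drugs, train_features, ad_threshold, conf_cutoff=0.5):
    """Keep predicted binders that are high-confidence and inside the domain."""
    center = centroid(train_features)
    selected = []
    for drug in drugs:
        is_binder = drug["prob_binder"] > 0.5
        is_confident = prediction_confidence(drug["prob_binder"]) > conf_cutoff
        in_domain = math.dist(drug["features"], center) <= ad_threshold
        if is_binder and is_confident and in_domain:
            selected.append(drug["name"])
    return selected

# Toy docking-score features (illustrative values only).
toy_train = [(-7.0, -6.0), (-8.0, -7.0), (-9.0, -8.0)]
toy_drugs = [
    {"name": "drug_A", "prob_binder": 0.90, "features": (-8.5, -7.5)},  # kept
    {"name": "drug_B", "prob_binder": 0.60, "features": (-8.0, -7.0)},  # low confidence
    {"name": "drug_C", "prob_binder": 0.95, "features": (-2.0, -1.0)},  # outside domain
    {"name": "drug_D", "prob_binder": 0.10, "features": (-8.0, -7.0)},  # predicted non-binder
]
candidates = select_candidates(toy_drugs, toy_train, ad_threshold=2.0)
print(candidates)
```

Under these assumptions, the selected drugs correspond to the points in the lower-right quadrant of the plot.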
Figure 9
Study design. Two PLpro structures were downloaded from the Protein Data Bank (PDB) and subjected to molecular dynamics (MD) simulations. The resulting trajectories were clustered using k-means clustering to generate representative structures. Ligands classified as binders or non-binders were curated from the PDB and the literature, while the structures of FDA-approved drugs were sourced from the LTKB database. All compounds were docked into the representative PLpro structures using AutoDock Vina. The docking scores of all representative structures were used to develop a random forest (RF) classification model. The performance of the model was then evaluated using leave-one-out cross-validation (LOOCV). The final RF model was then used to predict potential PLpro binders from FDA-approved drugs, identifying candidates for possible COVID-19 treatment via drug repurposing.
