Front Pharmacol. 2023 Jun 22;14:1193282.
doi: 10.3389/fphar.2023.1193282. eCollection 2023.

Garbage in, garbage out: how reliable training data improved a virtual screening approach against SARS-CoV-2 MPro

Santiago M Ruatta  1   2 Denis N Prada Gori  3 Martín Fló Díaz  4   5 Franca Lorenzelli  1 Karen Perelmuter  6 Lucas N Alberca  3   7 Carolina L Bellera  3   7 Andrea Medeiros  1   8 Gloria V López  9   10 Mariana Ingold  10 Williams Porcal  9   10 Estefanía Dibello  9 Irina Ihnatenko  11 Conrad Kunick  11 Marcelo Incerti  9 Martín Luzardo  9 Maximiliano Colobbio  9   12 Juan Carlos Ramos  9   12 Eduardo Manta  9   12 Lucía Minini  9 María Laura Lavaggi  13 Paola Hernández  14 Jonas Šarlauskas  15 César Sebastian Huerta García  16 Rafael Castillo  16 Alicia Hernández-Campos  16 Giovanni Ribaudo  17 Giuseppe Zagotto  18 Renzo Carlucci  19 Noelia S Medrán  19 Guillermo R Labadie  19 Maitena Martinez-Amezaga  19 Carina M L Delpiccolo  19 Ernesto G Mata  19 Laura Scarone  9 Laura Posada  9 Gloria Serra  9 Theodora Calogeropoulou  20 Kyriakos Prousis  20 Anastasia Detsi  21 Mauricio Cabrera  22 Guzmán Alvarez  22 Adrián Aicardo  8   23   24 Verena Araújo  8   23   25 Cecilia Chavarría  8   23 Lucija Peterlin Mašič  26 Melisa E Gantner  3   7 Manuel A Llanos  3   7 Santiago Rodríguez  3 Luciana Gavernet  3   7 Soonju Park  27 Jinyeong Heo  27 Honggun Lee  27 Kyu-Ho Paul Park  27 Mariela Bollati-Fogolín  6 Otto Pritsch  4   5 David Shum  27 Alan Talevi  3   7 Marcelo A Comini  1

Santiago M Ruatta et al. Front Pharmacol.

Abstract

Introduction: The identification of chemical compounds that interfere with SARS-CoV-2 replication continues to be a priority in several academic and pharmaceutical laboratories. Computational tools and approaches can integrate, process, and analyze large amounts of data in a short time. However, these initiatives may yield unrealistic results if the applied models are not inferred from reliable data and the resulting predictions are not confirmed by experimental evidence.
Methods: We undertook a drug discovery campaign against the essential major protease (MPro) from SARS-CoV-2, which relied on an in silico search strategy, performed in a large and diverse chemolibrary, complemented by experimental validation. The computational method comprises a recently reported ligand-based approach developed upon refinement/learning cycles, and structure-based approximations. Search models were applied to both retrospective (in silico) and prospective (experimentally confirmed) screening.
Results: The first generation of ligand-based models was fed with data that, to a great extent, had not been published in peer-reviewed articles. The first screening campaign, performed with 188 compounds (46 in silico hits and 100 analogues, and 40 unrelated compounds: flavonols and pyrazoles), yielded three hits against MPro (IC50 ≤ 25 μM): two analogues of in silico hits (one glycoside and one benzothiazole) and one flavonol. A second generation of ligand-based models was developed based on this negative information and newly published peer-reviewed data for MPro inhibitors. This led to 43 new hit candidates belonging to different chemical families. Of the 45 compounds (28 in silico hits and 17 related analogues) tested in the second screening campaign, eight inhibited MPro with IC50 = 0.12-20 μM, and five of them also impaired the proliferation of SARS-CoV-2 in Vero cells (EC50 = 7-45 μM).
Discussion: Our study provides an example of a virtuous loop between computational and experimental approaches applied to target-focused drug discovery against a major and global pathogen, reaffirming the well-known "garbage in, garbage out" machine learning principle.
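
The improvement between the two screening campaigns can be made concrete by computing the hit rates implied by the counts in the Results. The short Python sketch below uses only the numbers reported in the abstract (3 hits among 188 compounds tested, then 8 hits among 45); the function and variable names are illustrative and do not come from the paper. The two campaigns report slightly different potency ranges (IC50 ≤ 25 μM versus IC50 = 0.12-20 μM), so the comparison is indicative only.

```python
def hit_rate(hits: int, tested: int) -> float:
    """Fraction of experimentally tested compounds confirmed as hits."""
    if tested <= 0:
        raise ValueError("tested must be a positive count")
    return hits / tested


# Counts as reported in the abstract (MPro inhibition assays).
first_campaign = hit_rate(hits=3, tested=188)
second_campaign = hit_rate(hits=8, tested=45)

print(f"First campaign:  {first_campaign:.1%}")
print(f"Second campaign: {second_campaign:.1%}")
print(f"Fold change:     {second_campaign / first_campaign:.1f}x")
```

With these counts the hit rate rises from roughly 1.6% to roughly 17.8%, about an eleven-fold change.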

Keywords: COVID-19; artificial intelligence; coronavirus; drug discovery; in silico screening; protease; rubbish in rubbish out; target-based.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
PPV surface from the first retrospective in silico screening against MPro (First ligand-based modelling campaign).
FIGURE 2
Data plots from retrospective and prospective in silico screening against MPro. (A) AUC ROC obtained in the retrospective screening as a function of the number of combined models for each operator. (B) Two different views of the PPV surface of the MIN-22 ensemble.
FIGURE 3
Concentration-dependent activity of the identified hits. The inhibitory activity of the selected compounds is shown for (A–C) MPro, (D–F) SARS-CoV-2 infected Vero cells, and (G–I) viability of Vero cells. In orange, the data for the benzofuroxan derivatives [plots (A, D, G)]; in violet, those corresponding to chalcones [plots (B, E, H)]; in green, a glycoside hit; in magenta, a sulphonamide hit; in light blue, different singletons [plots (C, F, I)]. All compounds were tested at ≥7 different concentrations (serial 1/3 or 1/2 dilutions for MPro assays or biological assays, respectively), at least in duplicate. Ebselen (EbSe) was included in all assays as positive control.
FIGURE 4
Binding poses predicted by docking for some representative active and inactive compounds reported herein. (A) Most active compounds of the benzofuroxan family: 3d (sky blue), 19d (plum), and 5d (light green). (B) Less active compounds of the benzofuroxan family: 1d (purple), 2d (tan), and 19d (light sea green). (C) Active compounds of the chalcone-related structures: 25d (salmon) and 27d (cornflower blue). (D) Less active compound of the chalcone-related structures: 26d (lime green), an isomer of 27d.
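
The captions of Figures 1 and 2 refer to three quantities without defining them: a model ensemble built with a MIN operator, the AUC ROC of a retrospective screen, and a PPV surface. The Python sketch below shows one common way to compute each, under stated assumptions rather than as the authors' implementation. It assumes that the MIN operator assigns each compound the lowest score given by any individual model, that AUC ROC is the probability that a random active outscores a random inactive, and that PPV follows from sensitivity (Se), specificity (Sp), and an assumed fraction of actives in the library (called `ya` here) through Bayes' theorem: PPV = Se·Ya / (Se·Ya + (1 − Sp)(1 − Ya)). All scores, labels, cutoffs, and names are invented toy values.

```python
from typing import List, Sequence, Tuple


def min_ensemble(scores_per_model: Sequence[Sequence[float]]) -> List[float]:
    """MIN operator (assumed): each compound keeps its lowest score across models."""
    return [min(column) for column in zip(*scores_per_model)]


def auc_roc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC ROC as P(active score > inactive score); ties count as 0.5."""
    actives = [s for s, y in zip(scores, labels) if y == 1]
    inactives = [s for s, y in zip(scores, labels) if y == 0]
    if not actives or not inactives:
        raise ValueError("need at least one active and one inactive compound")
    wins = sum(
        1.0 if a > d else 0.5 if a == d else 0.0
        for a in actives
        for d in inactives
    )
    return wins / (len(actives) * len(inactives))


def se_sp(scores: Sequence[float], labels: Sequence[int], cutoff: float) -> Tuple[float, float]:
    """Sensitivity and specificity when compounds scoring >= cutoff are predicted active."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def ppv(se: float, sp: float, ya: float) -> float:
    """Positive predictive value from Se, Sp and the assumed fraction of actives ya."""
    denominator = se * ya + (1.0 - sp) * (1.0 - ya)
    return se * ya / denominator if denominator > 0 else float("nan")


# Toy retrospective screen: 3 known actives (1) and 5 known inactives (0).
labels = [1, 1, 1, 0, 0, 0, 0, 0]
model_a = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.2]
model_b = [0.8, 0.7, 0.6, 0.2, 0.5, 0.1, 0.3, 0.1]
model_c = [0.7, 0.9, 0.5, 0.3, 0.2, 0.6, 0.2, 0.3]

ensemble = min_ensemble([model_a, model_b, model_c])
print("AUC model_a :", round(auc_roc(model_a, labels), 3))
print("AUC ensemble:", round(auc_roc(ensemble, labels), 3))

# A coarse "PPV surface": PPV over score cutoffs and assumed fractions of actives.
for cutoff in (0.15, 0.3, 0.5):
    se, sp = se_sp(ensemble, labels, cutoff)
    row = [round(ppv(se, sp, ya), 3) for ya in (0.001, 0.01, 0.05)]
    print(f"cutoff={cutoff}: Se={se:.2f} Sp={sp:.2f} PPV={row}")
```

In this toy data every model misranks at least one inactive, but no inactive scores highly in all three models, so the minimum separates the classes cleanly. The PPV grid illustrates why a surface is informative: at a low assumed fraction of actives, even a modest loss of specificity sharply reduces the share of predicted hits expected to be confirmed experimentally.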
