Front Chem. 2025 Jun 11:13:1600945.
doi: 10.3389/fchem.2025.1600945. eCollection 2025.

Two-dimensional QSAR-driven virtual screening for potential therapeutics against Trypanosoma cruzi

Naseer Maliyakkal et al. Front Chem.

Abstract

Trypanosoma cruzi is the cause of Chagas disease (CD), a major health issue that affects 6–7 million individuals globally. Once considered a local problem, CD has spread through migration and non-vector transmission. Although the WHO classifies it as a neglected tropical disease (NTD), efforts to eliminate CD remain challenging because of insufficient awareness, inadequate diagnostic tools, and limited access to healthcare. Developing safer and more effective anti-Chagas therapies remains one of the foremost concerns. To speed up the drug discovery process, we developed a standardized and robust machine learning-driven QSAR (ML-QSAR) model using a dataset of 1,183 Trypanosoma cruzi inhibitors curated from the ChEMBL database. Following molecular descriptor calculation and feature selection, Support Vector Machine (SVM), Artificial Neural Network (ANN), and Random Forest (RF) models were developed and optimized to elucidate and predict the inhibition mechanism of novel inhibitors. The ANN-driven QSAR model using CDK fingerprints performed best, with a Pearson correlation coefficient of 0.9874 for the training set and 0.6872 for the test set, demonstrating exceptional prediction accuracy. Twelve candidate inhibitors with pIC50 ≥ 5 were then identified by screening large chemical libraries with the ANN-QSAR model and ADMET-based filtering. Molecular docking studies identified F6609-0134 as the best hit molecule. Finally, molecular dynamics simulations and free energy analysis further validated the stability and high binding affinity of F6609-0134, supporting its continued assessment as a possible treatment option for Chagas disease.
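
The two quantities the abstract reports, the Pearson correlation coefficient between observed and predicted activity and the pIC50 ≥ 5 activity cutoff, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names `pic50_from_ic50_nm`, `pearson_r`, and `passes_activity_cutoff` are hypothetical, and the sketch assumes IC50 values are given in nanomolar, with pIC50 defined as the negative base-10 logarithm of IC50 in mol/L.

```python
import math


def pic50_from_ic50_nm(ic50_nm: float) -> float:
    """Convert an IC50 in nanomolar to pIC50 = -log10(IC50 in mol/L).

    Assumption: input units are nM, so 1 nM = 1e-9 mol/L.
    """
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return 9.0 - math.log10(ic50_nm)


def pearson_r(observed, predicted) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_o * var_p)


def passes_activity_cutoff(pic50: float, cutoff: float = 5.0) -> bool:
    """Return True when a predicted pIC50 meets the cutoff (>= 5 in the abstract)."""
    return pic50 >= cutoff
```

Under these assumptions, a pIC50 of 5 corresponds to an IC50 of 10 µM (10,000 nM), so the cutoff keeps compounds predicted to inhibit at 10 µM or below.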

Keywords: Chagas disease; Trypanosoma cruzi; artificial neural network; machine learning; molecular docking; molecular dynamics; quantitative structure activity relationships; virtual screening.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
Regression plots for the training (A) and test (B) datasets of the best CDK fingerprint-driven ANN-QSAR model, and for the training (C) and test (D) datasets of the best atom 2D pair fingerprint-driven RF-QSAR model.
FIGURE 2
Top 20 features from the VIP plot (A) and Pearson correlation plot (B) for the CDK fingerprint-driven ANN-QSAR model.
FIGURE 3
Top features from SHAP analysis (A) and outlier analysis through PCA plot (B) for the CDK fingerprint-driven ANN-QSAR model.
FIGURE 4
Top 20 features from the VIP plot (A) and Pearson correlation plot (B) for the atom 2D pair fingerprint-driven RF-QSAR model.
FIGURE 5
Top features from SHAP analysis (A) and outlier analysis through PCA plot (B) for the atom 2D pair fingerprint-driven RF-QSAR model.
FIGURE 6
Top features from Tanimoto similarity-driven cluster analysis of highly active (A) and low active (B) molecules from the CDK fingerprint dataset, and of highly active (C) and low active (D) molecules from the atom 2D pair fingerprint dataset.
FIGURE 7
2D and 3D interactions of lead compound F6609-0134 with the binding pocket of 1ME3.
FIGURE 8
Analysis of the protein-ligand complex using MD simulation: RMSD plot (co-crystallized ligand in orange, F6609-0134 in green); RMSF plot (co-crystallized ligand in orange, F6609-0134 in green); and protein-ligand contacts over the MD trajectory of the F6609-0134-1ME3 complex.
FIGURE 9
Secondary structure element (SSE) distribution plotted against residue index for 1ME3, and SSE composition for each trajectory frame throughout the simulation.
FIGURE 10
PCA of the F6609-0134-1ME3 protein-ligand complex.
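
The Tanimoto similarity underlying the cluster analysis in Figure 6 can be sketched as follows. This is an illustrative example only, not the authors' implementation: it assumes each fingerprint is represented as a Python set of "on" bit indices, the function name `tanimoto` is hypothetical, and returning 0.0 for two empty fingerprints is a convention chosen here.

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto (Jaccard) similarity between two fingerprints.

    Each fingerprint is a set of indices of bits that are set.
    Similarity = |A intersect B| / |A union B|, ranging from 0.0 to 1.0.
    """
    union = len(fp_a | fp_b)
    if union == 0:
        # Assumed convention: two empty fingerprints have similarity 0.0.
        return 0.0
    return len(fp_a & fp_b) / union


# Hypothetical fingerprints sharing two of four distinct on-bits.
similarity = tanimoto({1, 2, 3}, {2, 3, 4})
```

A clustering step would then group molecules whose pairwise similarity exceeds a chosen threshold; the abstract and captions do not state which threshold or clustering algorithm was used.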
