Synergy of advanced machine learning and deep neural networks with consensus molecular docking for virtual screening of anaplastic lymphoma kinase inhibitors
- PMID: 40952529
- DOI: 10.1007/s10822-025-00657-6
Abstract
This study addresses the urgent need for an AI model to predict Anaplastic Lymphoma Kinase (ALK) inhibitors for the treatment of ALK-positive Non-Small Cell Lung Cancer. With only five Food and Drug Administration-approved ALK inhibitors currently available, effective drugs remain in demand. Leveraging machine learning (ML) and deep learning (DL), our research accelerates the precise screening of novel ALK inhibitors using both ligand-based and structure-based approaches. In the ligand-based approach, an ensemble voting model comprising three base learners classified potential ALK inhibitors, achieving promising retrospective validation results. Notably, the ML-based XGBoost algorithm exhibited compelling results, with an external validation (EV) F1 score of 0.921, an EV Average Precision (AP) of 0.961, a cross-validation (CV) F1 score of [Formula: see text] and a CV-AP of [Formula: see text]. In addition, the DL-based Artificial Neural Network (ANN) model demonstrated comparable performance, with an EV F1 score of 0.930, an EV-AP of 0.955, a CV F1 score of [Formula: see text] and a CV-AP of [Formula: see text]. For the structure-based approach, an XGBoost consensus docking model used scores from three molecular docking programs (GNINA 1.0, Vina-GPU 2.0, and AutoDock-GPU) as features. Combining these two approaches, we virtually screened 120,571 compounds and identified three promising ALK inhibitors, CHEMBL1689515, CHEMBL2380351, and CHEMBL102714, that bind to the protein's pocket and establish hydrophobic contacts in the hinge region through their ketone groups, resembling Alectinib's interaction. Comparative analysis revealed that traditional ML models outperformed Graph Neural Networks (GNN), highlighting the critical role of feature engineering and the importance of dataset size. The study recommends further in vitro testing to validate the prospective screening performance of these models.
A graphical user interface is available at https://huggingface.co/spaces/thechuongtrinh/ALK_inhibitors_classification .
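To make the reported quantities concrete, the following is a minimal sketch, not the authors' pipeline, of hard voting over three base learners and of the two metrics quoted above (F1 score and Average Precision). It uses only the Python standard library. All labels, scores, and names (`model_a`, `model_b`, `model_c`, `majority_vote`) are hypothetical toy values chosen for illustration. The abstract does not say whether the published ensemble uses hard or soft voting, so hard voting is an assumption here.

```python
from collections import Counter


def majority_vote(predictions):
    """Hard voting: for each compound, return the label most base learners chose."""
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*predictions)]


def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the positive class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_precision(y_true, scores):
    """AP = mean precision at each rank where an active is retrieved.

    Assumes no tied scores; ties would need grouping by threshold.
    """
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


# Hypothetical toy data: 1 = ALK inhibitor, 0 = non-inhibitor.
y_true = [1, 1, 0, 1, 0, 0]
model_a = [1, 1, 0, 0, 0, 1]  # hypothetical base learner outputs
model_b = [1, 0, 0, 1, 0, 1]
model_c = [1, 1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.2, 0.1]  # hypothetical predicted probabilities

voted = majority_vote([model_a, model_b, model_c])
f1 = f1_score(y_true, voted)
ap = average_precision(y_true, scores)
print(voted, round(f1, 3), round(ap, 3))
```

F1 is computed from thresholded labels, whereas AP depends only on how the scores rank actives above inactives. That difference is why the two numbers are reported separately for each model.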
Keywords: Anaplastic lymphoma kinase; Artificial intelligence; Benchmarking; Computer-aided drug design; Consensus molecular docking; Machine learning.
© 2025. The Author(s), under exclusive licence to Springer Nature Switzerland AG.
Conflict of interest statement
Declarations. Conflict of interest: The authors declare no conflict of interest.