Sci Rep. 2022 Nov 7;12(1):18935.
doi: 10.1038/s41598-022-23649-0.

Machine learning based personalized drug response prediction for lung cancer patients

Rizwan Qureshi et al. Sci Rep. 2022.

Abstract

Lung cancers with a mutated epidermal growth factor receptor (EGFR) are a major contributor to cancer fatalities globally. Targeted tyrosine kinase inhibitors (TKIs) have been developed against EGFR and show encouraging results for survival rate and quality of life. However, drug resistance may affect treatment plans, and treatment efficacy may be lost after about a year. Predicting the response to EGFR-TKIs in EGFR-mutated lung cancer patients is therefore a key research area. In this study, we propose a personalized drug response prediction model (PDRP), based on molecular dynamics (MD) simulations and machine learning, to predict the response to the first-generation FDA-approved small-molecule EGFR-TKIs, Gefitinib/Erlotinib, in lung cancer patients. Each patient's unique mutation status was modeled in an MD simulation to extract molecular-level geometric features. Additional clinical features were then incorporated into the machine learning model for drug response prediction. The complete feature set includes demographic and clinical information (DCI), geometrical properties of the drug-target binding site, and the binding free energy of the drug-target complex from the MD simulation. PDRP incorporates an XGBoost classifier, which achieves state-of-the-art performance with 97.5% accuracy, 93% recall, 96.5% precision, and 94% F1-score on a 4-class drug response prediction task. We found that the geometry of the binding pocket, combined with the binding free energy, is a good predictor of drug response. In contrast, clinical information had little impact on the performance of the model. The proposed model could be tested on other types of cancer. We believe PDRP will support the planning of effective treatment regimens based on clinical-genomic information. The source code and related files are available on GitHub at: https://github.com/rizwanqureshi123/PDRP/
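The abstract reports accuracy, recall, precision, and F1-score for a 4-class task. As a minimal sketch of how such figures can be derived from a confusion matrix, the Python below computes accuracy and macro-averaged precision, recall, and F1. The function name `classification_metrics` and the example matrix are hypothetical, and macro-averaging is an assumption: the abstract does not state which averaging scheme the authors used.

```python
from typing import Dict, List


def classification_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Accuracy and macro-averaged precision/recall/F1 from a confusion matrix.

    cm[i][j] is the number of samples whose true class is i and whose
    predicted class is j. Macro-averaging is an assumption made for this
    sketch; it is not stated in the abstract.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(n))

    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum
        actual_k = sum(cm[k])                          # row sum
        p = tp / predicted_k if predicted_k else 0.0
        r = tp / actual_k if actual_k else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    return {
        "accuracy": correct / total if total else 0.0,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


if __name__ == "__main__":
    # Made-up 4-class matrix for illustration only; not data from the paper.
    example = [
        [10, 1, 0, 0],
        [0, 12, 1, 0],
        [0, 0, 9, 1],
        [1, 0, 0, 8],
    ]
    for name, value in classification_metrics(example).items():
        print(f"{name}: {value:.3f}")
```

With imbalanced classes, macro-averaged scores can differ noticeably from accuracy, which is one possible reason the reported recall and F1-score sit below the reported accuracy.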

Figures

Figure 1
An EGFR-Gefitinib complex modeled with the L858R mutation. The drug molecule is indicated by the black square and the mutation by the red circle. The image was generated using PyMOL.
Figure 2
Box plot of normalized values for energy and geometrical features (left panel), and correlation among features (right panel).
Figure 3
(a) MD trajectories of EGFR and some mutants showing RMSD from the reference structure. As the values are below 5, the structures are reliable for further analysis. (b) Distribution of disease response classifications for 201 patients by the three most common mutations (L858R, L858R-T790M, del E746-750), and the others.
Figure 4
Disease response classification and survival time (months) by binding free energy (left panel) and disease response classification and survival time (years) by age of patient (right panel).
Figure 5
Contribution of geometrical, DCI, and energy-related features to the accuracy of the model.
Figure 6
Ablation study on geometric features.
Figure 7
Confusion matrix for testing dataset.
Figure 8
Classification performance on testing dataset.
Figure 9
Confusion matrices for the XGBoost model on gender-stratified patients.
Figure 10
The framework for predicting the drug response in lung cancer patients based on personal data, binding energy, and geometric features. Mutant structures are predicted by computational methods then molecular dynamics simulations extract energy and geometrical features. Machine learning classifiers then predict four classes of drug response from these features.
Figure 11
Atoms in convex and concave shapes at the surface curvature. The figure at the top shows a matched concave-convex pair, thus a strong interaction, while the figure at the bottom shows an unmatched pair and a weak interaction.
Figure 12
Geometric, energy and personal features and their distributions.
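
The caption of Figure 3(a) uses the RMSD from a reference structure as a reliability check on the MD trajectories. The Python sketch below shows one way such a check can be computed. It assumes each trajectory frame has already been superimposed on the reference (no alignment is performed here), and the names `rmsd` and `trajectory_is_stable` are hypothetical. The default threshold of 5 is taken from the caption, which does not state its units.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(frame: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """Root-mean-square deviation between two equally sized coordinate sets.

    Assumes the frame is already superimposed on the reference; this
    sketch does not perform any structural alignment.
    """
    if len(frame) != len(reference) or len(frame) == 0:
        raise ValueError("frame and reference must be non-empty and the same size")
    squared = sum(
        (a - b) ** 2
        for atom, ref_atom in zip(frame, reference)
        for a, b in zip(atom, ref_atom)
    )
    return math.sqrt(squared / len(frame))


def trajectory_is_stable(
    trajectory: Sequence[Sequence[Coord]],
    reference: Sequence[Coord],
    threshold: float = 5.0,  # value from the Figure 3(a) caption; units not stated there
) -> bool:
    """Return True if every frame's RMSD from the reference is below the threshold."""
    return all(rmsd(frame, reference) < threshold for frame in trajectory)


if __name__ == "__main__":
    # Toy coordinates for illustration only; not data from the paper.
    reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    trajectory = [
        [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
        [(0.1, 0.0, 0.0), (1.0, 0.2, 0.0)],
    ]
    print(trajectory_is_stable(trajectory, reference))
```

In practice, trajectory analysis packages align each frame to the reference before computing the RMSD. That step is omitted here to keep the example self-contained.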
