BMC Med Genomics. 2022 Feb 22;15(1):33.
doi: 10.1186/s12920-022-01184-1.

Machine learning and bioinformatics analysis revealed classification and potential treatment strategy in stage 3-4 NSCLC patients



Chang Li et al. BMC Med Genomics. 2022.

Abstract

Background: Precision medicine has increased the accuracy of cancer diagnosis and treatment, especially in the era of cancer immunotherapy. Despite recent advances in immunotherapy, the overall survival rate of patients with advanced non-small cell lung cancer (NSCLC) remains low. A better classification of advanced NSCLC is therefore important for developing more effective treatments.

Method: Abundances of tumor-infiltrating immune cells (TIICs) were estimated with Cell-type Identification By Estimating Relative Subsets Of RNA Transcripts (CIBERSORT), xCell, Tumor IMmune Estimation Resource (TIMER), Estimating the Proportion of Immune and Cancer cells (EPIC), and Microenvironment Cell Populations-counter (MCP-counter). Patients were classified by K-means clustering, and four machine learning methods (SVM, Random Forest, AdaBoost, XGBoost) were used to build classifiers. Multi-omics datasets (transcriptomics, DNA methylation, copy number alterations, and miRNA profiles) and immune checkpoint inhibitor (ICI) treatment cohorts were obtained from various databases. Drug sensitivity data were derived from the PRISM and CTRP databases.
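The clustering step can be sketched as plain Lloyd's k-means over per-patient TIIC abundance vectors. This is a minimal illustration under stated assumptions, not the authors' pipeline: the three-feature matrix below is hypothetical, and the deterministic farthest-first initialisation is a simplification chosen for reproducibility (library implementations such as scikit-learn's `KMeans` default to k-means++ initialisation). The study's actual features and preprocessing are not described here.

```python
import math
from typing import List, Sequence


def kmeans(points: Sequence[Sequence[float]], k: int, n_iter: int = 100) -> List[int]:
    """Lloyd's k-means on abundance vectors; returns one cluster label per sample."""
    # Farthest-first initialisation (a simplification, not k-means++): start from the
    # first sample, then repeatedly add the sample farthest from all chosen centroids.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))

    labels: List[int] = []
    for _ in range(n_iter):
        # Assignment step: each sample joins its nearest centroid (Euclidean distance).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its member samples.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical TIIC abundances (columns: T cells CD8, M1.Macrophage, M2.Macrophage).
tiic = [
    [0.80, 0.10, 0.10], [0.75, 0.15, 0.10], [0.85, 0.05, 0.10],
    [0.10, 0.80, 0.10], [0.15, 0.75, 0.10], [0.05, 0.85, 0.10],
    [0.10, 0.10, 0.80], [0.10, 0.15, 0.75], [0.10, 0.05, 0.85],
]
labels = kmeans(tiic, k=3)
print(labels)
```

The integer labels are arbitrary; what matters is that patients with similar immune profiles share a label. Supervised classifiers (SVM, Random Forest, AdaBoost, XGBoost in the study) can then be trained to predict these labels from the same features.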

Results: Patients with stage 3-4 NSCLC were divided into three clusters according to TIIC abundance, and classifiers distinguishing these clusters were built with four machine learning algorithms (SVM, RF, XGBoost, and AdaBoost). Patients in cluster-2 had a survival advantage and may respond favorably to immunotherapy. We then constructed an immune-related Poor Prognosis Signature (PPS) that successfully predicted survival in patients with advanced NSCLC, and epigenetic analysis identified three key molecules (HSPA8, CREB1, RAP1A) that might serve as therapeutic targets in cluster-1. Finally, by screening drug sensitivity data from the CTRP and PRISM databases, we identified several compounds that might serve as candidate treatments for each cluster.
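The abstract does not state how the signature score is computed. A common way to build such a prognostic signature is a linear score, the sum of each gene's expression weighted by a coefficient (often taken from a Cox regression), followed by a median split into high- and low-risk groups. The sketch below assumes that construction; the gene names `GENE_A`, `GENE_B` and `GENE_C` and their weights are hypothetical placeholders, not the published signature.

```python
from statistics import median
from typing import Dict, List

# HYPOTHETICAL genes and weights for illustration only; not the published signature.
WEIGHTS: Dict[str, float] = {"GENE_A": 0.42, "GENE_B": -0.31, "GENE_C": 0.18}


def pps_score(expr: Dict[str, float], weights: Dict[str, float] = WEIGHTS) -> float:
    """Linear risk score: weighted sum of one patient's signature-gene expression."""
    return sum(w * expr[gene] for gene, w in weights.items())


def stratify(scores: List[float]) -> List[str]:
    """Dichotomise patients at the median score into PPS-high and PPS-low groups."""
    cut = median(scores)
    return ["PPS-high" if s > cut else "PPS-low" for s in scores]


# Hypothetical expression values for four patients.
patients = [
    {"GENE_A": 5.0, "GENE_B": 2.0, "GENE_C": 1.0},
    {"GENE_A": 1.0, "GENE_B": 4.0, "GENE_C": 0.5},
    {"GENE_A": 3.0, "GENE_B": 1.0, "GENE_C": 2.0},
    {"GENE_A": 2.0, "GENE_B": 3.0, "GENE_C": 1.0},
]
scores = [pps_score(p) for p in patients]
groups = stratify(scores)
print(groups)
```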

Conclusions: Our study not only depicts the landscape of the different clusters of stage 3-4 NSCLC but also presents a potential treatment strategy for patients with advanced NSCLC.

Keywords: Cancer immunotherapy; Drug sensitivity; Immunophenotypes; Machine learning; Multiomics; Signature; Treatment strategy.


Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Unsupervised clustering of TIICs in stage 3–4 NSCLC. A Top: consensus clustering of the pairwise correlations of TIICs; three module associations are indicated in the heatmap. Middle: five representative immune cell types (T cells CD8, M1.Macrophage, Monocyte, M2.Macrophage, M0.Macrophage) from the modules, with the heatmap indicating abundance (dark colour, high level; light colour, low level). Bottom: distribution of the five selected TIICs within the three clusters (rows); dashed lines indicate the median. B Radar chart of the ESTIMATE scores and four immunotherapy-related signature scores in the three clusters. Line colour represents the cluster: red, cluster-1; green, cluster-2; blue, cluster-3.
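The top panel clusters a matrix of pairwise correlations between immune cell types, computed across patients. A minimal sketch of building that matrix, assuming Pearson correlation (the caption does not say whether Pearson or Spearman was used) and hypothetical abundance values; the consensus clustering applied to the matrix is not shown.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length abundance vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(abundance: Dict[str, List[float]]) -> Tuple[List[str], List[List[float]]]:
    """Cell-type by cell-type correlation matrix computed across patients."""
    names = list(abundance)
    return names, [[pearson(abundance[a], abundance[b]) for b in names] for a in names]


# Hypothetical abundances for four patients (one list per immune cell type).
abundance = {
    "T cells CD8": [0.10, 0.20, 0.30, 0.40],
    "M1.Macrophage": [0.20, 0.40, 0.60, 0.80],
    "M2.Macrophage": [0.40, 0.30, 0.20, 0.10],
}
names, corr = correlation_matrix(abundance)
for name, row in zip(names, corr):
    print(name, [round(r, 2) for r in row])
```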
Fig. 2
Tumor microenvironment (TME) characteristics of each cluster. A Kaplan–Meier curves of overall survival in the three clusters; statistical significance was tested with the log-rank test. B Comparison of GSVA scores of CD.Sig, IFNG.sig, EIGS, and the 12-chemokine signature among clusters; statistical significance was tested with the Kruskal–Wallis test. C Box plots of the abundance of 22 TIICs in the three clusters. ***, P < 0.0001; **, P < 0.001; *, 0.001 < P < 0.01
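Curves like those in panel A come from the Kaplan–Meier product-limit estimator: at each distinct death time, survival is multiplied by the fraction of at-risk patients who survive it, while censored patients leave the risk set without causing a drop. A minimal pure-Python sketch on hypothetical follow-up data (the log-rank comparison between curves is not shown):

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (time, survival) steps; events[i] is 1 for death and 0 for censoring."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Pool all patients whose follow-up ends at this time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:  # only deaths change the survival estimate
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed  # deaths and censored patients both leave the risk set
    return curve


# Hypothetical overall-survival data in months.
times = [6, 7, 10, 15, 19, 25]
events = [1, 0, 1, 1, 0, 1]
curve = kaplan_meier(times, events)
print(curve)
```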
Fig. 3
Construction of the immune-related poor prognosis signature (PPS). A Distribution of PPS scores, OS, and expression patterns of the genes in the signature. B Kaplan–Meier curves of OS in the PPS-high and PPS-low groups; statistical significance was tested with the log-rank test. C Performance of the PPS assessed by AUC: ROC analysis gave an AUC of 0.83 at 12 months, 0.894 at 36 months, and 0.869 at 60 months.
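An AUC at a fixed horizon (12, 36, or 60 months) measures how often a patient who died by that time had a higher risk score than a patient still alive after it. The sketch below is a simplified cumulative/dynamic estimator that simply drops patients censored before the horizon; dedicated tools such as the R `timeROC` package instead reweight for censoring, so this is illustrative only and would not reproduce the reported values. Times, events, and scores are hypothetical.

```python
from typing import Sequence


def auc_at(horizon: float, times: Sequence[float], events: Sequence[int],
           scores: Sequence[float]) -> float:
    """Naive cumulative/dynamic AUC at `horizon` (no inverse-censoring weights)."""
    # Cases: patients with an observed death at or before the horizon.
    cases = [s for t, e, s in zip(times, events, scores) if t <= horizon and e == 1]
    # Controls: patients still under follow-up after the horizon.
    controls = [s for t, s in zip(times, scores) if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control at this horizon")
    # Count concordant case/control pairs; tied scores contribute one half.
    total = 0.0
    for c in cases:
        for d in controls:
            total += 1.0 if c > d else 0.5 if c == d else 0.0
    return total / (len(cases) * len(controls))


times = [5, 8, 20, 30, 40, 50]           # months
events = [1, 1, 1, 0, 1, 0]              # 1 = death, 0 = censored
scores = [2.1, 1.8, 0.9, 0.3, 1.0, 0.2]  # hypothetical risk scores
for h in (12, 36):
    print(h, auc_at(h, times, events, scores))
```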
Fig. 4
Performance and distribution of the PPS in the IMvigor210 cohort. A Kaplan–Meier curves of patients in the PPS-high and PPS-low groups; statistical significance was tested with the log-rank test. B PPS distribution in the treatment-benefit and treatment-non-benefit groups. C PPS distribution in the different immune phenotype groups. In B and C, statistical significance was tested with the Kruskal–Wallis test.
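The Kruskal–Wallis test used in these panels is a rank-based alternative to one-way ANOVA: values from all groups are ranked together, and H measures how far the groups' rank sums depart from what equal distributions would give. A minimal sketch of the tie-corrected H statistic on hypothetical values; in practice a library routine such as `scipy.stats.kruskal` would be used, and the p-value comes from a chi-square distribution with k - 1 degrees of freedom (the closed form below applies only to three groups).

```python
import math
from itertools import chain
from typing import Dict, List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    first: Dict[float, int] = {}
    last: Dict[float, int] = {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
        last[v] = pos
    return [(first[v] + last[v]) / 2 for v in values]


def kruskal_wallis(groups: Sequence[Sequence[float]]) -> Tuple[float, int]:
    """Return the tie-corrected Kruskal-Wallis H statistic and its degrees of freedom."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    # Sum over groups of (rank sum)^2 / group size.
    acc, start = 0.0, 0
    for g in groups:
        acc += sum(ranks[start:start + len(g)]) ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * acc - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n) over blocks of tied values.
    counts: Dict[float, int] = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    h /= 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    return h, len(groups) - 1


# Hypothetical PPS values in three immune phenotype groups.
h, df = kruskal_wallis([[1, 2, 2], [3, 4, 5], [6, 7, 8]])
# With df = 2 the chi-square survival function has the closed form exp(-H / 2).
p = math.exp(-h / 2) if df == 2 else float("nan")
print(h, df, p)
```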
Fig. 5
The mutation landscape of the different clusters. A–C Top 30 genes with the highest mutation frequencies in cluster-1 (A), cluster-2 (B), and cluster-3 (C). D Tumor mutation burden (TMB) distribution in the different clusters; statistical significance was tested with the Kruskal–Wallis test.
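Tumor mutation burden is usually reported as the number of somatic nonsynonymous mutations per megabase of sequenced territory. The caption does not give the exact definition used; the sketch below assumes MAF-style `Variant_Classification` labels, the nonsynonymous set listed in the code, and a 38 Mb exome as the denominator, all of which are assumptions.

```python
from typing import Iterable

# ASSUMED set of MAF-style Variant_Classification values counted as nonsynonymous.
NONSYNONYMOUS = {
    "Missense_Mutation", "Nonsense_Mutation", "Nonstop_Mutation",
    "Frame_Shift_Del", "Frame_Shift_Ins", "In_Frame_Del", "In_Frame_Ins",
    "Splice_Site", "Translation_Start_Site",
}


def tmb_per_mb(variant_classes: Iterable[str], capture_size_mb: float = 38.0) -> float:
    """Nonsynonymous somatic mutations per megabase (38 Mb exome size is an assumption)."""
    n = sum(1 for v in variant_classes if v in NONSYNONYMOUS)
    return n / capture_size_mb


# Hypothetical tumour: 76 missense and 10 silent mutations.
calls = ["Missense_Mutation"] * 76 + ["Silent"] * 10
print(tmb_per_mb(calls))
```

Silent mutations are excluded from the numerator, so only the 76 missense calls count here.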
Fig. 6
Differences in epigenetic regulation among the clusters. A Comparison of copy number alterations among clusters. GISTIC scores were computed with GISTIC 2.0; red indicates amplification and blue indicates deletion. B Cluster-1 versus cluster-2: Venn diagram summarizing the links between differentially expressed miRNAs (DEmiRNAs) and mRNAs predicted by the miRTarBase, miRDB, and TargetScan databases. C Cluster-3 versus cluster-2: Venn diagram summarizing the DEmiRNA-mRNA links predicted by the miRTarBase, miRDB, and TargetScan databases.
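The centre of such a three-way Venn diagram is the set of miRNA-mRNA pairs predicted by all three databases. Whether the study kept only this intersection is not stated in the caption; the sketch below, using hypothetical miRNA and gene names, counts database support per link so that both the full intersection (`min_support=3`) and a looser at-least-two rule can be read off.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

Link = Tuple[str, str]  # (miRNA, target mRNA)


def consensus_links(*databases: Iterable[Link], min_support: int = 3) -> Set[Link]:
    """Links predicted by at least `min_support` of the given databases."""
    # set(db) ensures each database votes at most once per link.
    counts = Counter(link for db in databases for link in set(db))
    return {link for link, c in counts.items() if c >= min_support}


# Hypothetical predictions standing in for miRTarBase, miRDB and TargetScan.
mirtarbase = {("miR-A", "GENE1"), ("miR-A", "GENE2"), ("miR-B", "GENE3")}
mirdb = {("miR-A", "GENE1"), ("miR-B", "GENE3"), ("miR-C", "GENE4")}
targetscan = {("miR-A", "GENE1"), ("miR-B", "GENE3"), ("miR-A", "GENE2")}

print(sorted(consensus_links(mirtarbase, mirdb, targetscan)))
```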
Fig. 7
Identification of potential agents in each cluster. Differential drug response analysis of compounds identified in cluster-1 (A), cluster-2 (B), and cluster-3 (C). Note that higher estimated AUC values imply lower drug sensitivity. ***, P < 0.001; **, P < 0.01; *, P < 0.05
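Because a lower estimated AUC means higher sensitivity, a compound is a candidate for a cluster when that cluster's AUCs are significantly lower than those of the other patients. The caption does not name the test; a Wilcoxon rank-sum (Mann–Whitney U) test is a common choice for such two-group comparisons and is assumed in the sketch below, which uses hypothetical AUC values and a normal approximation without tie correction (`scipy.stats.mannwhitneyu` is the usual library routine).

```python
import math
from typing import Dict, List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    first: Dict[float, int] = {}
    last: Dict[float, int] = {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
        last[v] = pos
    return [(first[v] + last[v]) / 2 for v in values]


def rank_sum_test(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Mann-Whitney U for sample `a` and a two-sided normal-approximation p-value."""
    ranks = average_ranks(list(a) + list(b))
    n1, n2 = len(a), len(b)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (u - mu) / sigma
    return u, math.erfc(abs(z) / math.sqrt(2))  # two-sided p = 2 * (1 - Phi(|z|))


# Hypothetical estimated AUCs for one compound: target cluster vs all other patients.
cluster_auc = [0.20, 0.30, 0.25, 0.35, 0.28]
other_auc = [0.60, 0.55, 0.70, 0.65, 0.50]
u, p = rank_sum_test(cluster_auc, other_auc)
# Lower AUC means higher sensitivity, so a small U favours the target cluster.
print(u, p)
```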

