2024 Jun 13;22(1):156.
doi: 10.1186/s12957-024-03435-0.

Decoding temporal heterogeneity in NSCLC through machine learning and prognostic model construction

Junpeng Cheng et al. World J Surg Oncol.

Abstract

Background: Non-small cell lung cancer (NSCLC) is a prevalent and heterogeneous disease with significant genomic variations between the early and advanced stages. The identification of key genes and pathways driving NSCLC tumor progression is critical for improving the diagnosis and treatment outcomes of this disease.

Methods: In this study, we conducted single-cell transcriptome analysis on 93,406 cells from 22 NSCLC patients to characterize malignant NSCLC cells. Using consensus non-negative matrix factorization (cNMF), we classified these cells into distinct modules, thus identifying the diverse molecular profiles within NSCLC. Through pseudotime analysis, we delineated temporal gene expression changes during NSCLC evolution, thus identifying genes associated with disease progression. Using the XGBoost model, we assessed the significance of these genes in the pseudotime trajectory. Our findings were validated using transcriptome sequencing data from The Cancer Genome Atlas (TCGA), and LASSO regression was used to refine the selection of characteristic genes. Subsequently, we established a risk score model based on these genes, thus providing a potential tool for cancer risk assessment and personalized treatment strategies.
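
A risk score of this kind is commonly computed as a weighted sum of gene expression values, with the weights taken from the fitted regression. The abstract does not state the formula, the gene list, the coefficients, or the cutoff, so the Python sketch below assumes a linear score and a median split into high-risk and low-risk groups. The gene names, coefficients, and patient values are hypothetical placeholders, not study data.

```python
from statistics import median

def risk_score(expression, coefficients):
    """Weighted sum of expression values over the model genes."""
    return sum(coefficients[g] * expression[g] for g in coefficients)

def stratify(scores):
    """Label a sample 'high' if its score exceeds the cohort median, else 'low'."""
    cutoff = median(scores.values())
    return {s: ("high" if v > cutoff else "low") for s, v in scores.items()}

# Hypothetical coefficients (assumption: stand-ins for LASSO-derived weights).
COEFS = {"GENE_A": 0.5, "GENE_B": -0.25, "GENE_C": 1.0}

# Hypothetical expression values for four samples.
COHORT = {
    "patient_1": {"GENE_A": 2.0, "GENE_B": 4.0, "GENE_C": 1.0},
    "patient_2": {"GENE_A": 1.0, "GENE_B": 0.0, "GENE_C": 1.5},
    "patient_3": {"GENE_A": 4.0, "GENE_B": 2.0, "GENE_C": 2.0},
    "patient_4": {"GENE_A": 0.0, "GENE_B": 8.0, "GENE_C": 0.0},
}

scores = {p: risk_score(e, COEFS) for p, e in COHORT.items()}
groups = stratify(scores)
```

With these placeholder values, `groups` assigns `patient_2` and `patient_3` to the high-risk group and the other two to the low-risk group. A median cutoff is only one possible choice; the study may have used a different threshold.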

Results: We used cNMF to classify malignant NSCLC cells into three functional modules, namely the metabolic reprogramming module, the cell cycle module, and the cell stemness module, which can be used for the functional classification of malignant tumor cells in NSCLC. These findings also indicate that metabolism, the cell cycle, and tumor stemness play important driving roles in the malignant evolution of NSCLC. We integrated cNMF and XGBoost to select marker genes that are indicative of both early and advanced NSCLC stages. The expression of genes such as CHCHD2, GAPDH, and CD24 was strongly correlated with the malignant evolution of NSCLC at the single-cell level. These genes were validated using histological data. The eight-gene risk score model that we established was ultimately validated with GEO data.

Conclusion: In summary, our study contributes to the identification of temporal heterogeneous biomarkers in NSCLC, thus offering insights into disease progression mechanisms and potential therapeutic targets. The developed workflow demonstrates promise for future applications in clinical practice.

Keywords: Machine learning; Non-small cell lung cancer; Temporal heterogeneity.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
The flowchart for this study
Fig. 2
Identification of cell subsets. (A) UMAP plot of all cells from 22 patients, colored by major cell types. (B) UMAP plot of all cells from 22 patients, colored by patients. (C) Violin plots showing the expression level of known cell-type-specific markers to demonstrate the identity of each cluster
Fig. 3
The CNV profile analysis distinguishes tumor cells. Mapping chromosome amplification (red) and deletion (blue) to each chromosome position in malignant tumor cells. (A) InferCNV of GSE131907 (10X sequencing platform). (B) InferCNV of GSE127465 and GSE136246 (inDrop sequencing platform)
Fig. 4
Pseudotime trajectory inferred by Monocle2. (A) Simulation of the development trajectory of malignant cells, colored by development stage. (B) Simulation of the development trajectory of malignant cells, colored by cell types. (C) Heatmap showing expression of representative identified genes across single cells. The color key from blue to red indicates relative expression levels from low to high
Fig. 5
Histogram of GO enrichment analysis for differential genes across single cells. (A) The enrichment pathway of Cluster 1. (B) The enrichment pathway of Cluster 2. (C) The enrichment pathway of Cluster 3. (D) The enrichment pathway of Cluster 4
Fig. 6
Interaction plot of tumor cells and intercellular communication networks. (A) The circle plot shows the inferred intercellular communication network for all cell types. (B) The circle plot shows the communication network between advanced NSCLC cells and other cells. (C) The circle plot shows the communication network between early NSCLC cells and other cells. (D) The heat map of the communication intensity between various cells
Fig. 7
Bubble diagram showing the top receptor-ligand pairs in early and advanced NSCLC cells
Fig. 8
(A) K-selection plot of dataset1. (B) K-selection plot of dataset2. (C) K-selection plot of dataset3. (D) Pearson correlation matrix for selected programs. (E) Heatmap showing correlation of programs derived from cNMF analysis of single cell dataset1. (F) Heatmap showing correlation of programs derived from cNMF analysis of single cell dataset2. (G) Heatmap showing correlation of programs derived from cNMF analysis of single cell dataset3
Fig. 9
Immunohistochemical staining analysis of CHCHD2, CEACAM5, GAPDH, and CD24 in normal lung and lung cancer tissues
Fig. 10
Identification of prognostic biomarkers related to the temporal heterogeneity of NSCLC. (A) Determination of the number of factors by the LASSO algorithm. (B) The genes obtained from LASSO regression downscaling
Fig. 11
(A) The distribution of risk score and survival status and the heatmap of 8 genes in the TCGA_LUAD cohort. (B) Kaplan–Meier curve depicting the OS difference between high-risk and low-risk groups in TCGA. (C) Kaplan–Meier curve depicting the OS difference between high-risk and low-risk groups in GSE30219.
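
The Kaplan–Meier curves in Fig. 11 compare overall survival (OS) between risk groups. As a minimal sketch of how one such curve can be estimated, the Python function below implements the standard product-limit estimator. The follow-up times and event flags are hypothetical, and nothing here reproduces the study's own analysis code.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of survival.

    times:  follow-up duration per patient
    events: 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set.
        at_risk -= removed
    return curve

# Hypothetical follow-up data for one risk group (arbitrary time units).
curve = kaplan_meier(times=[1, 2, 2, 3, 4], events=[1, 1, 0, 0, 1])
```

Running the function once per risk group gives the two step curves that a plot like Fig. 11B overlays. Comparing the groups formally would need an additional test such as the log-rank test, which this sketch does not include.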
