IET Syst Biol. 2022 Apr;16(2):72-83.
doi: 10.1049/syb2.12041.

Identification and validation of a seven-gene prognostic marker in colon cancer based on single-cell transcriptome analysis

Yang Zhou et al. IET Syst Biol. 2022 Apr.

Abstract

Colon cancer (CC) is one of the most commonly diagnosed tumours worldwide. Single-cell RNA sequencing (scRNA-seq) can accurately reflect the heterogeneity within and between tumour cells and identify important genes associated with cancer development and growth. In this study, scRNA-seq was used to identify reliable prognostic biomarkers in CC. ScRNA-seq data of CC before and after 5-fluorouracil treatment were first downloaded from the Gene Expression Omnibus database. The data were pre-processed, and dimensionality reduction was performed using principal component analysis and t-distributed stochastic neighbour embedding algorithms. Additionally, the transcriptome data, somatic variant data, and clinical reports of patients with CC were obtained from The Cancer Genome Atlas database. Seven key genes were identified using Cox regression analysis and the least absolute shrinkage and selection operator method to establish signatures associated with CC prognosis. The identified signatures were validated on independent datasets, and somatic mutations and potential oncogenic pathways were further explored. Based on these features, gene signatures, and other clinical variables, a more effective predictive nomogram for patients with CC was constructed, and a decision curve analysis was performed to assess the utility of the nomogram. A prognostic signature consisting of seven prognosis-related genes, namely CAV2, EREG, NGFRAP1, WBSCR22, SPINT2, CCDC28A, and BCL10, was constructed and validated. The proficiency and credibility of the signature were verified in both internal and external datasets, and the results showed that the seven-gene signature could effectively predict the prognosis of patients with CC under various clinical conditions. A nomogram was then constructed based on features such as the RiskScore, patients' age, neoplasm stage, and tumour (T), node (N), and metastasis (M) classification, and the nomogram had good clinical utility.
Higher RiskScores were associated with a higher tumour mutational burden, which was confirmed to be a prognostic risk factor. Gene set enrichment analysis showed that high-score groups were enriched in 'cytoplasmic DNA sensing', 'extracellular matrix receptor interactions', and 'focal adhesion', and low-score groups were enriched in 'natural killer cell-mediated cytotoxicity' and 'T-cell receptor signalling pathways', among other pathways. A robust seven-gene marker for CC was identified based on scRNA-seq data and was validated in multiple independent cohort studies. These findings provide a new potential marker to predict the prognosis of patients with CC.

Keywords: colon cancer (CC); metastasis-associated genes; progression; single-cell RNA sequencing (scRNA-seq); the cancer genome atlas (TCGA); tumour mutational burden (TMB).
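
The abstract does not give the RiskScore formula or the fitted coefficients. As a rough illustration only, the sketch below assumes the common convention for Cox/LASSO gene signatures: a linear combination of expression values weighted by regression coefficients, with patients split into high- and low-risk groups at the cohort median. The coefficient values, the function names `risk_score` and `split_by_median`, and the median cut-off are all assumptions made for this example and are not taken from the paper.

```python
from statistics import median

# The seven genes named in the abstract.
GENES = ["CAV2", "EREG", "NGFRAP1", "WBSCR22", "SPINT2", "CCDC28A", "BCL10"]

# HYPOTHETICAL coefficients for illustration only.
# These are NOT the fitted values reported by the authors.
HYPOTHETICAL_COEFS = {
    "CAV2": 0.30,
    "EREG": -0.20,
    "NGFRAP1": 0.10,
    "WBSCR22": 0.25,
    "SPINT2": -0.15,
    "CCDC28A": 0.20,
    "BCL10": -0.10,
}


def risk_score(expression, coefs):
    """Linear predictor: sum of coefficient * expression over the signature genes."""
    return sum(coefs[gene] * expression[gene] for gene in coefs)


def split_by_median(scores):
    """Label each patient 'high' if the score is strictly above the cohort median, else 'low'."""
    cutoff = median(scores.values())
    return {pid: ("high" if score > cutoff else "low") for pid, score in scores.items()}


if __name__ == "__main__":
    # Toy expression values (made up), one dict per hypothetical patient.
    cohort = {
        "P1": {gene: 1.0 for gene in GENES},
        "P2": {gene: 2.0 for gene in GENES},
        "P3": {gene: 0.5 for gene in GENES},
    }
    scores = {pid: risk_score(expr, HYPOTHETICAL_COEFS) for pid, expr in cohort.items()}
    print(split_by_median(scores))
```

With real data, the coefficients would come from the fitted Cox model and the cut-off would be whatever the study specifies; the median split here is only one frequently used choice.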

Conflict of interest statement

The authors declare that they have no conflicts of interest.

Figures

FIGURE 1
(a) Single‐cell RNA sequencing data were subjected to quality control, where low quality cells and lowly expressed genes were removed; (b) The number of principal components (PCs) for principal component analysis based on the p‐value; (c) Heat map of clustered feature genes for each subpopulation; (d) Clustering map of t‐distributed stochastic neighbour embedding (t‐SNE) dimensionality reduction, where all colon cancer single‐cell data were clustered into eight categories; (e) t‐SNE distribution of patients with colon cancer; (f) Distribution of patients with or without treatment
FIGURE 2
(a and b) Twelve prognostic genes were identified in The Cancer Genome Atlas training cohort based on the least absolute shrinkage and selection operator approach using the ‘glmnet’ package in R (best cut‐off value, −4.6); (c) The expression of these seven genes was analysed by multivariate Cox analysis in each subpopulation
FIGURE 3
(a) Receiver operating characteristic (ROC) analysis of the risk model in the training set; (b) Survival analysis of the risk model in the training set; (c) Distribution of RiskScore and survival status in the training set; (d) ROC analysis of the risk model in the entire The Cancer Genome Atlas (TCGA) cohort; (e) Survival analysis of the risk model in the entire TCGA cohort; (f) Distribution of RiskScore and survival status in the entire TCGA cohort; (g) ROC analysis of the risk model in the GSE17536 validation set; (h) Survival analysis of the risk model in the GSE17536 validation set; (i) Distribution of RiskScore and survival status in the GSE17536 validation set
FIGURE 4
(a) Box plots showing the distribution of RiskScores in the entire TCGA‐COAD cohort according to different tumour stages; (b) Box plots showing the distribution of RiskScores in the entire TCGA‐COAD cohort according to different T stages; (c) Box plots showing the distribution of RiskScores in the entire TCGA‐COAD cohort according to different M stages; (d) Box plots showing the distribution of RiskScores in the full set of TCGA‐COAD according to different N stages; (e) Receiver operating characteristic (ROC) curves for 1‐, 3‐, and 5‐year survival predicted by the risk model; (f) Survival analysis between high and low‐risk groups. TCGA, The Cancer Genome Atlas
FIGURE 5
(a) Forest plot of univariate Cox analysis; (b) Forest plot of multivariate Cox analysis; (c) Nomogram of the prediction model; (d) 1‐, 3‐, and 5‐year calibration curves of the nomogram; (e) Receiver operating characteristic (ROC) curves for 1‐, 3‐, and 5‐year survival predicted by the risk model; (f) Survival analysis between high‐ and low‐risk groups predicted by the nomogram
FIGURE 6
(a) The tumour mutational burden (TMB) in the high‐risk group predicted by the risk model; (b) The TMB in the low‐risk group predicted by the risk model; (c) Results of gene set enrichment analysis
