PeerJ. 2024 Oct 28:12:e18347.
doi: 10.7717/peerj.18347. eCollection 2024.

COADREADx: A comprehensive algorithmic dissection of colorectal cancer unravels salient biomarkers and actionable insights into its discrete progression

Ashok Palaniappan et al. PeerJ. 2024.

Abstract

Background: Colorectal cancer is a common condition with an uncommon burden of disease, heterogeneity in manifestation, and no definitive treatment in the advanced stages. Renewed efforts to unravel the genetic drivers of colorectal cancer progression are paramount. Early-stage detection contributes to the success of cancer therapy and increases the likelihood of a favorable prognosis. Here, we have executed a comprehensive computational workflow aimed at uncovering the discrete stagewise genomic drivers of colorectal cancer progression.

Methods: Using the TCGA COADREAD expression data and clinical metadata, we constructed stage-specific linear models as well as contrast models to identify stage-salient differentially expressed genes. Stage-salient differentially expressed genes with a significant monotone trend of expression across the stages were identified as progression-significant biomarkers. The stage-salient genes were benchmarked using a normals-augmented dataset and cross-referenced with existing knowledge. The candidate biomarkers were used to construct the feature space for learning an optimal model for the digital screening of early-stage colorectal cancers. The candidate biomarkers were also examined for constructing a prognostic model based on survival analysis.
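The notion of a monotone expression trend across stages can be illustrated with a minimal sketch. It assumes that a gene counts as monotone when its mean expression strictly increases or strictly decreases from control through stage IV; the abstract does not specify the statistical test the authors used, so this is an illustration of the idea, not their procedure. The function names and the toy values are hypothetical.

```python
from statistics import mean

# Progression order assumed here: control, then stages I to IV.
ORDER = ("C", "S1", "S2", "S3", "S4")

def stage_means(expr_by_stage, order=ORDER):
    """Mean expression per group, listed in progression order."""
    return [mean(expr_by_stage[s]) for s in order]

def monotone_direction(means):
    """Return 'up', 'down', or None for a sequence of stage means."""
    diffs = [b - a for a, b in zip(means, means[1:])]
    if all(d > 0 for d in diffs):
        return "up"
    if all(d < 0 for d in diffs):
        return "down"
    return None

# Hypothetical toy values, not data from the study.
toy_gene = {
    "C": [1.0, 1.2],
    "S1": [2.0, 2.1],
    "S2": [2.9, 3.1],
    "S3": [4.0, 4.2],
    "S4": [5.5, 5.3],
}
print(monotone_direction(stage_means(toy_gene)))
```

A real analysis would also need a significance criterion for the trend, since a strict ordering of means alone ignores within-stage variance.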

Results: Among the biomarkers identified are: CRLF1, CALB2, STAC2, UCHL1, KCNG1 (stage-I salient), KLHL34, LPHN3, GREM2, ADCY5, PLAC2, DMRT3 (stage-II salient), PIGR, HABP2, SLC26A9 (stage-III salient), GABRD, DKK1, DLX3, CST6, HOTAIR (stage-IV salient), and CDH3, KRT80, AADACL2, OTOP2, FAM135B, HSP90AB1 (top linear model genes). In particular, the study yielded 31 progression-significant genes, including ESM1, DKK1, SPDYC, IGFBP1, BIRC7, NKD1, CXCL13, VGLL1, PLAC1, SPERT, UPK2, and, interestingly, three members of the LY6G6 family. Significant monotonic linear model genes included HIGD1A, ACADS, PEX26, and SPIB. A feature space of just seven biomarkers, namely ESM1, DHRS7C, OTOP3, AADACL2, LPHN3, GABRD, and LPAR1, was sufficient to optimize a RandomForest model that achieved > 98% balanced accuracy (and performant recall) for cancer vs. normal classification on external validation. Design of an optimal multivariate model based on survival analysis yielded a prognostic panel of three stage-IV salient genes, namely HOTAIR, GABRD, and DKK1. Based on the above sparse signatures, we have developed COADREADx, a web-server for potentially assisting colorectal cancer screening and patient risk stratification. COADREADx provides uncertainty measures for its predictions and needs clinical validation. It has been deployed for experimental non-commercial use at: https://apalanialab.shinyapps.io/coadreadx/.
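Balanced accuracy, the metric reported above, is the mean of recall (sensitivity) and specificity, which makes it less sensitive than plain accuracy to the imbalance between cancer and normal samples. The sketch below computes it from scratch; the labels are hypothetical toy values chosen to show the difference and are not results from the study.

```python
def balanced_accuracy(y_true, y_pred, positive=1):
    """Mean of recall on the positive class and specificity on the rest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (recall + specificity) / 2

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

# Hypothetical toy labels: 8 cancer samples (1) and 2 normal samples (0).
y_true = [1] * 8 + [0] * 2
y_pred = [1] * 8 + [1, 0]  # one normal sample misclassified as cancer

print(accuracy(y_true, y_pred))           # 0.9
print(balanced_accuracy(y_true, y_pred))  # 0.75
```

In this toy case a single misclassified normal sample barely moves plain accuracy but halves specificity, which is why balanced accuracy is the more informative figure when normals are scarce.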

Keywords: Colorectal cancer screening; Differentially expressed genes; Monotonically expressed genes; Network analysis; Progression-significant genes; Random forest; Risk stratification; Stage-salient genes; Stagewise linear models; Web-server.

Conflict of interest statement

The authors declare there are no competing interests.

Figures

Figure 1. Study design for the dissection of discrete stage-wise progression of colorectal cancer.
The identified candidate biomarkers could be used to train machine learning classifiers for the screening and prognosis of colorectal cancers. Figure created with Biorender.com.
Figure 2. Design matrices used for (A) linear modeling; and (B) between-stages contrasts.
C: Control, S1: Stage-I, S2: Stage-II, S3: Stage-III, S4: Stage-IV.
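The matrices themselves appear only in the figure, so the following is a minimal sketch of what such matrices commonly look like, assuming a group-means parameterization with one indicator column per group (C, S1 to S4) and a contrast vector that subtracts one stage coefficient from another. The helper names are hypothetical and the layout may differ from the authors' actual matrices.

```python
# Group labels as given in the figure legend.
LEVELS = ["C", "S1", "S2", "S3", "S4"]

def design_matrix(labels):
    """One row per sample, one 0/1 indicator column per group."""
    return [[1 if lab == lev else 0 for lev in LEVELS] for lab in labels]

def contrast(plus, minus):
    """Contrast vector for the comparison `plus - minus`."""
    if plus not in LEVELS or minus not in LEVELS:
        raise ValueError("unknown group label")
    return [1 if lev == plus else -1 if lev == minus else 0 for lev in LEVELS]

# Hypothetical sample labels, not taken from the study.
samples = ["C", "S1", "S2", "S2", "S4"]
for row in design_matrix(samples):
    print(row)

print(contrast("S1", "C"))   # stage I versus control
print(contrast("S2", "S1"))  # between-stages contrast: stage II versus stage I
```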
Figure 3. Expression trends of the top 9 DEGs from the linear modeling.
Row-1: CDH3, KRT80, OTOP2; Row-2: AADACL2, ETV4, ESM1; Row-3: DHRS7C, OTOP3, JUB. In each plot, expression trends in the control samples are shown first, followed by stage-wise trends in progressive fashion. It can be observed that some genes are downregulated to near-zero expression as CRC progresses (notably OTOP2, OTOP3, AADACL2 and DHRS7C).
Figure 4. Illustration of dichotomy in expression trends of stage-salient genes (namely, consistent differential upregulation and consistent differential downregulation).
Each stage is represented by one upregulated gene (column 1) and one downregulated gene (column 2). (A) Stage-I: ADAMTSL1 & ARNTL2; (B) Stage-II: KLHL34 & CEP72; (C) Stage-III: ENPP3 & FAM40B; (D) Stage-IV: ADAM6 & ADAM1. Note that the expression of ADAM6 is provided in log10 units.
Figure 5. Visualizing samples in principal components space.
(A) Top 100 genes of the linear model; and (B) 100 randomly chosen genes. Only the top two principal components are used.
Figure 6. Distribution of genes based on stage-specificity.
Of the 2,242 DEGs, 1,379 appear significant in all the stages. The early stages (stages 1 and 2) share fewer DEGs with the late stages (stages 3 and 4), suggesting that additional factors are necessary for cancer progression to metastasis.
Figure 7. Heatmap of the log fold change (lfc, with respect to control samples) of the top 40 genes.
Stage-salient genes express maximal salience in one of the stages. It is striking that all ten stage-IV salient genes show monotonic progressive upregulation (e.g., GABRD). The gradient of expression is shown in the color key.
Figure 8. Survival analysis of prognostically significant stage-salient genes.
Univariate Cox regression analysis of (A) HOTAIR, (B) GABRD, (C) DKK1; and (D) construction of optimal multivariate panel comprising the above biomarkers. Over-expression of the prognostic biomarkers has a significant effect on the survival probabilities (P < 0.05), and elevates the patient risk. Red: high-risk; blue: low-risk; colored dashed lines represent corresponding 95% confidence intervals.
Figure 9. Expression trends of candidate hub-driver genes.
(A) GRIN2A and (B) EIF2B5.
Figure 10. Network reconstruction of perturbed pathways with monotonic expression enrichment based on the seed set of stage-salient MEGs in TCGA COADREAD.
Evidence from known interactions (curated databases, experimentally determined) or predicted interactions (gene neighborhood, gene fusions, or gene co-occurrence) was used to identify edges. Colored nodes indicate query proteins and the first shell of interactors, whereas white nodes indicate the second shell of interactors.
