BMC Bioinformatics. 2023 Jan 16;24(1):17.
doi: 10.1186/s12859-022-05104-z.

Identification of biomarkers predictive of metastasis development in early-stage colorectal cancer using network-based regularization

Carolina Peixoto et al. BMC Bioinformatics. 2023.

Abstract

Colorectal cancer (CRC) is the third most common cancer and the second deadliest worldwide. It is a highly heterogeneous disease that can develop via distinct pathways, and metastasis is the primary cause of death. It is therefore crucial to understand the molecular mechanisms underlying metastasis. RNA sequencing is an essential tool for studying the transcriptional landscape. However, the high dimensionality of gene expression data makes selecting novel metastatic biomarkers difficult. To distinguish early-stage CRC patients at risk of developing metastasis from those who are not, three binary classification approaches were used: (1) classification methods (decision trees, linear and radial kernel support vector machines, logistic regression, and random forest) using differentially expressed genes (DEGs) as input features; (2) regularized logistic regression based on the Elastic Net penalty and on the proposed iTwiner, a network-based regularizer that accounts for gene correlation information; and (3) classification methods based on the genes pre-selected using regularized logistic regression. Classifiers using the DEGs as features showed similar results, with random forest showing the highest accuracy. Using regularized logistic regression on the full dataset yielded no improvement in accuracy. Further classification using the genes pre-selected with the different penalty factors, instead of the DEGs, significantly improved the accuracy of the binary classifiers. Moreover, using network-based correlation information (iTwiner) for gene selection produced the best classification results and identified more stable and robust gene sets. Some of the selected genes are known to be tumor suppressor genes (OPCML-IT2), to be related to resistance to cancer therapies (RAC1P3), or to be involved in several cancer processes such as genome stability (XRCC6P2), tumor growth and metastasis (MIR602), and regulation of gene transcription (NME2P2).
We show that classifying CRC patients based on features pre-selected by regularized logistic regression is a valuable alternative to using DEGs, significantly increasing the models' predictive performance. Moreover, correlation-based penalization for biomarker selection stands as a promising strategy for predicting patient groups from RNA-seq data.
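
The third approach described in the abstract (pre-selecting genes with regularized logistic regression, then training a classifier on only those genes) can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' setup: the data are synthetic stand-ins for an expression matrix, the function name `preselect_then_classify` is hypothetical, the hyperparameters (`l1_ratio`, `C`, the number of trees) are arbitrary, and scikit-learn is used for convenience. Only the Elastic Net penalty is shown; iTwiner is not implemented because its formulation is not given in this abstract.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def preselect_then_classify(X, y, l1_ratio=0.5, C=0.1, seed=0):
    """Select features with elastic-net logistic regression, then fit a random forest on them."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    # Standardize with training statistics only, so the penalty treats features on one scale.
    scaler = StandardScaler().fit(X_train)
    X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

    # Step 1: sparse logistic regression; features with non-zero weights count as pre-selected.
    enet = LogisticRegression(
        penalty="elasticnet", solver="saga", l1_ratio=l1_ratio, C=C,
        max_iter=5000, random_state=seed,
    )
    enet.fit(X_train_s, y_train)
    selected = np.flatnonzero(enet.coef_.ravel())
    if selected.size == 0:
        raise ValueError("penalty too strong: no features selected; increase C")

    # Step 2: a classifier without regularization, trained on the pre-selected features only.
    forest = RandomForestClassifier(n_estimators=200, random_state=seed)
    forest.fit(X_train_s[:, selected], y_train)
    accuracy = forest.score(X_test_s[:, selected], y_test)
    return selected, accuracy


# Synthetic stand-in for an expression matrix (assumption): 120 samples x 500 features.
X, y = make_classification(
    n_samples=120, n_features=500, n_informative=10, n_redundant=0, random_state=0
)
selected, accuracy = preselect_then_classify(X, y)
print(f"{selected.size} of {X.shape[1]} features kept; test accuracy = {accuracy:.2f}")
```

Feature selection is fitted on the training split only, so the held-out accuracy is not inflated by information from the test samples. Any of the other classifiers named in the abstract could replace the random forest in step 2.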

Keywords: Biomarker selection; Classification; Colorectal cancer; Regularization; iTwiner.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Methodological procedure of the work presented here. The full dataset was divided into three smaller datasets. Survival analysis was performed on each dataset to evaluate how disease stage (II vs. III), sidedness of the primary tumor site in the colon (Right vs. Left), and class (P: primary patients that do not metastasize vs. PM: primary patients that metastasize) relate to risk of death. Afterwards, three different approaches were used to classify early-stage patients that metastasize: (1) classifiers without regularization (DT: decision trees; svmL: linear support vector machine; svmR: radial support vector machine; LR: logistic regression; RF: random forest) applied to the subset of genes found to be differentially expressed between the two groups (P vs. PM); (2) regularized logistic regression performed on the full dataset using two different penalization factors (EN: elastic net, and iTwiner); (3) classifiers applied to genes pre-selected by regularized logistic regression. Model performance was compared using different types of measures (e.g., accuracy and misclassifications).
Fig. 2
Survival curves for each dataset used, by stage (II vs. III; top row), class (P vs. PM; middle row), and sidedness (Right vs. Left; bottom row).
Fig. 3
Venn diagram comparing, for each dataset, the 50 DEGs with the lowest p-values between the P and PM groups of patients.
Fig. 4
Venn diagram comparing the 50 genes selected most often by the regularization methods for each dataset tested. a Elastic net; b iTwiner.
Fig. 5
Boxplots comparing accuracy (Acc) obtained by the different approaches tested applied to each dataset. a Decision trees (DT); b linear support vector machine (svmL); c radial support vector machine (svmR); d random forest (RF); e logistic regression (LR).
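
The gene-set comparisons in Figs. 3 and 4 rest on two simple operations: ranking genes by how often they are selected across repeated fits, and splitting the resulting top-k sets into Venn regions. A minimal sketch of both follows, assuming hypothetical helper names (`top_k_genes`, `venn_regions`), placeholder gene identifiers, and made-up selection runs; it is not the authors' code, and the tie-breaking rule is an arbitrary choice.

```python
from collections import Counter


def top_k_genes(selections, k):
    """Return the k genes selected most often across repeated model fits."""
    counts = Counter(gene for run in selections for gene in set(run))
    # Sort by descending count, then by name, so ties break deterministically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return {gene for gene, _ in ranked[:k]}


def venn_regions(sets_by_name):
    """Map each combination of set names to the genes found in exactly those sets."""
    regions = {}
    for gene in set().union(*sets_by_name.values()):
        members = tuple(sorted(name for name, s in sets_by_name.items() if gene in s))
        regions.setdefault(members, set()).add(gene)
    return regions


# Placeholder selection runs for two hypothetical datasets, A and B.
runs_a = [["gene_1", "gene_2", "gene_3"], ["gene_1", "gene_2"], ["gene_1", "gene_4"]]
runs_b = [["gene_2", "gene_5"], ["gene_5", "gene_1"], ["gene_5", "gene_2"]]

top_a = top_k_genes(runs_a, k=2)
top_b = top_k_genes(runs_b, k=2)
regions = venn_regions({"A": top_a, "B": top_b})
for members, genes in sorted(regions.items()):
    print(" & ".join(members), sorted(genes))
```

With k set to 50 and one list of selected genes per model fit, the same two functions would yield the region contents that a Venn diagram visualizes.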
