A reliable method for colorectal cancer prediction based on feature selection and support vector machine
- PMID: 30478811
- DOI: 10.1007/s11517-018-1930-0
Abstract
Colorectal cancer (CRC) is a common cancer responsible for approximately 600,000 deaths per year worldwide, so it is important to identify the related factors and detect the cancer accurately. Timely and accurate prediction of the disease, however, remains challenging. In this study, we build an integrated model based on logistic regression (LR) and support vector machine (SVM) to classify samples as cancer or normal. From various candidate factors, including human location, age, gender, body mass index (BMI), and the tumor type, tumor grade, and DNA of the cancer, we use logistic regression to select the most significant factors (p < 0.05) as the main features. With these features, we design a grid-search SVM model and compare different kernel types (Linear, radial basis function (RBF), Sigmoid, and Polynomial). The logistic regression results indicate that Firmicutes (AUC 0.918), Bacteroidetes (AUC 0.856), BMI (AUC 0.777), and age (AUC 0.710), as well as their combination (AUC 0.942), are effective for CRC detection. The best kernel type is RBF, which achieves an accuracy of 90.1% when k = 5 and 91.2% when k = 10. This study provides a new method for colorectal cancer prediction based on independent risk factors.
Graphical abstract: Flow chart depicting the method adopted in the study. LR and the ROC curve are used to select independent features as input to the SVM. SVM kernel selection compares the Linear, RBF, Sigmoid, and Polynomial kernel types to find the best kernel function for classification, and the result shows that the best kernel is RBF. The classification performance of the LR + RF, LR + NB, LR + KNN, and LR + ANNs models is compared with that of LR + SVM. After these steps, cancer and healthy individuals can be classified, and the best model is selected.
Keywords: Colorectal cancer; Logistic regression; Microbiome; Support vector machine.
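The abstract ranks candidate features by AUC and compares four SVM kernel types. The following is a minimal sketch of those two building blocks in plain Python, not the authors' implementation. The function names, the toy scores, and the kernel parameters (`gamma`, `coef0`, `degree`, following the common LIBSVM-style parameterization) are assumptions made for illustration. The abstract does not report the hyperparameter values found by the grid search.

```python
import math
from typing import Sequence


def auc_score(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    Equals the probability that a randomly chosen positive sample scores
    higher than a randomly chosen negative one; ties count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def dot(x: Sequence[float], z: Sequence[float]) -> float:
    """Inner product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(x, z))


def linear_kernel(x, z):
    return dot(x, z)


def rbf_kernel(x, z, gamma=0.5):
    # exp(-gamma * squared Euclidean distance)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def sigmoid_kernel(x, z, gamma=0.5, coef0=0.0):
    return math.tanh(gamma * dot(x, z) + coef0)


def polynomial_kernel(x, z, gamma=0.5, coef0=1.0, degree=3):
    return (gamma * dot(x, z) + coef0) ** degree


if __name__ == "__main__":
    # Hypothetical toy scores for one feature; 1 = cancer, 0 = normal.
    toy_scores = [0.1, 0.4, 0.35, 0.8]
    toy_labels = [0, 0, 1, 1]
    print("AUC:", auc_score(toy_scores, toy_labels))

    x, z = [1.0, 2.0], [3.0, 4.0]
    for name, k in [("linear", linear_kernel), ("rbf", rbf_kernel),
                    ("sigmoid", sigmoid_kernel), ("poly", polynomial_kernel)]:
        print(name, k(x, z))
```

In a full pipeline, a grid search would train an SVM for each kernel and parameter combination and keep the one with the best cross-validated accuracy. If k in the abstract denotes the number of cross-validation folds, which the abstract does not state, the two reported accuracies would correspond to 5-fold and 10-fold evaluation.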
Similar articles
- Seminal quality prediction using data mining methods. Technol Health Care. 2014;22(4):531-45. doi: 10.3233/THC-140816. PMID: 24898862
- Whale optimized mixed kernel function of support vector machine for colorectal cancer diagnosis. J Biomed Inform. 2019 Apr;92:103124. doi: 10.1016/j.jbi.2019.103124. Epub 2019 Feb 20. PMID: 30796977
- Computer-Aided Detection of Incidental Lumbar Spine Fractures from Routine Dual-Energy X-Ray Absorptiometry (DEXA) Studies Using a Support Vector Machine (SVM) Classifier. J Digit Imaging. 2020 Feb;33(1):204-210. doi: 10.1007/s10278-019-00224-0. PMID: 31062114. Free PMC article.
- Machine learning applications to clinical decision support in neurosurgery: an artificial intelligence augmented systematic review. Neurosurg Rev. 2020 Oct;43(5):1235-1253. doi: 10.1007/s10143-019-01163-8. Epub 2019 Aug 17. PMID: 31422572
- ENZPRED-enzymatic protein class predicting by machine learning. Curr Top Med Chem. 2013;13(14):1674-80. doi: 10.2174/15680266113139990118. PMID: 23889047. Review.
Cited by
- Machine learning-based colorectal cancer prediction using global dietary data. BMC Cancer. 2023 Feb 10;23(1):144. doi: 10.1186/s12885-023-10587-x. PMID: 36765299. Free PMC article.
- A Model Using Support Vector Machines Recursive Feature Elimination (SVM-RFE) Algorithm to Classify Whether COPD Patients Have Been Continuously Managed According to GOLD Guidelines. Int J Chron Obstruct Pulmon Dis. 2020 Nov 4;15:2779-2786. doi: 10.2147/COPD.S271237. eCollection 2020. PMID: 33177815. Free PMC article.
- A review of machine learning methods for cancer characterization from microbiome data. NPJ Precis Oncol. 2024 May 30;8(1):123. doi: 10.1038/s41698-024-00617-7. PMID: 38816569. Free PMC article. Review.
- The development of an efficient artificial intelligence-based classification approach for colorectal cancer response to radiochemotherapy: deep learning vs. machine learning. Sci Rep. 2025 Jan 2;15(1):62. doi: 10.1038/s41598-024-84023-w. PMID: 39748016. Free PMC article.
- Machine learning to evaluate the effects of non-clinical social determinant features in predicting colorectal Cancer mortality in a medically underserved Appalachian population. Sci Rep. 2025 Jul 16;15(1):25781. doi: 10.1038/s41598-025-11074-y. PMID: 40670552. Free PMC article.