Med Biol Eng Comput. 2019 Apr;57(4):901-912.
doi: 10.1007/s11517-018-1930-0. Epub 2018 Nov 26.

A reliable method for colorectal cancer prediction based on feature selection and support vector machine

Dandan Zhao et al. Med Biol Eng Comput. 2019 Apr.

Abstract

Colorectal cancer (CRC) is a common cancer responsible for approximately 600,000 deaths per year worldwide. It is therefore important to identify the related factors and to detect the cancer accurately, yet timely and accurate prediction of the disease remains challenging. In this study, we build an integrated model based on logistic regression (LR) and support vector machine (SVM) to classify samples as cancer or normal. From a range of candidate factors (human location, age, gender, and body mass index (BMI), together with the tumor type, tumor grade, and DNA of the cancer), we select the most significant factors (p < 0.05) using logistic regression as the main features. With these features, a grid-search SVM model is designed using different kernel types (linear, radial basis function (RBF), sigmoid, and polynomial). The logistic regression results indicate that Firmicutes (AUC 0.918), Bacteroidetes (AUC 0.856), BMI (AUC 0.777), and age (AUC 0.710), as well as their combination (AUC 0.942), are effective for CRC detection. The best kernel type is RBF, which achieves an accuracy of 90.1% when k = 5 and 91.2% when k = 10. This study provides a new method for colorectal cancer prediction based on independent risk factors.

Graphical abstract: Flow chart depicting the method adopted in the study. LR and the ROC curve are used to select independent features as input to the SVM. SVM kernel selection aims to find the best kernel function for classification by comparing the linear, RBF, sigmoid, and polynomial kernel types; the result shows that the best kernel is RBF. The classification performance of the LR + RF, LR + NB, LR + KNN, and LR + ANNs models is compared with that of LR + SVM. After these steps, cancer and healthy individuals can be classified, and the best model is selected.
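As a rough illustration of the four kernel types compared in the study, the sketch below writes each one out as a plain Python function. The abstract does not give the kernel parametrization, so the forms used here (with the hypothetical parameters `gamma`, `coef0`, and `degree` and their default values) are assumptions that follow the convention common in SVM libraries, not the authors' settings. The input vectors are made-up numbers, not study data.

```python
import math


def dot(x, y):
    """Inner product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    # k(x, y) = <x, y>
    return dot(x, y)


def rbf_kernel(x, y, gamma=0.5):
    # k(x, y) = exp(-gamma * ||x - y||^2); gamma is an assumed value
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def sigmoid_kernel(x, y, gamma=0.5, coef0=0.0):
    # k(x, y) = tanh(gamma * <x, y> + coef0); parameters are assumed values
    return math.tanh(gamma * dot(x, y) + coef0)


def polynomial_kernel(x, y, gamma=0.5, coef0=1.0, degree=3):
    # k(x, y) = (gamma * <x, y> + coef0) ** degree; parameters are assumed values
    return (gamma * dot(x, y) + coef0) ** degree


# Candidate kernels, keyed by the names used in the abstract.
KERNELS = {
    "Linear": linear_kernel,
    "RBF": rbf_kernel,
    "Sigmoid": sigmoid_kernel,
    "Polynomial": polynomial_kernel,
}

if __name__ == "__main__":
    # Two made-up feature vectors, for illustration only.
    a = [1.0, 2.0]
    b = [2.0, 0.0]
    for name, kernel in KERNELS.items():
        print(name, kernel(a, b))
```

In a grid search, each kernel in such a table would be paired with a range of its parameters and the combination with the best validation accuracy kept. The RBF kernel depends only on the distance between samples, which is one reason it is a frequent default choice.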

Keywords: Colorectal cancer; Logistic regression; Microbiome; Support vector machine.
