Technol Cancer Res Treat. 2021 Jan-Dec:20:15330338211058352.
doi: 10.1177/15330338211058352.

Colorectal Cancer Detected by Machine Learning Models Using Conventional Laboratory Test Data

Hui Li et al. Technol Cancer Res Treat. 2021 Jan-Dec.

Abstract

Background: The current diagnostic methods for colorectal cancer (CRC), colonoscopy and sigmoidoscopy, are invasive and complex procedures that carry a risk of complications. This study aimed to develop CRC identification models based on screening variables that are minimally invasive, affordable, portable, and accurate.

Methods: This retrospective study used electronic medical records of patients with CRC and of healthy individuals collected between July 2017 and June 2018. Laboratory data, including liver enzymes, lipid profiles, complete blood counts, and tumor biomarkers, were extracted from the records. Five machine learning models (logistic regression, random forest, k-nearest neighbors, support vector machine [SVM], and naïve Bayes) were used to identify CRC. Performance was evaluated using the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV).

Results: A total of 1164 electronic medical records (582 CRC patients and 582 healthy controls) were included. The logistic regression model achieved the highest performance in identifying CRC (AUC: 0.865, sensitivity: 89.5%, specificity: 83.5%, PPV: 84.4%, NPV: 88.9%). The four most heavily weighted features in the model were carcinoembryonic antigen (CEA), hemoglobin (HGB), lipoprotein (a) (Lp(a)), and high-density lipoprotein (HDL). A diagnostic model built on these four indicators achieved an AUC of 0.849 (0.840-0.860) for identifying all CRC patients, and it performed best in discriminating patients with late-stage colon cancer from healthy individuals, with an AUC of 0.905 (0.889-0.929).

Conclusions: A logistic regression model based on CEA, HGB, Lp(a), and HDL may be a powerful, noninvasive, and cost-effective method to identify CRC.
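Because the cohort is balanced (582 cases, 582 controls), the reported PPV and NPV should follow from sensitivity and specificity via Bayes' rule at a prevalence of 0.5. The sketch below is an illustrative consistency check, not the authors' evaluation code; the function name `predictive_values` is hypothetical.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) implied by Bayes' rule for a given disease prevalence."""
    tp = sensitivity * prevalence                   # cases correctly flagged
    fn = (1.0 - sensitivity) * prevalence           # cases missed
    tn = specificity * (1.0 - prevalence)           # controls correctly cleared
    fp = (1.0 - specificity) * (1.0 - prevalence)   # controls wrongly flagged
    return tp / (tp + fp), tn / (tn + fn)


# Balanced cohort as described in the abstract: 582 CRC vs 582 controls -> prevalence 0.5
ppv, npv = predictive_values(0.895, 0.835, 582 / 1164)
print(f"PPV={ppv:.3f}, NPV={npv:.3f}")
```

This gives a PPV of about 0.844 and an NPV of about 0.888, in line with the reported 84.4% and 88.9%; the small NPV gap is plausibly due to rounding of the reported sensitivity and specificity.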

Keywords: clinical laboratory techniques; colorectal cancer; diagnosis; logistic regression; machine learning.

Conflict of interest statement

Declaration of Conflicting Interests: The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Figures

Figure 1.
Flowchart of the colorectal cancer (CRC) identification model.
Figure 2.
Matrix of the Spearman correlation coefficients. All pairs of variables included in the models were tested using Spearman correlation. For variable pairs with a correlation coefficient >0.5, the variable with the smaller weight coefficient in the principal component analysis (PCA) was removed from the feature set.
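The pruning rule in Figure 2 can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions: the threshold is applied to the absolute value of rho (the caption does not say how negative correlations were handled), pairs are resolved from most to least correlated, and the `weights` dictionary is a hypothetical stand-in for the PCA weight coefficients. The feature values and the `HCT` (hematocrit) column are toy data, not study data.

```python
from itertools import combinations


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def filter_correlated(features, weights, threshold=0.5):
    """Drop the lower-weight member of every pair with |rho| > threshold."""
    rho = {p: spearman(features[p[0]], features[p[1]]) for p in combinations(features, 2)}
    kept = set(features)
    # Resolve the strongest redundancies first (an assumed ordering).
    for (a, b), r in sorted(rho.items(), key=lambda item: -abs(item[1])):
        if abs(r) > threshold and a in kept and b in kept:
            kept.discard(a if weights[a] < weights[b] else b)
    return kept


# Toy data for five hypothetical patients; weights are hypothetical PCA-style scores.
features = {
    "HGB": [120, 135, 110, 150, 140],
    "HCT": [36, 40, 33, 45, 42],
    "CEA": [2.1, 15.0, 3.3, 1.2, 8.7],
}
weights = {"HGB": 0.9, "HCT": 0.4, "CEA": 1.2}
print(sorted(filter_correlated(features, weights)))  # ['CEA', 'HGB']
```

Here HGB and HCT rank the patients identically (rho = 1), so the lower-weight HCT is dropped, while CEA (rho = -0.2 with both) is kept.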
Figure 3.
The weight coefficients of the logistic regression model (the model with the highest accuracy) for colorectal cancer (CRC) diagnosis. The four most heavily weighted features in the logistic regression model were carcinoembryonic antigen (CEA), hemoglobin (HGB), lipoprotein (a) (Lp(a)), and high-density lipoprotein (HDL).
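A fitted logistic regression scores a patient as the sigmoid of a weighted sum of laboratory values, and features can be ranked by the absolute size of their weights. The sketch below illustrates both steps. It assumes the inputs are standardized (z-scores), since raw coefficients are only comparable across features on a common scale. The coefficient values and their signs are invented placeholders, not the study's fitted model.

```python
import math


def predict_proba(x, coef, intercept):
    """P(CRC | x) = sigmoid(intercept + sum_i coef_i * x_i), with x given as z-scores."""
    z = intercept + sum(coef[name] * x[name] for name in coef)
    return 1.0 / (1.0 + math.exp(-z))


def rank_features(coef):
    """Order features by absolute coefficient; meaningful only if inputs share a scale."""
    return sorted(coef, key=lambda name: abs(coef[name]), reverse=True)


# Placeholder coefficients for illustration only; NOT the values fitted in the study.
coef = {"CEA": 1.8, "HGB": -1.2, "Lp(a)": 0.7, "HDL": -0.5, "ALT": 0.1}
intercept = 0.0

print(rank_features(coef))  # ['CEA', 'HGB', 'Lp(a)', 'HDL', 'ALT']

# A hypothetical patient, expressed as z-scores for each feature.
patient = {"CEA": 1.5, "HGB": -1.0, "Lp(a)": 0.3, "HDL": -0.2, "ALT": 0.0}
print(round(predict_proba(patient, coef, intercept), 3))
```

Ranking by magnitude rather than signed value keeps strongly protective (negative) and strongly risk-increasing (positive) features near the top together.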
Figure 4.
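Receiver operating characteristic (ROC) curves for colorectal cancer (CRC) diagnosis using logistic regression models with four feature sets: carcinoembryonic antigen (CEA) alone; CEA + hemoglobin (HGB) + lipoprotein (a) (Lp(a)); CEA + HGB + Lp(a) + high-density lipoprotein (HDL); and CEA + HGB + Lp(a) + HDL + alanine aminotransferase (ALT).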
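Each curve in Figure 4 is traced by sweeping a decision threshold over a model's predicted scores. Its AUC equals the probability that a randomly chosen CRC patient scores higher than a randomly chosen control, counting ties as one half. The sketch below computes AUC both ways so they can be cross-checked. The score lists are toy values rather than outputs of the study's models, and the function names are hypothetical.

```python
def roc_points(scores_pos, scores_neg):
    """(FPR, TPR) pairs obtained by sweeping the threshold from high to low."""
    thresholds = sorted(set(scores_pos) | set(scores_neg), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        tpr = sum(s >= t for s in scores_pos) / len(scores_pos)
        fpr = sum(s >= t for s in scores_neg) / len(scores_neg)
        points.append((fpr, tpr))
    return points


def trapezoid_auc(points):
    """Area under a piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def rank_auc(scores_pos, scores_neg):
    """AUC as P(case score > control score), with ties counted as 1/2."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Toy predicted probabilities for 4 hypothetical cases and 4 hypothetical controls.
cases = [0.9, 0.8, 0.7, 0.4]
controls = [0.6, 0.3, 0.2, 0.1]
print(trapezoid_auc(roc_points(cases, controls)), rank_auc(cases, controls))  # 0.9375 0.9375
```

Comparing nested feature sets, as in the figure, amounts to computing these quantities once per model on the same patients and comparing the resulting AUCs.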
