Acta Inform Med. 2017 Dec;25(4):254-258.
doi: 10.5455/aim.2017.25.254-258.

Comparison of Basic and Ensemble Data Mining Methods in Predicting 5-Year Survival of Colorectal Cancer Patients

Mohamad Amin Pourhoseingholi et al. Acta Inform Med. 2017 Dec.

Abstract

Introduction: Colorectal cancer (CRC) is one of the most common malignancies and causes of cancer mortality worldwide. Given the importance of predicting the survival of CRC patients and the growing use of data mining methods, this study compares the performance of a variety of basic and ensemble data mining methods in predicting 5-year survival of CRC patients.

Methods: The CRC dataset from the Research Center for Gastroenterology and Liver Diseases at Shahid Beheshti University of Medical Sciences was used to build and compare basic and ensemble data mining models. Feature selection methods were used to select predictor attributes for classification. The WEKA toolkit was used to create the models, and MedCalc software was used to compare them.

Results: The predictive performance of all developed models was high (all greater than 90%). Overall, ensemble models performed better than basic classifiers, and the best result was achieved by the ensemble voting model in terms of area under the ROC curve (AUC = 0.96).
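To make the voting idea concrete, the following is a minimal sketch in plain Python rather than the WEKA setup used in the study. It assumes a soft-voting rule (averaging each model's predicted survival probability); the abstract does not state which combination rule was used. The labels, the scores, and the names `auc`, `soft_vote`, `model_a`, and `model_b` are hypothetical toy values chosen for illustration only.

```python
from statistics import mean


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case gets the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def soft_vote(*model_scores):
    """Combine models by averaging their per-patient probabilities
    (an assumed combination rule, not confirmed by the source)."""
    return [mean(column) for column in zip(*model_scores)]


# Hypothetical toy data: 1 = survived 5 years, 0 = did not.
labels = [1, 1, 1, 0, 0, 0]
model_a = [0.9, 0.4, 0.8, 0.5, 0.2, 0.1]  # scores from one base classifier
model_b = [0.6, 0.9, 0.3, 0.2, 0.4, 0.1]  # scores from another base classifier

ensemble = soft_vote(model_a, model_b)

print("model A AUC:", round(auc(labels, model_a), 3))
print("model B AUC:", round(auc(labels, model_b), 3))
print("voting AUC: ", round(auc(labels, ensemble), 3))
```

In this toy case each base model misranks one pair of patients, but the two models err on different patients, so the averaged scores rank every survivor above every non-survivor. That is the intuition behind an ensemble outperforming its members; real data will rarely behave this cleanly.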

Conclusion: Comparison of the models' AUCs showed that the ensemble voting method significantly outperformed all other models except Random Forest (RF) and Bayesian Network (BN), whose 95% confidence intervals overlapped with that of the voting model. This result may indicate that these two methods, along with ensemble voting, have high predictive power for 5-year survival of CRC patients.
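The overlap reasoning above can be sketched as follows. This is an illustration only: it uses the Hanley and McNeil (1982) approximation for the standard error of an AUC, which may differ from the procedure MedCalc applied in the study. Only the AUC of 0.96 comes from the abstract; the class sizes (`n_pos`, `n_neg`) and the competing AUC values 0.95 and 0.85 are hypothetical.

```python
import math


def hanley_mcneil_ci(auc, n_pos, n_neg, z=1.96):
    """Approximate 95% confidence interval for an AUC using the
    Hanley-McNeil standard error (a stand-in, not the study's method)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    variance = (
        auc * (1 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    se = math.sqrt(variance)
    return max(0.0, auc - z * se), min(1.0, auc + z * se)


def intervals_overlap(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical class sizes; the abstract does not report them.
n_pos, n_neg = 100, 100

voting = hanley_mcneil_ci(0.96, n_pos, n_neg)   # AUC reported in the abstract
close = hanley_mcneil_ci(0.95, n_pos, n_neg)    # hypothetical competing model
weaker = hanley_mcneil_ci(0.85, n_pos, n_neg)   # hypothetical competing model

print("voting CI:", voting)
print("overlaps with AUC 0.95 model:", intervals_overlap(voting, close))
print("overlaps with AUC 0.85 model:", intervals_overlap(voting, weaker))
```

Checking for overlapping intervals is a conservative heuristic: non-overlapping intervals imply a significant difference, but overlapping intervals do not by themselves prove that two models perform equally well.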

Keywords: AUC; colorectal cancer; data mining; machine learning; survival.

Conflict of interest statement

• Conflict of interest: none declared.

Figures

Figure 1. Prediction performance of developed models in terms of AUC
