Comput Struct Biotechnol J. 2016 Dec 6;15:75-85.
doi: 10.1016/j.csbj.2016.11.004. eCollection 2017.

A Hybrid Computer-aided-diagnosis System for Prediction of Breast Cancer Recurrence (HPBCR) Using Optimized Ensemble Learning

Mohammad R Mohebian et al. Comput Struct Biotechnol J.

Abstract

Cancer is a collection of diseases that involve abnormal cell growth with the potential to invade or spread to other parts of the body. Breast cancer is the second leading cause of cancer death among women. This manuscript presents a method for predicting breast cancer recurrence within 5 years. Clinicopathologic characteristics of 579 breast cancer patients (recurrence prevalence of 19.3%) were analyzed, and discriminative features were selected using statistical feature selection methods. These features were further refined by Particle Swarm Optimization (PSO) and used as the inputs of a classification system based on ensemble learning (Bagged Decision Tree: BDT). The PSO algorithm identified the best combination of the selected categorical features and the weight (importance) of each selected interval-measurement-scale feature. The performance of HPBCR (hybrid predictor of breast cancer recurrence) was assessed using holdout and 4-fold cross-validation. Three other classifiers, namely support vector machines (SVM), a decision tree (DT), and a multilayer perceptron (MLP) neural network, were used for comparison. The selected features were diagnosis age, tumor size, lymph node involvement ratio, number of involved axillary lymph nodes, progesterone receptor expression, receipt of hormone therapy, and type of surgery. Across all cross-validation folds and the hold-out test fold, the minimum sensitivity, specificity, precision, and accuracy of HPBCR were 77%, 93%, 95%, and 85%, respectively. HPBCR outperformed the other tested classifiers and showed excellent agreement with the gold standard (i.e. the oncologist's opinion after blood tumor marker tests, imaging tests, and tissue biopsy). This algorithm is thus a promising online tool for predicting breast cancer recurrence.

Keywords: Breast cancer; CAD, computer-aided diagnosis; Cancer recurrence; Computer-assisted diagnosis; DT, decision tree; FH, family history of cancer; HPBCR, the proposed hybrid predictor of breast cancer recurrence; HRT, hormone therapy; I. Node, number of involved axillary lymph nodes; Machine learning; NR, lymph node involvement ratio; Prognosis; T. Node, number of dissected axillary lymph nodes; TS, tumor size; XRT, radiotherapy.
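The four performance measures reported above follow the standard confusion-matrix definitions, with recurrence treated as the positive class. The following is a minimal sketch, not the authors' code; the function name `recurrence_metrics` and the encoding 1 = recurrence within 5 years, 0 = no recurrence are assumptions for illustration.

```python
from typing import Dict, Sequence


def recurrence_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Confusion-matrix metrics for a binary recurrence classifier.

    Assumed encoding: 1 = recurrence within 5 years (positive class), 0 = no recurrence.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num: int, den: int) -> float:
        # Guard against empty denominators (e.g. a fold with no predicted positives).
        return num / den if den else 0.0

    return {
        "sensitivity": ratio(tp, tp + fn),  # recall on recurrent cases
        "specificity": ratio(tn, tn + fp),  # recall on non-recurrent cases
        "precision": ratio(tp, tp + fp),    # positive predictive value
        "accuracy": ratio(tp + tn, len(y_true)),
    }


if __name__ == "__main__":
    # Toy labels for illustration only, not data from the study.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
    print(recurrence_metrics(y_true, y_pred))
```

Reporting sensitivity and specificity alongside accuracy matters here because of the class imbalance: with a 19.3% recurrence prevalence, a classifier that always predicts "no recurrence" would already reach about 80.7% accuracy while having zero sensitivity.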

Figures

Supplementary material S5
Snapshot of the developed online HPBCR tool. The information of a subject with breast cancer has been entered; based on the prediction, this patient is likely to have a recurrence within 5 years of diagnosis.
Fig. 1
The structure of the proposed prognosis system (HPBCR). Other classifiers such as SVM and MLP could be used instead of BDT. The pseudo-code of HPBCR is provided in Supplementary material S3. The input features of HPBCR were: diagnosis age, nodal ratio, menarche age, number of pregnancies, tumor size, Ki67, and the numbers of involved and dissected nodes as interval features; type of surgery, molecular subtype, family history of cancer, multifocal tumor, estrogen and progesterone receptor status, p53 mutation, Her2 expression, Cathepsin-D protein status, hormone therapy use, and radiotherapy as nominal features; and tumor grade as an ordinal feature. Briefly, the input features are first selected using statistical feature selection, and the selected features are then used by the Bagged Decision Tree to build the classifier. The optimal feature set and the feature weights are estimated using Particle Swarm Optimization during learning. The algorithm stops when no significant improvement is seen in the objective function or when the maximum number of iterations (set to 100 in this study) is reached.
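The PSO step described in this caption can be illustrated with a generic global-best PSO sketch. This is not the authors' pseudo-code (which is in Supplementary material S3 and not reproduced here); it rests on several assumptions. Each particle holds one value in [0, 1] per candidate feature; that value is used directly as the feature weight, and the feature counts as selected when the value exceeds 0.5 (a hypothetical encoding; for simplicity every feature gets both a weight and a selection flag, whereas the abstract describes selection for categorical features and weighting for interval features). The inertia and acceleration constants, `tol`, and `patience` are illustrative values; only the 100-iteration cap comes from the caption. The `fitness` callback, for example the F-score of a bagged decision tree trained on the selected, weighted features, is left to the caller.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Fitness receives (selection mask, weights) and returns a score to MAXIMIZE.
Fitness = Callable[[Sequence[bool], Sequence[float]], float]


def decode(position: Sequence[float], threshold: float = 0.5) -> Tuple[List[bool], List[float]]:
    """Hypothetical encoding: select a feature if its value > threshold;
    the value itself is used as that feature's weight."""
    return [x > threshold for x in position], list(position)


def pso_feature_search(
    n_features: int,
    fitness: Fitness,
    n_particles: int = 20,
    max_iter: int = 100,   # iteration cap, as in the caption
    tol: float = 1e-4,     # "significant improvement" threshold (assumed)
    patience: int = 10,    # stalled iterations before stopping (assumed)
    w: float = 0.7, c1: float = 1.5, c2: float = 1.5,  # illustrative PSO constants
    seed: int = 0,
) -> Tuple[List[bool], List[float], float]:
    rng = random.Random(seed)
    pos = [[rng.random() for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                        # best position seen by each particle
    pbest_fit = [fitness(*decode(p)) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]       # best position seen by the swarm

    stall = 0
    for _ in range(max_iter):
        prev_best = gbest_fit
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp to [0, 1] so every position stays a valid weight vector.
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            f = fitness(*decode(pos[i]))
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
        # Early stop when the swarm's best score has not improved significantly.
        stall = stall + 1 if gbest_fit - prev_best < tol else 0
        if stall >= patience:
            break

    mask, weights = decode(gbest)
    return mask, weights, gbest_fit


if __name__ == "__main__":
    # Toy objective (not from the study): count features matching a target subset.
    target = [True, False, True, False]

    def toy_fitness(mask: Sequence[bool], weights: Sequence[float]) -> float:
        return float(sum(m == t for m, t in zip(mask, target)))

    best_mask, best_weights, best_fit = pso_feature_search(4, toy_fitness)
    print(best_mask, best_fit)
```

In a real pipeline the fitness callback would train the classifier on the training data using only the selected features scaled by their weights and return its F-score; the toy objective here only demonstrates that the search loop and stopping rule run.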
Fig. 2
The value of the fitness function (F-score of the proposed classifier, HPBCR, on the training set; solid line) and the F-score on the test set (dash-dot line) during the optimization procedure. In this plot, the only termination criterion was the maximum number of iterations (i.e. 100).
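The caption does not state which F-score variant serves as the PSO fitness. Assuming the usual balanced F1 score, it is the harmonic mean of precision P and sensitivity (recall) R: F1 = 2PR/(P + R) = 2TP/(2TP + FP + FN). The sketch below computes the general F-beta form from confusion-matrix counts, with beta = 1 giving F1; the function name `f_score` is hypothetical.

```python
def f_score(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score from confusion-matrix counts (recurrence = positive class).

    Equivalent to (1 + b^2) * P * R / (b^2 * P + R); beta = 1 gives F1.
    """
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    # With no true positives, false positives, or false negatives, return 0.0.
    return (1 + b2) * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Toy counts: precision = recall = 2/3, so F1 = 2/3.
    print(round(f_score(tp=2, fp=1, fn=1), 4))
```

Plotting this value on the training data (the optimization objective) alongside held-out data, as the figure does, makes any divergence caused by overfitting during the PSO search visible.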
