Genes (Basel). 2023 Sep 14;14(9):1802.
doi: 10.3390/genes14091802.

Cancer Classification Utilizing Voting Classifier with Ensemble Feature Selection Method and Transcriptomic Data

Rabea Khatun et al. Genes (Basel). 2023.

Abstract

Biomarker-based cancer identification and classification tools are widely used in bioinformatics and machine learning. However, the high dimensionality of microarray gene expression data makes it difficult to identify the genes that matter for cancer diagnosis. Many feature selection algorithms address this by selecting a small subset of informative features. This article proposes an ensemble rank-based feature selection method (EFSM) and an ensemble weighted average voting classifier (VT) to overcome this challenge. The EFSM uses a ranking method that aggregates features from individual selection methods to efficiently discover the most relevant and useful features. The VT combines support vector machine, k-nearest neighbor, and decision tree algorithms to create an ensemble model. The proposed method was tested on three benchmark datasets and compared to existing built-in ensemble models. The results show that our model achieved higher accuracy, with 100% for leukemia, 94.74% for colon cancer, and 94.34% for the 11-tumor dataset. The study concludes by identifying a subset of the most important cancer-causing genes and demonstrating their significance compared to the original data. The proposed approach detects vital genes with higher accuracy, precision, and stability than existing methods, with significant implications for the development of ML-based gene analysis.
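
To make the idea of rank-based aggregation concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not state how the per-method rankings are combined, so averaging the ranks is an assumption here, as are the function names `rank_features` and `ensemble_rank_select` and the toy scores.

```python
from statistics import mean


def rank_features(scores):
    """Rank features by importance score (rank 1 = highest score)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for position, feature_index in enumerate(order, start=1):
        ranks[feature_index] = position
    return ranks


def ensemble_rank_select(score_table, k):
    """Select the k features with the best (lowest) mean rank.

    score_table maps a selection-method name to one score per feature.
    Mean-rank aggregation is an assumed rule, chosen for illustration.
    """
    per_method_ranks = [rank_features(s) for s in score_table.values()]
    n_features = len(per_method_ranks[0])
    mean_ranks = [
        mean(ranks[i] for ranks in per_method_ranks) for i in range(n_features)
    ]
    # Stable sort: features with equal mean rank keep their index order.
    return sorted(range(n_features), key=lambda i: mean_ranks[i])[:k]


# Hypothetical scores for four genes from three selection methods.
toy_scores = {
    "variance": [0.9, 0.1, 0.5, 0.3],
    "pearson": [0.8, 0.2, 0.7, 0.1],
    "ridge": [0.4, 0.3, 0.9, 0.2],
}
selected = ensemble_rank_select(toy_scores, k=2)
print(selected)  # indices of the two genes with the lowest mean rank
```

Working on ranks rather than raw scores keeps methods with different score scales comparable, which is one plausible reason to aggregate this way.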

Keywords: cancer detection; feature selection; gene analysis; gene data; machine learning; voting classifier.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Workflow diagram of the methodology. (1) Three datasets (leukemia, colon, and 11-tumor) were preprocessed. (2) Significant features were extracted using different FSMs: PCA, recursive feature elimination, Pearson correlation, ridge regression, variance threshold, and the proposed rank-based ensemble feature selection. (3) Each dataset was split 70:30 into training and test sets. (4) ML classifiers, including KNN, DT, SVM, and the proposed voting ensemble classifier, were trained on the reduced dataset. (5) The voting classifier was then compared with built-in ensemble classifiers such as AdaBoost, gradient boosting, and random forest. (6) Model performance was assessed and analyzed using performance metrics such as accuracy and the confusion matrix.
Figure 2
Confusion matrix.
Figure 3
Comparison of FSMs and classifiers using accuracy.
Figure 4
Comparison of FSMs and classifiers using accuracy.
Figure 5
Comparison of voting and built-in ensemble classifiers using accuracy, precision, recall, and f1-score in the leukemia dataset.
Figure 6
Comparison of voting and built-in ensemble classifiers using accuracy, precision, recall, and f1-score in the colon dataset.
Figure 7
Comparison of voting and built-in ensemble classifiers using accuracy, precision, recall, and f1-score in the 11-tumor dataset.
Figure 8
Confusion matrix with best results for different datasets.
Figure 9
AUROC curve with best results for different datasets.
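
The weighted average voting step named in the abstract and in step (4) of the Figure 1 workflow can be illustrated with a short sketch. It is an assumption-laden example, not the authors' code: the function name `weighted_soft_vote`, the weights, and the per-classifier probabilities are all hypothetical, and the page does not say how the weights were chosen.

```python
def weighted_soft_vote(probabilities, weights):
    """Combine per-classifier class probabilities by weighted averaging.

    probabilities: one list of class probabilities per classifier.
    weights: one non-negative weight per classifier (assumed values).
    Returns the predicted class index and the averaged probabilities.
    """
    if len(probabilities) != len(weights):
        raise ValueError("need exactly one weight per classifier")
    total_weight = sum(weights)
    n_classes = len(probabilities[0])
    averaged = [
        sum(w * p[c] for p, w in zip(probabilities, weights)) / total_weight
        for c in range(n_classes)
    ]
    predicted = max(range(n_classes), key=lambda c: averaged[c])
    return predicted, averaged


# Hypothetical outputs for one sample from SVM, KNN, and a decision tree.
svm_probs = [0.6, 0.4]
knn_probs = [0.3, 0.7]
tree_probs = [1.0, 0.0]
label, averaged = weighted_soft_vote(
    [svm_probs, knn_probs, tree_probs], weights=[2, 1, 1]
)
print(label, averaged)
```

In this toy case the SVM carries twice the weight of the other two models, so a confident minority vote from KNN does not override the weighted consensus.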
