Genes (Basel). 2023 Sep 14;14(9):1802.

doi: 10.3390/genes14091802.

Cancer Classification Utilizing Voting Classifier with Ensemble Feature Selection Method and Transcriptomic Data

Affiliations

¹ Department of Computer Science and Engineering, Green University of Bangladesh, Dhaka 1207, Bangladesh.
² Department of Computer Science and Engineering, Jagannath University, Dhaka 1100, Bangladesh.
³ School of Information Technology, Deakin University, Waurn Ponds Campus, Geelong, VIC 3125, Australia.
⁴ Centre for Smart Analytics, Federation University Australia, Ballarat, VIC 3842, Australia.
⁵ Department of Mathematics and Statistics, College of Science, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh 11564, Saudi Arabia.
⁶ Department of Information and Communication Technology, Mawlana Bhashani Science and Technology University, Tangail 1902, Bangladesh.
⁷ Department of Software Engineering, Daffodil International University (DIU), Dhaka 1342, Bangladesh.
⁸ Department of Basic Medical Sciences, College of Applied Medical Sciences in Khamis Mushyt, King Khalid University, Abha 61412, Saudi Arabia.
⁹ Artificial Intelligence & Data Science, School of Health and Rehabilitation Sciences, Faculty of Health and Behavioural Sciences, The University of Queensland, St Lucia, QLD 4072, Australia.

PMID: 37761941
PMCID: PMC10530870
DOI: 10.3390/genes14091802

Rabea Khatun et al. Genes (Basel). 2023.

Abstract

Biomarker-based cancer identification and classification tools are widely used in bioinformatics and machine learning. However, the high dimensionality of microarray gene expression data makes it difficult to identify the genes that matter for cancer diagnosis. Many feature selection algorithms address this problem by selecting a small subset of informative features. This article proposes an ensemble rank-based feature selection method (EFSM) and an ensemble weighted average voting classifier (VT) to overcome this challenge. The EFSM uses a ranking method that aggregates the features chosen by individual selection methods to efficiently discover the most relevant and useful features. The VT combines support vector machine, k-nearest neighbor, and decision tree algorithms into an ensemble model. The proposed method was tested on three benchmark datasets and compared with existing built-in ensemble models. Our model achieved higher accuracy: 100% for leukaemia, 94.74% for colon cancer, and 94.34% for the 11-tumor dataset. The study concludes by identifying a subset of the most important cancer-causing genes and demonstrating their significance compared to the original data. The proposed approach surpasses existing strategies in accuracy and stability, detects vital genes with higher precision than other existing methods, and may significantly benefit the development of ML-based gene analysis.

Keywords: cancer detection; feature selection; gene analysis; gene data; machine learning; voting classifier.
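
The rank-aggregation idea behind the EFSM can be illustrated with a minimal sketch. The abstract does not specify how the ranks are combined, so the mean-rank rule, the helper names `ranks_from_scores` and `aggregate_ranks`, and the toy scores below are all assumptions made for illustration, not the authors' implementation.

```python
from statistics import mean


def ranks_from_scores(scores):
    """Convert per-feature scores (higher is better) into ranks (1 is best).

    Ties are broken by feature index because Python's sort is stable.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for position, feature_index in enumerate(order, start=1):
        ranks[feature_index] = position
    return ranks


def aggregate_ranks(score_lists, top_k):
    """Average the ranks given by several selectors and keep the top_k features.

    Assumption: mean rank is used as the aggregation rule.
    """
    per_method_ranks = [ranks_from_scores(scores) for scores in score_lists]
    mean_rank = [mean(column) for column in zip(*per_method_ranks)]
    selected = sorted(range(len(mean_rank)), key=lambda i: (mean_rank[i], i))
    return selected[:top_k], mean_rank


# Hypothetical importance scores for five genes from three selectors.
pearson_scores = [0.9, 0.1, 0.5, 0.7, 0.2]
ridge_scores = [0.8, 0.2, 0.6, 0.4, 0.1]
variance_scores = [0.3, 0.1, 0.9, 0.8, 0.2]

selected, mean_rank = aggregate_ranks(
    [pearson_scores, ridge_scores, variance_scores], top_k=2
)
print(selected)  # indices of the two genes with the lowest mean rank
```

Averaging ranks rather than raw scores keeps selectors with different score scales comparable, which is one plausible reason to aggregate at the rank level.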

Conflict of interest statement

The authors declare no conflict of interest.

Figures

**Figure 1**
Workflow diagram of the methodology. (1) Preprocessing was performed on three datasets: the leukaemia, colon, and 11-tumor datasets. (2) Significant features were extracted using different FSMs, namely PCA, recursive feature elimination, Pearson correlation, ridge regression, variance threshold, and the proposed rank-based ensemble feature selection. (3) Each dataset was split 70:30 into train and test sets. (4) ML classifiers, including KNN, DT, SVM, and the proposed voting ensemble classifier, were trained on the reduced dataset. (5) The voting classifier was further compared with built-in ensemble classifiers such as AdaBoost, gradient boosting, and random forest. (6) The performance of the model was assessed and analyzed using different performance metrics, such as accuracy and the confusion matrix.
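
Step (4) relies on a weighted average voting ensemble. A minimal sketch of that combination rule follows, assuming each base classifier outputs class probabilities and that the ensemble takes their weighted mean. The function name `weighted_vote`, the weights, and the probability values are hypothetical and are not taken from the paper.

```python
def weighted_vote(probabilities, weights):
    """Combine per-classifier class probabilities by weighted averaging.

    probabilities: one list of class probabilities per classifier.
    weights: one non-negative weight per classifier (assumed values).
    Returns the index of the winning class and the averaged probabilities.
    """
    total_weight = sum(weights)
    n_classes = len(probabilities[0])
    averaged = [
        sum(w * p[c] for w, p in zip(weights, probabilities)) / total_weight
        for c in range(n_classes)
    ]
    predicted = max(range(n_classes), key=lambda c: averaged[c])
    return predicted, averaged


# Hypothetical outputs of SVM, KNN, and DT for one sample with two classes.
svm_probs = [0.6, 0.4]
knn_probs = [0.3, 0.7]
dt_probs = [0.0, 1.0]
weights = [2.0, 1.0, 1.0]  # assumed: SVM trusted twice as much as the others

predicted, averaged = weighted_vote([svm_probs, knn_probs, dt_probs], weights)
print(predicted, averaged)
```

In this toy case the SVM alone would pick class 0, but the weighted average favours class 1 because the other two classifiers agree strongly on it.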

**Figure 3**
Comparison of FSMs and classifiers using accuracy.

**Figure 4**
Comparison of FSMs and classifiers using accuracy.

**Figure 5**
Comparison of voting and built-in ensemble classifiers using accuracy, precision, recall, and f1-score in the leukaemia dataset.

**Figure 6**
Comparison of voting and built-in ensemble classifiers using accuracy, precision, recall, and f1-score in the colon dataset.

**Figure 7**
Comparison of voting and built-in ensemble classifiers using accuracy, precision, recall, and f1-score in the 11-tumor dataset.

**Figure 8**
Confusion matrix with best results for different datasets.

**Figure 9**
AUROC curve with best results for different datasets.
