Heliyon. 2024 Sep 2;10(17):e37165.
doi: 10.1016/j.heliyon.2024.e37165. eCollection 2024 Sep 15.

A comprehensive learning based swarm optimization approach for feature selection in gene expression data

Subha Easwaran et al. Heliyon. 2024.

Abstract

Gene expression data analysis is challenging because of the high dimensionality and complexity of the data. Feature selection, which identifies relevant genes, is a common preprocessing step. We propose a Comprehensive Learning-Based Swarm Optimization (CLBSO) approach for feature selection in gene expression data. CLBSO combines the strengths of ants and grasshoppers to explore the high-dimensional search space efficiently: ants perform local search and leave pheromone trails that guide the swarm, while grasshoppers use long-distance jumps to explore new regions and avoid local optima. The proposed approach was evaluated on several publicly available gene expression datasets and compared with state-of-the-art feature selection methods. CLBSO achieved an average accuracy improvement of 15% over classification on the original high-dimensional data and outperformed other feature selection methods by up to 10%. For instance, on the Pancreatic cancer dataset, CLBSO achieved 97.2% accuracy, significantly higher than the 84.0% achieved by XGBoost-MOGA. Convergence analysis showed that CLBSO required fewer iterations to reach optimal solutions. Statistical analysis confirmed that the performance improvements were significant, and stability analysis demonstrated consistent gene subset selection across different runs. These findings highlight the robustness and efficacy of CLBSO on complex gene expression datasets, making it a valuable tool for classification tasks in bioinformatics.
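The abstract describes CLBSO only at a high level, so the following is a minimal, hypothetical Python sketch of how an ant/grasshopper hybrid with a comprehensive-learning step could search over binary feature masks. It is not the authors' algorithm. The pheromone update rule, the bit-flip "jump" used for grasshoppers, the 0.2 learning probability, and the toy fitness function (which stands in for a classifier's cross-validated accuracy on gene expression data) are all assumptions made for illustration.

```python
import random


def toy_fitness(mask, relevant, penalty=0.05):
    """Hypothetical stand-in for a classifier-based fitness: reward each
    selected 'relevant' feature and penalise the size of the subset."""
    hits = sum(1 for i, bit in enumerate(mask) if bit and i in relevant)
    return hits - penalty * sum(mask)


def hybrid_swarm_select(n_features, fitness, n_ants=8, n_hoppers=4,
                        iters=30, rho=0.1, jump_rate=0.3, learn_p=0.2, seed=0):
    """Simplified ant/grasshopper hybrid over binary masks (assumed design)."""
    rng = random.Random(seed)
    pheromone = [0.5] * n_features  # per-feature probability of selection
    n_agents = n_ants + n_hoppers
    agents = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_agents)]
    pbest = [a[:] for a in agents]               # personal best mask per agent
    pbest_fit = [fitness(a) for a in agents]
    g = max(range(n_agents), key=lambda k: pbest_fit[k])
    best, best_fit = pbest[g][:], pbest_fit[g]   # global best so far
    history = []
    for _ in range(iters):
        for k in range(n_agents):
            if k < n_ants:
                # Ant: sample a mask from the pheromone trail, then flip one
                # random bit as a small local move.
                cand = [1 if rng.random() < p else 0 for p in pheromone]
                cand[rng.randrange(n_features)] ^= 1
            else:
                # Grasshopper: long jump that flips many bits of its position.
                cand = [b ^ 1 if rng.random() < jump_rate else b for b in agents[k]]
            # Simplified comprehensive learning: each dimension may copy the
            # bit from a randomly chosen agent's personal best.
            for d in range(n_features):
                if rng.random() < learn_p:
                    cand[d] = pbest[rng.randrange(n_agents)][d]
            agents[k] = cand
            f = fitness(cand)
            if f > pbest_fit[k]:
                pbest[k], pbest_fit[k] = cand[:], f
            if f > best_fit:
                best, best_fit = cand[:], f
        # Evaporate pheromone and reinforce the features in the global best.
        pheromone = [(1 - rho) * p + rho * bit for p, bit in zip(pheromone, best)]
        history.append(best_fit)  # best-so-far fitness, never decreases
    return best, best_fit, history


relevant = {0, 3, 7}  # hypothetical informative genes for the toy fitness
mask, score, hist = hybrid_swarm_select(20, lambda m: toy_fitness(m, relevant))
print("selected:", [i for i, b in enumerate(mask) if b], "fitness:", score)
```

Keeping the global best outside the agents (elitism) means the recorded best-so-far fitness can only improve, which is also what makes a per-iteration convergence curve meaningful.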

Keywords: Cancer classification; Comprehensive learning; Feature selection; Gene expression; Gene selection; Swarm intelligence.

Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Figure 1
Architecture of the proposed Comprehensive Learning-Based Swarm Optimization (CLBSO) model. The diagram details the Initialization Phase, Local Search Phase, Global Search Phase, and Comprehensive Learning Phase, illustrating the flow and interactions within the algorithm.
Figure 2
Working of the proposed CLBSO Feature Selection Approach.
Algorithm 1
Comprehensive Learning-Based Swarm Optimization (CLBSO).
Figure 3
Accuracy comparison for different classifiers on multiple cancer datasets, with and without feature selection using CLBSO. Each panel represents a specific classifier: (a) SVM, (b) MLP, (c) DT, (d) NB, (e) RF, and (f) KNN. The charts show the accuracy gains obtained by applying CLBSO across the cancer datasets, demonstrating the effectiveness of feature selection in improving classification performance.
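To illustrate the "with and without feature selection" comparison in Figure 3, here is a small, hypothetical Python sketch that scores a leave-one-out 1-nearest-neighbour classifier on all features and on a selected subset. The four-sample dataset is invented and deliberately built so that two noise features mislead the classifier; it does not reproduce any result from the paper, and leave-one-out 1-NN is an assumed evaluation protocol.

```python
import math


def loo_1nn_accuracy(X, y, mask=None):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier, optionally
    restricted to the columns where mask[j] == 1."""
    cols = [j for j in range(len(X[0])) if mask is None or mask[j]]
    correct = 0
    for i, xi in enumerate(X):
        best_j, best_d = None, math.inf
        for j, xj in enumerate(X):
            if i == j:
                continue  # leave the query sample out
            d = sum((xi[c] - xj[c]) ** 2 for c in cols)  # squared Euclidean distance
            if d < best_d:
                best_j, best_d = j, d
        correct += y[best_j] == y[i]
    return correct / len(X)


# Toy data: column 0 is informative, columns 1 and 2 are large noise.
X = [[0.0, 5.0, -3.0], [0.1, -4.0, 2.0], [1.0, 4.8, -2.9], [1.1, -3.9, 2.1]]
y = [0, 0, 1, 1]
full_acc = loo_1nn_accuracy(X, y)
selected_acc = loo_1nn_accuracy(X, y, mask=[1, 0, 0])
print(full_acc, selected_acc)
```

On this contrived data, every sample's nearest neighbour under all three features belongs to the other class, while restricting to the informative column classifies every sample correctly.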
Figure 4
Stability of CLBSO using Jaccard Index across various cancer datasets. Each line represents the Jaccard Index values for different runs and the average value for each dataset.
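The Jaccard index of two selected gene subsets A and B is J(A, B) = |A ∩ B| / |A ∪ B|, so 1 means identical subsets and 0 means no overlap. The sketch below averages J over all pairs of runs, which is one common way to summarise stability; whether Figure 4 uses pairwise averaging is an assumption, and the gene names are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| of two selected-feature sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # treat two empty selections as identical
    return len(a & b) / len(a | b)


def mean_pairwise_jaccard(runs):
    """Average Jaccard index over every pair of runs (needs at least two runs)."""
    pairs = list(combinations(runs, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical gene subsets selected in three independent runs.
runs = [{"TP53", "BRCA1", "KRAS"}, {"TP53", "BRCA1", "EGFR"}, {"TP53", "KRAS", "EGFR"}]
print(mean_pairwise_jaccard(runs))
```

Each pair of these runs shares two of four distinct genes, so every pairwise index, and therefore the mean, is 0.5.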
Figure 5
Convergence analysis of different algorithms across cancer datasets. Each panel represents the mean number of iterations required for convergence by a specific algorithm: (a) CLBSO (proposed), (b) XGBoost-MOGA, (c) ISSA, (d) BCOOT, and (e) SBCSO. The comparison across various cancer datasets illustrates the efficiency of each algorithm in achieving convergence, highlighting the performance differences in terms of computational iterations.
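Figure 5 reports the mean number of iterations each algorithm needs to converge. The caption does not define convergence, so this hypothetical Python sketch assumes that a run has converged at the first iteration whose best-so-far fitness is within a tolerance of the run's final best, and then averages that count over runs. The fitness histories are invented.

```python
def iterations_to_converge(history, tol=1e-9):
    """1-based index of the first iteration whose best-so-far fitness is within
    tol of the final value (assumed criterion; history must be non-empty)."""
    final = history[-1]
    for t, f in enumerate(history, start=1):
        if abs(final - f) <= tol:
            return t


def mean_iterations(histories, tol=1e-9):
    """Mean iterations-to-convergence over several independent runs."""
    return sum(iterations_to_converge(h, tol) for h in histories) / len(histories)


# Invented best-so-far fitness curves from two runs of one algorithm.
runs = [[0.5, 0.7, 0.9, 0.9, 0.9], [0.6, 0.6, 0.8, 0.95, 0.95]]
print(mean_iterations(runs))
```

Because best-so-far curves never decrease, the first iteration that reaches the final value is also the point after which the curve stays flat.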
Figure 6
Heatmap of the Ablation Study comparing the performance of the original CLBSO and CLBSO without the comprehensive learning phase. The heatmap illustrates the accuracy (Acc) and F-measure (Fm) for each cancer dataset, highlighting the impact of the comprehensive learning phase on the algorithm's performance.
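The ablation heatmap reports accuracy (Acc) and F-measure (Fm). For a binary task, F-measure is the harmonic mean of precision and recall, Fm = 2PR / (P + R). The sketch below computes both metrics from invented labels; how the paper averages Fm over multi-class datasets (for example, macro-averaging) is not stated here and is left as an assumption.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Accuracy and binary F-measure (harmonic mean of precision and recall)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    acc = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return acc, f1


# Invented labels: one positive sample is missed (TP = 2, FP = 0, FN = 1).
acc, fm = accuracy_and_f1([1, 0, 1, 1], [1, 0, 0, 1])
print(acc, fm)
```

Here precision is 1 and recall is 2/3, so Fm = 2 · 1 · (2/3) / (5/3) = 0.8 while accuracy is 0.75.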

References

    1. Sharafi Y., Teshnehlab M., Aria M.M. A self-adaptive binary cat swarm optimization using new time-varying transfer function for gene selection in DNA microarray expression cancer data. Soft Comput. 2023;4. doi: 10.1007/s00500-023-07988-2.
    2. Pashaei E., Pashaei E. Hybrid binary coot algorithm with simulated annealing for feature selection in high-dimensional microarray data. Neural Comput. Appl. 2023;35:353–374. doi: 10.1007/s00521-022-07780-7.
    3. Ibrahim R.A., Ewees A.A., Oliva D., Elaziz M.A., Lu S. Improved salp swarm algorithm based on particle swarm optimization for feature selection. J. Ambient Intell. Humaniz. Comput. 2019;10:3155–3169. doi: 10.1007/s12652-018-1031-9.
    4. Deng X., Li M., Deng S., Wang L. Hybrid gene selection approach using xgboost and multi-objective genetic algorithm for cancer classification. Med. Biol. Eng. Comput. 2022;60:663–681. doi: 10.1007/s11517-021-02476-x.
    5. Maayah B., Arqub O.A. Uncertain m-fractional differential problems: existence, uniqueness, and approximations using Hilbert reproducing technique provisioner with the case application: series resistor-inductor circuit. Phys. Scr. 2024;99(2). doi: 10.1088/1402-4896/ad1738.
