Comparative Study
Sci Rep. 2025 Mar 30;15(1):10944.
doi: 10.1038/s41598-025-95786-1.

SGA-Driven feature selection and random forest classification for enhanced breast cancer diagnosis: A comparative study

Abrar Yaqoob et al. Sci Rep. 2025.

Abstract

In this study, we propose a novel approach for breast cancer classification that integrates the Seagull Optimization Algorithm (SGA) for feature selection with the Random Forest (RF) classifier for data classification. The novelty of our approach lies in the first application of SGA to gene selection in breast cancer diagnosis: SGA systematically explores the feature space to identify the most informative gene subsets, thereby improving classification accuracy and reducing computational complexity. The selected features are then classified using RF, which is known for its robustness and high accuracy on complex datasets. To evaluate the proposed method, we compared it with other classifiers, including Linear Regression (LR), Support Vector Machine (SVM), and K-Nearest Neighbors (KNN). The proposed SGA-RF combination achieved a best mean accuracy of 99.01% with 22 genes, outperforming the other methods and performing consistently across varying feature subsets. The mean accuracies ranged from 85.35% to 94.33%, highlighting a balance between feature reduction and classification accuracy. Future work will explore the integration of other nature-inspired algorithms and deep learning models to further enhance performance and clinical applicability.
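The wrapper-style search described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. The update rules follow the commonly published form of the seagull optimizer (a linearly decaying coefficient A, a pull toward the best agent, and a spiral attack term). The 0.5 threshold used to turn positions into gene masks, the per-coordinate spiral angle, the name `seagull_select`, and all default parameters are assumptions made for illustration. In the paper's setting the `fitness` callable would score a gene subset by Random Forest accuracy; a hypothetical `toy_fitness` stands in here so the example runs without external libraries.

```python
import math
import random


def seagull_select(n_features, fitness, n_agents=10, n_iter=30,
                   f_c=2.0, u=1.0, v=1.0, seed=0):
    """Search for a feature mask that maximises fitness(mask).

    Illustrative sketch only: names, defaults, and the binarisation
    rule are assumptions, not taken from the paper.
    """
    rng = random.Random(seed)

    def to_mask(pos):
        # Assumed binarisation: keep a feature when its coordinate exceeds 0.5.
        mask = [p > 0.5 for p in pos]
        if not any(mask):
            # Never return an empty subset: keep the largest coordinate.
            mask[max(range(n_features), key=lambda j: pos[j])] = True
        return mask

    # Random initial positions in [0, 1] for every agent.
    agents = [[rng.random() for _ in range(n_features)]
              for _ in range(n_agents)]
    best_pos, best_score = None, float("-inf")
    for pos in agents:
        score = fitness(to_mask(pos))
        if score > best_score:
            best_pos, best_score = pos[:], score

    for t in range(n_iter):
        # A decays linearly from f_c toward 0 (migration / collision avoidance).
        A = f_c - t * (f_c / n_iter)
        for i in range(n_agents):
            pos = agents[i]
            # B balances exploration and exploitation.
            B = 2.0 * A * A * rng.random()
            new_pos = []
            for j in range(n_features):
                # Spiral attack; the angle is drawn per coordinate here,
                # which is a simplification chosen for this sketch.
                k = rng.uniform(0.0, 2.0 * math.pi)
                r = u * math.exp(k * v)
                spiral = (r * math.cos(k)) * (r * math.sin(k)) * (r * k)
                dist = abs(A * pos[j] + B * (best_pos[j] - pos[j]))
                # Clamp to [0, 1] so the threshold rule stays meaningful.
                new_pos.append(min(1.0, max(0.0, dist * spiral + best_pos[j])))
            agents[i] = new_pos
            score = fitness(to_mask(new_pos))
            if score > best_score:
                best_pos, best_score = new_pos[:], score

    return to_mask(best_pos), best_score


# Hypothetical stand-in for classifier accuracy: reward three "informative"
# features and penalise subset size.
INFORMATIVE = {0, 1, 2}


def toy_fitness(mask):
    hits = sum(1 for j, keep in enumerate(mask) if keep and j in INFORMATIVE)
    return hits - 0.1 * sum(mask)


mask, score = seagull_select(20, toy_fitness)
print(sum(mask), "features selected, fitness", round(score, 2))
```

To approximate the pipeline the abstract describes, `toy_fitness` would be replaced by a function that trains a Random Forest on the masked columns and returns its cross-validated accuracy. The same callable could wrap SVM or KNN to reproduce the kind of comparison reported.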

Keywords: Cancer classification; High dimensional data; Random forest; Seagull optimization algorithm.

Conflict of interest statement

Declarations. Competing interests: The authors declare no competing interests. Ethical approval: This article does not contain any studies with human participants or animals performed by any of the authors. Consent to participate: Not applicable. This study does not involve human participants requiring consent. Consent to publish: Not applicable. The manuscript does not include any individual person’s data in any form requiring consent.

Figures

Fig. 1. Different techniques to visualize breast cancer images.
Fig. 2. Publications from 2015–2024 on breast cancer with deep learning.
Fig. 3. SGA process.
Algorithm 1. Seagull Optimization with Random Forest Classifier for Breast Cancer Classification.
Fig. 4. Comprehensive analysis of the performance of the proposed method.
Fig. 5. Confusion matrix analysis of the performance of the proposed method.
Fig. 6. Precision-recall curve.
Fig. 7. Receiver operating characteristic curve.
Fig. 8. Histogram representation of Table 2.
Fig. 9. Precision-recall curve comparison for all the classifiers.
Fig. 10. ROC curve comparison for all the classifiers used.
