Sci Rep. 2025 Feb 12;15(1):5239.
doi: 10.1038/s41598-025-89475-2.

A multi-classification deep neural network for cancer type identification from high-dimension, small-sample and imbalanced gene microarray data

Yifu Zeng et al. Sci Rep. 2025.

Abstract

Gene microarray technology provides an efficient way to diagnose cancer. However, microarray gene expression data face the challenges of high dimensionality, small sample size, and multi-class imbalance. The coupling of these challenges leads to inaccurate results when traditional feature selection and classification algorithms are used. Owing to their fast learning speed and good classification performance, deep neural networks such as the generative adversarial network have proven to be among the best classification algorithms, especially in the bioinformatics domain. However, the generative adversarial network is limited to binary applications and is inefficient at processing high-dimensional sparse features. This paper proposes a multi-classification generative adversarial network model combined with feature bundling (MGAN-FB) to handle the coupling of high dimensionality, small sample size, and multi-class imbalance in gene microarray data classification at both the feature and algorithmic levels. At the feature level, a deep encoder structure combining a feature bundling (FB) mechanism and a squeeze-and-excite (SE) mechanism is designed for the generator, so that the sparsity, correlation and consequence of high-dimensional features are all taken into consideration for adaptive feature extraction. It achieves effective dimensionality reduction without transitional information loss. At the algorithmic level, a softmax module coupled with a multi-classifier is introduced into the discriminator, and a new objective function is designed for the proposed MGAN-FB model that considers encode loss, reconstruction loss, discrimination loss and multi-classification loss. This extends the generative adversarial framework from binary classification to multi-classification. Experiments on eight open-source gene microarray datasets, evaluated in terms of classification performance, running time and non-parametric tests, demonstrate that the proposed method has clear advantages over the seven other compared methods.
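
The abstract names four loss terms (encode, reconstruction, discrimination and multi-classification) but does not give their form or weighting. The following is a minimal sketch of how such a composite objective could be assembled for a single sample, assuming mean squared error for the encode and reconstruction terms, the standard GAN discriminator loss, softmax cross-entropy for the multi-class term, and a plain weighted sum. All function names and the weights are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Multi-class cross-entropy for one sample with an integer class label."""
    return -math.log(softmax(logits)[label])


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def discriminator_loss(p_real, p_fake):
    """Standard GAN discriminator loss for one real and one generated sample.

    p_real and p_fake are the discriminator's probabilities (strictly
    between 0 and 1) that each input is real.
    """
    return -math.log(p_real) - math.log(1.0 - p_fake)


def combined_loss(l_enc, l_rec, l_disc, l_cls, weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the four terms named in the abstract.

    The weights are an assumption; the abstract does not state how the
    terms are balanced.
    """
    w_enc, w_rec, w_disc, w_cls = weights
    return w_enc * l_enc + w_rec * l_rec + w_disc * l_disc + w_cls * l_cls


# Toy values for one sample (illustrative only, not data from the paper).
x, x_hat = [0.2, 0.8, 0.5], [0.1, 0.9, 0.5]   # input and its reconstruction
z, z_hat = [0.3, -0.1], [0.25, -0.05]         # latent code and re-encoded code
total = combined_loss(
    l_enc=mse(z, z_hat),
    l_rec=mse(x, x_hat),
    l_disc=discriminator_loss(p_real=0.9, p_fake=0.2),
    l_cls=cross_entropy([0.5, 2.0, -1.0], label=1),
)
print(f"combined loss: {total:.4f}")
```

In a sketch like this, the softmax cross-entropy term is what lets the discriminator separate more than two classes, while the other three terms keep the encoder, decoder and adversarial game consistent with one another.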

Keywords: Cancer diagnosis; Gene microarray data; High dimensional; Low-sample-size; Multi-class imbalance.

Conflict of interest statement

Declarations. Competing interests: The authors declare no competing interests.

Figures

Fig. 1. Framework of the proposed MGAN-FB model.
Algorithm 1. Anomaly detection based on MGAN-FB.
Fig. 2. Illustration of feature bundling principle.
Fig. 3. Subnet structure of the encoder and decoder.
Fig. 4. AUC results on cancer gene microarray data.
Fig. 5. Friedman’s test rankings for various evaluation metrics with decision tree multi-classifier.
Fig. 6. Nemenyi test rankings for various evaluation metrics with decision tree multi-classifier.
Fig. 7. Friedman’s test rankings for various evaluation metrics with BP multi-classifier.
Fig. 8. Nemenyi’s test rankings for various evaluation metrics with BP multi-classifier.
