PLoS One. 2023 Apr 25;18(4):e0284619.
doi: 10.1371/journal.pone.0284619. eCollection 2023.

Feature selection for high dimensional microarray gene expression data via weighted signal to noise ratio

Muhammad Hamraz et al. PLoS One. 2023.

Abstract

Feature selection in high-dimensional gene expression datasets reduces not only the dimension of the data but also the execution time and computational cost of the underlying classifier. The current study introduces a novel feature selection method, the weighted signal to noise ratio (WSNR), which combines feature weights derived from support vectors with the signal to noise ratio in order to identify the most informative genes in high-dimensional classification problems. Combining these two state-of-the-art procedures enables the extraction of the most informative genes: the weights that each procedure assigns to a feature are multiplied, and the features are arranged in decreasing order of this product. A larger weight indicates greater discriminatory power in classifying tissue samples into their true classes. The method is validated on eight gene expression datasets, and its results are compared with those of four well-known feature selection methods. We found that WSNR outperforms the competing methods on 6 of the 8 datasets. Box-plots and bar-plots of the results of the proposed method and all the other methods are also constructed. The proposed method is further assessed on simulated data, and the simulation analysis reveals that WSNR outperforms all the other methods included in the study.
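The abstract describes WSNR only at a high level: a per-gene weight obtained from support vectors and a per-gene signal to noise ratio are multiplied, and genes are sorted in decreasing order of the product. The Python sketch below illustrates that idea under several assumptions that are not taken from the paper. The SNR is assumed to be the Golub-style |μ+ − μ−| / (σ+ + σ−). The support-vector weight is assumed to be the absolute coefficient |w_j| of a linear SVM. That SVM is trained here with Pegasos-style subgradient descent on z-scored features, which stands in for whatever solver the authors actually used. All function names (`snr_scores`, `linear_svm_weights`, `wsnr_rank`) are hypothetical.

```python
import random
from statistics import mean, stdev


def standardize(X):
    """Z-score each feature column; constant columns are left at zero."""
    cols = list(zip(*X))
    mus = [mean(c) for c in cols]
    sds = [stdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in X]


def snr_scores(X, y):
    """Assumed Golub-style SNR per feature: |mu_pos - mu_neg| / (sd_pos + sd_neg).

    X is a list of samples (lists of floats); y holds labels in {-1, +1}.
    """
    scores = []
    for j in range(len(X[0])):
        pos = [x[j] for x, label in zip(X, y) if label == 1]
        neg = [x[j] for x, label in zip(X, y) if label == -1]
        denom = stdev(pos) + stdev(neg)
        # Assumption: a feature that is constant within both classes scores 0.
        scores.append(abs(mean(pos) - mean(neg)) / denom if denom > 0 else 0.0)
    return scores


def linear_svm_weights(X, y, lam=0.01, epochs=200, seed=0):
    """Primal linear SVM without bias, trained with Pegasos-style subgradient steps.

    This solver is a stand-in chosen for the sketch, not the paper's procedure.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w = [0.0] * d
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(n), n):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # Shrink w (L2 regularization step) ...
            w = [(1 - eta * lam) * wj for wj in w]
            # ... then add the hinge-loss subgradient if the margin is violated.
            if margin < 1:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def wsnr_rank(X, y, **svm_kwargs):
    """Multiply |SVM weight| by SNR per feature and rank features by the product."""
    svm_w = linear_svm_weights(standardize(X), y, **svm_kwargs)
    snr = snr_scores(X, y)
    combined = [abs(w) * s for w, s in zip(svm_w, snr)]
    order = sorted(range(len(combined)), key=lambda j: combined[j], reverse=True)
    return order, combined


if __name__ == "__main__":
    # Toy data: feature 0 separates the classes, features 1..4 are pure noise.
    rng = random.Random(42)
    y = [1 if i % 2 == 0 else -1 for i in range(40)]
    X = [[2.0 * lab + rng.gauss(0, 0.5)] + [rng.gauss(0, 1) for _ in range(4)] for lab in y]
    order, scores = wsnr_rank(X, y)
    print("features ranked by WSNR:", order)
    print("top-2 selected features:", order[:2])
```

Taking absolute values of both factors means a gene can rank highly whether it is up- or down-regulated in the positive class. In an evaluation like the one the figures describe, the top-k indices from `order` would then be passed to a downstream classifier such as a random forest.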

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Flowchart of the proposed method.
Fig 2. Bar-plots of error rates of the proposed and the other classical methods on various subsets for Leukemia dataset.
Fig 3. Bar-plots of error rates of the proposed and the other classical methods on various subsets of genes for Colon dataset.
Fig 4. Bar-plots of error rates of the proposed and the other classical methods on various subsets of genes for Lungcancer dataset.
Fig 5. Bar-plots of error rates of the proposed and the other classical methods on various subsets of genes for Srbct dataset.
Fig 6. Bar-plots of error rates of the proposed and the other classical methods on various subsets of genes for DLBCL dataset.
Fig 7. Bar-plots of error rates of the proposed and the other classical methods on various subsets of genes for Breast dataset.
Fig 8. Bar-plots of error rates of the proposed and the other classical methods on various subsets of genes for TumorC dataset.
Fig 9. Bar-plots of error rates of the proposed and the other classical methods on various subsets of genes for Prostate dataset.
Fig 10. Box-plots of the error rates produced by random forest, using top 10 features selected by different feature selection methods for Leukemia dataset.
Fig 11. Box-plots of the error rates produced by random forest, using top 10 features selected by different feature selection methods for Colon dataset.
Fig 12. Box-plots of the error rates produced by random forest, using top 10 features selected by different feature selection methods for Lungcancer dataset.
Fig 13. Box-plots of the error rates produced by random forest, using top 10 features selected by different feature selection methods for Srbct dataset.
Fig 14. Box-plots of the error rates produced by random forest, using top 10 features selected by different feature selection methods for DLBCL dataset.
Fig 15. Box-plots of the error rates produced by random forest, using top 10 features selected by different feature selection methods for Breastcancer dataset.
Fig 16. Box-plots of the error rates produced by random forest, using top 10 features selected by different feature selection methods for TumorC dataset.
Fig 17. Box-plots of the error rates produced by random forest, using top 10 features selected by different feature selection methods for Prostate dataset.
Fig 18. Bar-plots of errors produced by different feature selection methods on simulated data having outliers, for various subsets of genes.
Fig 19. Bar-plots of errors produced by different feature selection methods on simulated data, having no outliers, for various subsets of genes.
