A Review of Feature Selection Methods for Machine Learning-Based Disease Risk Prediction
- PMID: 36304293
- PMCID: PMC9580915
- DOI: 10.3389/fbinf.2022.927312
Abstract
Machine learning has shown utility in detecting patterns within large, unstructured, and complex datasets. One promising application of machine learning is precision medicine, where disease risk is predicted from patient genetic data. However, creating an accurate prediction model based on genotype data remains challenging because of the so-called "curse of dimensionality" (i.e., a far larger number of features than samples). The generalizability of machine learning models therefore benefits from feature selection, which aims to retain only the most "informative" features and remove noisy, "non-informative," irrelevant, and redundant ones. In this article, we provide a general overview of the different feature selection methods, their advantages, disadvantages, and use cases, focusing on the detection of relevant features (i.e., SNPs) for disease risk prediction.
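As a rough illustration of the setting described above (many more SNPs than samples), the following minimal sketch ranks SNPs with a simple univariate filter and keeps the top k. It is not taken from the article: the simulated genotype matrix, the three "causal" SNP indices, the liability model, and the function name `rank_snps_by_correlation` are all hypothetical choices made for the example, and absolute Pearson correlation stands in for whichever filter statistic a real study would use.

```python
import numpy as np


def rank_snps_by_correlation(genotypes, phenotype, k):
    """Return (indices of the k top-scoring SNPs, all scores).

    Each SNP is scored by the absolute Pearson correlation between its
    genotype column (coded 0/1/2) and the phenotype. This is a simple
    filter-style statistic chosen for illustration only.
    """
    X = np.asarray(genotypes, dtype=float)
    y = np.asarray(phenotype, dtype=float)
    Xc = X - X.mean(axis=0)          # center each SNP column
    yc = y - y.mean()                # center the phenotype
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Monomorphic SNPs (zero variance) get a score of 0 instead of NaN.
    safe = np.where(denom > 0, denom, 1.0)
    scores = np.where(denom > 0, np.abs(Xc.T @ yc) / safe, 0.0)
    order = np.argsort(-scores, kind="stable")   # highest score first
    return order[:k], scores


# Simulated data (assumption): 200 samples, 2000 SNPs, so features >> samples.
rng = np.random.default_rng(0)
n_samples, n_snps = 200, 2000
X = rng.integers(0, 3, size=(n_samples, n_snps))   # genotypes coded 0/1/2

causal = [0, 1, 2]   # hypothetical "informative" SNPs
liability = X[:, causal].sum(axis=1) + rng.normal(0.0, 1.0, n_samples)
y = (liability > np.median(liability)).astype(int)  # binary case/control label

selected, scores = rank_snps_by_correlation(X, y, k=20)
print("selected SNP indices:", sorted(selected.tolist()))
```

In practice the ranking would need to be computed inside each cross-validation training fold, since selecting features on the full dataset before evaluation leaks label information into the test folds.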
Keywords: disease risk prediction; feature selection (FS); machine learning; risk prediction; statistical approaches.
Copyright © 2022 Pudjihartono, Fadason, Kempa-Liehr and O'Sullivan.
Conflict of interest statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.