Risk score prediction model based on single nucleotide polymorphism for predicting malaria: a machine learning approach
- PMID: 35934714
- PMCID: PMC9358850
- DOI: 10.1186/s12859-022-04870-0
Abstract
Background: Malaria risk prediction is currently limited to advanced statistical methods, such as time series and cluster analysis, applied to epidemiological data. Machine learning models have been explored to study the complexity of malaria through blood smear images and environmental data. However, to the best of our knowledge, no study has analysed the contribution of Single Nucleotide Polymorphisms (SNPs) to malaria using a machine learning model. This study therefore aims to quantify an individual's susceptibility to developing malaria by using risk scores obtained from the cumulative effects of SNPs, known as weighted genetic risk scores (wGRS).
Results: We proposed an SNP-based feature extraction algorithm that incorporates an individual's susceptibility information for malaria to generate the feature set. Because learning from many SNPs can become computationally expensive for a machine learning model, we reduced the feature set using Logistic Regression with Recursive Feature Elimination (LR-RFE) to select the SNPs that improve the efficacy of our model. Next, we calculated the wGRS of the selected feature set, which is used as the model's target variable. Moreover, to compare against the wGRS-only model, we calculated and evaluated the combination of wGRS with genotype frequency (wGRS + GF). Finally, Light Gradient Boosting Machine (LightGBM), eXtreme Gradient Boosting (XGBoost), and Ridge regression algorithms were used to build the machine learning models for malaria risk prediction.
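As a rough illustration of the wGRS idea described above, the sketch below sums each SNP's risk-allele count weighted by an effect size. The abstract does not state how the weights were derived, so using the natural log of an odds ratio is an assumption (a common convention), and the SNP names, odds ratios, and the function name are hypothetical placeholders rather than values from the study.

```python
from math import log


def weighted_genetic_risk_score(allele_counts, weights):
    """Return the sum over SNPs of weight * risk-allele count.

    allele_counts: dict mapping SNP id -> number of risk alleles (0, 1 or 2).
    weights: dict mapping SNP id -> effect size (assumed here: log odds ratio).
    """
    missing = set(allele_counts) - set(weights)
    if missing:
        raise KeyError(f"no weight available for SNPs: {sorted(missing)}")
    for snp, count in allele_counts.items():
        if count not in (0, 1, 2):
            raise ValueError(f"allele count for {snp} must be 0, 1 or 2")
    return sum(weights[snp] * count for snp, count in allele_counts.items())


# Hypothetical odds ratios, chosen only to make the arithmetic easy to follow.
example_weights = {"snp_a": log(2.0), "snp_b": log(0.5)}
# One individual: heterozygous at snp_a, homozygous for the counted allele at snp_b.
example_individual = {"snp_a": 1, "snp_b": 2}

score = weighted_genetic_risk_score(example_individual, example_weights)
print(score)
```

In this sketch a weight below zero (odds ratio under 1) lowers the score, so protective and risk-increasing variants can be combined in a single sum.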
Conclusions: Our proposed approach identified SNP rs334 as the most contributing feature, with an importance score of 6.224 compared to a baseline importance score of 1.1314. This is an important result, as prior studies have shown that rs334 is a major genetic risk factor for malaria. The comparison of the three machine learning models demonstrated that LightGBM achieves the best performance, with a Mean Absolute Error (MAE) of 0.0373. Furthermore, with wGRS + GF, all models performed significantly better than with wGRS alone, and LightGBM again obtained the best performance (MAE of 0.0033).
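For readers unfamiliar with the metric reported above, MAE is the average absolute difference between predicted and observed values, so lower is better. The following is a minimal sketch of that calculation; the function name and the sample values are illustrative assumptions and are not taken from the study's data.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |observed - predicted| over all samples."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up risk scores, used only to show the arithmetic.
observed = [1.0, 2.0, 3.0]
predicted = [1.5, 2.0, 2.0]

# Absolute errors are 0.5, 0.0 and 1.0, so the mean is 0.5.
mae = mean_absolute_error(observed, predicted)
print(mae)
```

Because MAE is expressed in the same units as the target, the reported scores can be read directly as the typical deviation of a predicted risk score from its observed value.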
Keywords: Feature extraction algorithm; Genetic risk factors; Machine learning; Malaria; Single nucleotide polymorphisms; Weighted genetic risk score.
© 2022. The Author(s).
Conflict of interest statement
The authors declare that they have no competing interests.