Sci Rep. 2025 May 15;15(1):16917.
doi: 10.1038/s41598-025-01458-5.

Machine learning approach for differentiating iron deficiency anemia and thalassemia using random forest and gradient boosting algorithms

Wanicha Tepakhan et al. Sci Rep. 2025.

Abstract

Formulas based on red blood cell indices have been used to differentiate between iron deficiency anemia (IDA) and thalassemia (Thal), but their efficiency varies. In this study, we aimed to develop a tool for discriminating between IDA and Thal using the random forest (RF) and gradient boosting (GB) algorithms. Complete blood count data were collected from 1143 patients with anemia and low mean corpuscular volume (382 with IDA, 635 with Thal, and 126 with both IDA and Thal). The data were randomly divided into training and testing datasets in a ratio of 80:20. The RF and GB models showed good diagnostic performance for predicting IDA and Thal in both the training and testing datasets. For the binary outcome (IDA versus Thal) in the testing dataset, GB and RF both had an accuracy of 90.7% and an area under the receiver operating characteristic curve (AUC-ROC) of 0.953. Diagnostic performance was lower when patients with both IDA and Thal were included: GB and RF showed accuracies of 80.4% and 82.2%, respectively, and AUC-ROC values of 0.910 and 0.899, respectively. In conclusion, we developed a machine learning approach using the GB algorithm. This tool is potentially useful in Thal- and IDA-endemic regions.
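The evaluation set-up described in the abstract (a random 80:20 split, then accuracy and AUC-ROC on the held-out part) can be illustrated with a minimal, standard-library Python sketch. This is not the authors' code: the helper names `split_80_20`, `accuracy`, and `auc_roc` are hypothetical, the split shown is a plain shuffle (the abstract does not say whether stratification was used), and the AUC is the pairwise-ranking formulation for a binary outcome only. Only the cohort sizes come from the abstract.

```python
import random

# Cohort sizes reported in the abstract (the only real data in this sketch).
COHORT = {"IDA": 382, "Thal": 635, "IDA+Thal": 126}


def split_80_20(items, seed=0):
    """Shuffle and split items 80:20 (hypothetical helper, not stratified)."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = round(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc_roc(y_true, scores):
    """Binary AUC-ROC: probability that a random positive case receives a
    higher score than a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# One label per patient, expanded from the cohort counts.
labels = [name for name, count in COHORT.items() for _ in range(count)]
train, test = split_80_20(labels)

# Made-up toy predictions, used only to exercise the metric functions.
toy_truth = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_pred = [1 if s >= 0.5 else 0 for s in toy_scores]
print(len(train), len(test))
print(accuracy(toy_truth, toy_pred), auc_roc(toy_truth, toy_scores))
```

In practice, a library implementation of the classifiers and metrics would replace these helpers; the sketch only makes explicit what the reported accuracy and AUC-ROC figures measure.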

Keywords: Gradient boosting; Iron deficiency anemia; Machine learning; Random forest; Thalassemia.

Conflict of interest statement

Declarations. Competing interest: The authors declare no competing interests.

Figures

Fig. 1
Study population flow and distribution of cases in the training and testing datasets. CBC, complete blood count; MCV, mean corpuscular volume.
Fig. 2
Variable importance from multiclass outcomes (Thal, IDA, IDA with Thal) of the random forest model (a) and gradient boosting model (b).
