Sci Rep. 2022 Nov 21;12(1):19999.
doi: 10.1038/s41598-022-22011-8.

Prediction of β-Thalassemia carriers using complete blood count features

Furqan Rustam et al. Sci Rep. 2022.

Abstract

β-Thalassemia is one of the serious causes of the high mortality rate in Mediterranean countries. Substantial resources are required to save the life of a β-Thalassemia carrier, and early detection of thalassemia supports appropriate treatment that can increase the carrier's life expectancy. Because it is a genetic disease, it cannot be prevented; however, analyzing several indicators in the parents' blood can reveal the disorders that cause Thalassemia. Laboratory tests for Thalassemia, such as high-performance liquid chromatography, a Complete Blood Count (CBC) with peripheral smear, and genetic testing, are time-consuming and expensive. Red blood cell indices from the CBC can instead be used with machine learning models for the same task. Although approaches exist for detecting Thalassemia carriers from CBC data, a gap remains between the desired and the achieved accuracy. Moreover, the data imbalance problem has not been studied well, which makes the models less generalizable. This study proposes a highly accurate approach for β-Thalassemia carrier detection that combines red blood cell indices from the CBC with supervised machine learning. Because not all features carry predictive information about the target variable, this study employs a unified framework of two feature selection techniques, Principal Component Analysis (PCA) and Singular Value Decomposition (SVD). The data imbalance between β-Thalassemia carriers and non-carriers is handled with the Synthetic Minority Oversampling Technique (SMOTE) and Adaptive Synthetic sampling (ADASYN). Extensive experiments are performed with many state-of-the-art machine learning and deep learning models. Experimental results show that the proposed approach outperforms existing approaches, with an accuracy score of 0.96.
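The SMOTE oversampling step named in the abstract can be sketched as follows. This is a minimal, standard-library-only illustration of the basic SMOTE idea, assuming numeric feature vectors (for example, red blood cell indices) and a binary carrier/non-carrier label. Each synthetic sample is placed at a random point on the segment between a minority-class sample and one of its k nearest minority-class neighbors. It is not the authors' code: the function names `smote` and `balance` and the defaults `k=5` and `seed=0` are hypothetical, and library implementations add details omitted here.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples (basic SMOTE sketch, hypothetical helper).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbors.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest neighbors of `base` among the other minority samples.
        neighbors = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: euclidean(base, minority[j]),
        )[:k]
        neighbor = minority[rng.choice(neighbors)]
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


def balance(X, y, minority_label=1, k=5, seed=0):
    """Oversample the minority class of a binary dataset until both classes are equal in size."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = sum(1 for label in y if label != minority_label)
    n_new = max(0, n_majority - len(minority))
    if n_new == 0:
        return list(X), list(y)
    new = smote(minority, n_new, k=k, seed=seed)
    return X + new, y + [minority_label] * len(new)
```

For example, `balance(X, y, minority_label=1)` returns the original rows plus enough synthetic carrier rows to equalize the two classes. ADASYN, the other technique named above, differs mainly in how many synthetic points each minority sample receives: it allocates more to samples whose neighborhoods contain more majority-class points, so generation concentrates on harder-to-learn regions.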

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1. Dataset variable visualization.
Figure 2. Flow of the proposed methodology.
Figure 3. Visualization of combining features from PCA and SVD.
Figure 4. Definition of True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN).
Figure 5. Details of the three scenarios considered for experiments.
Figure 6. Count of correctly and incorrectly predicted instances of the test data.
Figure 7. Count of correctly and incorrectly predicted instances by ML classifiers when trained on SMOTE-oversampled data.
Figure 8. Count of correctly and incorrectly predicted instances by ML classifiers when trained on data oversampled using ADASYN.
Figure 9. Count of correctly and incorrectly predicted instances by ML classifiers using SMOTE integrated with the unified framework of PCA and SVD.
Figure 10. Count of correctly and incorrectly predicted instances by ML classifiers using ADASYN integrated with the unified framework of PCA and SVD.
Figure 11. Comparative analysis of ML models in Scenario 1, Scenario 2, and Scenario 3.
Figure 12. Count of correctly and incorrectly predicted instances of the test data.
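The four counts defined in Figure 4 determine the accuracy score quoted in the abstract: accuracy = (TP + TN) / (TP + TN + FP + FN). The sketch below computes these counts for a binary labeling in which 1 means carrier and 0 means non-carrier (an encoding assumed here). It also derives precision, recall, and F1 from the same counts. The names `Confusion`, `confusion`, and `metrics` are hypothetical helpers, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion-matrix counts for a binary carrier / non-carrier task."""
    tp: int = 0  # carriers predicted as carriers
    tn: int = 0  # non-carriers predicted as non-carriers
    fp: int = 0  # non-carriers predicted as carriers
    fn: int = 0  # carriers predicted as non-carriers


def confusion(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN, treating `positive` (assumed 1 = carrier) as the positive class."""
    c = Confusion()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                c.tp += 1
            else:
                c.fp += 1
        elif truth == positive:
            c.fn += 1
        else:
            c.tn += 1
    return c


def _safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def metrics(c):
    """Accuracy, precision, recall, and F1 computed from confusion-matrix counts."""
    total = c.tp + c.tn + c.fp + c.fn
    precision = _safe_div(c.tp, c.tp + c.fp)
    recall = _safe_div(c.tp, c.tp + c.fn)
    return {
        "accuracy": _safe_div(c.tp + c.tn, total),
        "precision": precision,
        "recall": recall,
        "f1": _safe_div(2 * precision * recall, precision + recall),
    }
```

Accuracy weights both classes by their frequency. On imbalanced data it can therefore stay high even when many carriers are missed (FN). Recall on the carrier class exposes that failure directly.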
