Parasit Vectors. 2025 Jan 29;18(1):33.
doi: 10.1186/s13071-024-06618-6.

Decision tree-based learning and laboratory data mining: an efficient approach to amebiasis testing

Enas Al-Khlifeh et al. Parasit Vectors.

Abstract

Background: Amebiasis is a significant global health concern, especially in developing countries, where infections are more common. The primary laboratory diagnostic method is microscopy of stool samples. However, this approach can lead to amebiasis being misidentified as other gastroenteritis (GE) conditions. The goal of this work was to develop a machine learning (ML) model that uses laboratory findings and demographic information to predict amebiasis automatically.

Methods: Data extracted from Jordanian electronic medical records (EMR) between 2020 and 2022 comprised 763 amebic cases and 314 nonamebic cases. Patient demographics, clinical signs, microscopic diagnoses, and leukocyte counts were used to train eight decision tree-based algorithms, whose prediction accuracies were then compared. Feature ranking and correlation methods were applied to improve the classification of amebiasis versus other conditions.
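The correlation and ranking steps can be illustrated with a small, stdlib-only Python sketch. It assumes a binary label (1 = amebic, 0 = nonamebic) and one continuous feature at a time (for example, a stool RBC count), and computes the point-biserial coefficient of the kind shown in Fig. 2. A second helper keeps the highest-ranked features until they account for a chosen share of total importance, mirroring the 95% cut described for Fig. 1a. The function names and the 0.95 default are illustrative assumptions, not the authors' code; in practice the importance scores would come from a fitted model, such as the `feature_importances_` attribute of scikit-learn's `RandomForestClassifier`.

```python
import math
from statistics import fmean, pstdev


def point_biserial(labels, values):
    """Point-biserial correlation between a binary label (1 = amebic,
    0 = nonamebic) and a continuous feature. Equivalent to Pearson's r
    computed with the label coded as 0/1."""
    if len(labels) != len(values):
        raise ValueError("labels and values must have the same length")
    group1 = [v for y, v in zip(labels, values) if y == 1]
    group0 = [v for y, v in zip(labels, values) if y == 0]
    if not group1 or not group0:
        raise ValueError("both classes must be present")
    n = len(values)
    p, q = len(group1) / n, len(group0) / n  # class proportions
    s = pstdev(values)  # population SD of the feature over all samples
    if s == 0:
        raise ValueError("feature has zero variance")
    return (fmean(group1) - fmean(group0)) / s * math.sqrt(p * q)


def rank_by_correlation(labels, features):
    """Rank features (name -> list of values) by |r_pb|, strongest first."""
    scores = {name: point_biserial(labels, vals) for name, vals in features.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


def select_top_features(importances, threshold=0.95):
    """Smallest set of top-ranked features whose share of the total
    (non-negative) importance reaches `threshold`."""
    if not 0 < threshold <= 1:
        raise ValueError("threshold must be in (0, 1]")
    total = sum(importances.values())
    if total <= 0:
        raise ValueError("importances must sum to a positive value")
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    selected, cumulative = [], 0.0
    for name, value in ranked:
        selected.append(name)
        cumulative += value / total
        if cumulative >= threshold - 1e-12:  # tolerate float rounding
            break
    return selected
```

Using the population standard deviation makes the coefficient identical to Pearson's r on a 0/1-coded label; `scipy.stats.pointbiserialr` provides a library implementation of the same statistic.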

Results: The primary predictors distinguishing amebiasis were the percentage of neutrophils, the presence of mucus, and the counts of red blood cells (RBCs) and white blood cells (WBCs) in stool samples. Decision tree-based classifiers, including decision tree (DT), random forest (RF), XGBoost, AdaBoost, and gradient boosting (GB), achieved prediction accuracy and precision ranging from 92% to 94.6%. The optimized RF model, using 300 estimators with a maximum depth of 20, achieved an area under the curve (AUC) of 98% for detecting amebiasis from laboratory data. This study highlights that amebiasis is a significant health concern in Jordan, accounting for 17.22% of all gastroenteritis episodes in this study. Male sex and age were associated with a higher incidence of amebiasis (P = 0.014), and more than 25% of cases occurred in infants and toddlers.
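The reported metrics can be made concrete with a minimal, stdlib-only sketch. It assumes binary ground-truth labels (1 = amebiasis), hard 0/1 predictions for accuracy and precision, and a continuous score (for example, a model's predicted probability of amebiasis) for AUC. AUC is computed here as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half. The function names are illustrative; scikit-learn's `accuracy_score`, `precision_score`, and `roc_auc_score` compute the same quantities.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded 1 = amebic, 0 = nonamebic."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t not in (0, 1) or p not in (0, 1):
            raise ValueError("labels must be 0 or 1")
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def accuracy_and_precision(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("no samples")
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0  # no positive predictions
    return accuracy, precision


def roc_auc(y_true, scores):
    """AUC = P(score of a random positive > score of a random negative),
    with ties counted as 1/2. O(n_pos * n_neg), fine for small data sets."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

Unlike accuracy and precision, AUC does not depend on a decision threshold, which is why it is computed from scores rather than hard predictions.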

Conclusions: Applying ML to EMR data can accurately predict amebiasis. This finding contributes to the emerging use of ML as a decision support system in the diagnosis of parasitic diseases.

Keywords: E. histolytica; Amebiasis; Decision tree; Electronic medical records (EMR); Feature selection; Jordan; Leukocytosis; Machine learning; Microscopic diagnosis; Stool RBCs.

Conflict of interest statement

Declarations. Ethics approval and consent to participate: This study has been approved by the scientific and administration committee of scientific research at Al-Balqa Applied University and Jordan’s Al-Hussein/Salt Hospital. The material is the authors’ original work that has not been previously published elsewhere. Consent for publication: Not applicable. Competing interests: The authors declare no competing interests.

Figures

Fig. 1
a The top features, which together account for 95% of the total feature importance in predicting amebiasis. b A correlation heatmap in which strong positive correlations are shown in red, strong negative correlations in blue, and weak correlations are uncolored.
Fig. 2
Point-biserial correlation heatmap in which strong positive correlations are shown in red, strong negative correlations in blue, and weak correlations are uncolored.
Fig. 3
Comparison of clinical manifestations, age, and sex of Jordanian patients with confirmed amebiasis. a Age distribution of the cases. b Age distribution according to the diagnosed symptoms. Differences are significant at P < 0.05 (n = 763).
Fig. 4
Distribution of laboratory findings, clinical manifestations, age, and sex of Jordanian patients with confirmed amebic gastroenteritis, with severity indicated by WBC count (n = 763).
Fig. 5
Distribution of laboratory findings, clinical manifestations, age, and sex of Jordanian patients with confirmed amebiasis, with severity indicated by neutrophil percentage (n = 763).
Fig. 6
Grid search for the best RF parameters (top); AUC = 0.98 obtained with RF using its best parameters (bottom left); and the corresponding confusion matrix (bottom right).
Fig. 7
Three-dimensional visualization of three features of our data
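The grid search in Fig. 6 (top) evaluates every combination of candidate hyperparameters and keeps the best-scoring one. A minimal sketch follows. The candidate grid and the `score_fn` callback (which would typically train a random forest with the given settings and return its mean cross-validated AUC) are illustrative assumptions, not the grid the authors searched; only the selected values, 300 estimators and a maximum depth of 20, come from the study.

```python
from itertools import product


def grid_search(score_fn, param_grid):
    """Evaluate score_fn(**params) for every combination in param_grid
    (name -> list of candidate values) and return (best_params, best_score).
    Ties keep the first combination encountered."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical candidate grid for a random forest; score_fn is expected to
# train and cross-validate a model with these settings and return, e.g., its AUC.
RF_PARAM_GRID = {
    "n_estimators": [100, 200, 300, 500],
    "max_depth": [10, 20, 30],
}
```

With scikit-learn, `GridSearchCV(RandomForestClassifier(), RF_PARAM_GRID, scoring="roc_auc", cv=5)` performs the same exhaustive search with built-in cross-validation. The cost grows multiplicatively with each added parameter, so grids are usually kept small.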
