PeerJ. 2025 Apr 14;13:e19025.
doi: 10.7717/peerj.19025. eCollection 2025.

A retrospective study using machine learning to develop predictive model to identify rotavirus-associated acute gastroenteritis in children


Sourav Paul et al. PeerJ. 2025.

Abstract

Background: Rotavirus is the leading cause of severe dehydrating diarrhea in children under 5 years worldwide. Timely diagnosis is critical, but access to confirmatory testing is limited in hospital settings. Machine learning (ML) models have shown promising potential in supporting symptom-based diagnosis of several diseases in resource-limited settings.

Objectives: This study aims to develop a machine-learning predictive model integrated with multiple sources of clinical parameters specific to rotavirus infection without relying on laboratory tests.

Methods: A clinical dataset of 509 children was collected in collaboration with the Regional Institute of Medical Sciences, Imphal, India. The clinical symptoms included diarrhea and its duration, number of stool episodes per day, fever, vomiting and its duration, number of vomiting episodes per day, temperature, and dehydration. Correlation analysis was performed to check feature-feature and feature-outcome collinearity. Feature selection with the ANOVA F test was used to compute feature importance values and obtain a reduced feature subset. Seven supervised learning models were tested and compared: support vector machine (SVM), K-nearest neighbor (KNN), naive Bayes (NB), logistic regression (Log_R), random forest (RF), decision tree (DT), and XGBoost (XGB). Model performance was evaluated using accuracy, precision, recall, specificity, F1 score, macro F1 score, F2 score, and the receiver operating characteristic (ROC) curve.
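The ANOVA F test used for feature selection can be illustrated with a minimal sketch. It assumes a binary outcome (rotavirus positive or negative) and numeric features; the function names `anova_f` and `rank_features`, the feature names, and the toy values are hypothetical and are not taken from the study.

```python
from statistics import mean


def anova_f(groups):
    """One-way ANOVA F statistic for a list of numeric groups."""
    k = len(groups)                      # number of groups (outcome classes)
    n = sum(len(g) for g in groups)      # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_features(rows, labels):
    """Score each feature by its F statistic across the two outcome classes."""
    scores = {}
    for name in rows[0]:
        positive = [r[name] for r, y in zip(rows, labels) if y == 1]
        negative = [r[name] for r, y in zip(rows, labels) if y == 0]
        scores[name] = anova_f([positive, negative])
    # Highest F first: the feature separates the classes most strongly.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy records (not data from the study).
toy_rows = [
    {"vomit_episodes": 1, "stool_episodes": 5},
    {"vomit_episodes": 2, "stool_episodes": 1},
    {"vomit_episodes": 3, "stool_episodes": 4},
    {"vomit_episodes": 6, "stool_episodes": 2},
    {"vomit_episodes": 7, "stool_episodes": 5},
    {"vomit_episodes": 8, "stool_episodes": 3},
]
toy_labels = [0, 0, 0, 1, 1, 1]
ranking = rank_features(toy_rows, toy_labels)
print(ranking)
```

A reduced feature subset is then obtained by keeping the highest-ranked features. A feature whose values are identical within every class would make the within-group term zero, so real data needs a guard for that case.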

Results: The seven ML models were evaluated on the dataset and compared using eight scores: accuracy, precision, recall, specificity, F1 score, F2 score, macro F1 score, and area under the curve (AUC). RF performed best, with an accuracy of 81.4%, F1 score of 86.9%, macro F1 score of 77.3%, F2 score of 86.5%, and AUC of 89%.
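For readers unfamiliar with these scores, the sketch below shows how the threshold-based ones follow from the four cells of a binary confusion matrix. The function name `binary_metrics` and the counts are illustrative assumptions; they are not the study's confusion matrix.

```python
def f_beta(precision, recall, beta):
    """F-beta score: beta > 1 weights recall more heavily than precision."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def binary_metrics(tp, fp, fn, tn):
    """Compute common scores from true/false positive/negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # also called sensitivity
    specificity = tn / (tn + fp)
    f1_positive = f_beta(precision, recall, 1.0)
    # Treat the negative class as the class of interest to get its F1.
    negative_precision = tn / (tn + fn)
    f1_negative = f_beta(negative_precision, specificity, 1.0)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1_positive,
        "f2": f_beta(precision, recall, 2.0),
        # Macro F1 is the unweighted mean of the per-class F1 scores.
        "macro_f1": (f1_positive + f1_negative) / 2,
    }


# Illustrative counts only (not results from the study).
scores = binary_metrics(tp=60, fp=10, fn=10, tn=20)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Because macro F1 averages the two classes equally, it falls below the positive-class F1 whenever the negative class is predicted less well, as in these illustrative counts.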

Conclusions: Machine learning models can support symptom-based diagnosis of rotavirus-associated acute gastroenteritis in children, especially in resource-limited settings. Further validation of the models on a larger dataset is needed before they can predict rotavirus infection in pediatric diarrheal populations with optimal sensitivity and specificity.

Keywords: Child health; Disease diagnosis; Gastroenteritis; Machine learning; Rotavirus; Supervised learning.

Conflict of interest statement

The authors declare there are no competing interests.

Figures

Figure 1. A schematic framework of machine learning predictive models.
A dataset of 509 children with diarrheic symptoms was split in an 80:20 ratio for training and testing the seven supervised machine learning algorithms. The best-performing models were selected and developed as predictive diagnostic models for rotavirus diarrhea in the pediatric population.
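An 80:20 split of this kind can be sketched as follows. The shuffling, the fixed seed, and the function name `split_train_test` are assumptions made for illustration; the caption does not state how the split was randomized or whether it was stratified by outcome.

```python
import random


def split_train_test(records, test_fraction=0.2, seed=0):
    """Shuffle record indices and hold out a fraction for testing."""
    rng = random.Random(seed)          # fixed seed keeps the split reproducible
    indices = list(range(len(records)))
    rng.shuffle(indices)
    n_test = round(len(records) * test_fraction)
    test = [records[i] for i in indices[:n_test]]
    train = [records[i] for i in indices[n_test:]]
    return train, test


# Placeholder records standing in for the 509 children.
placeholder_records = list(range(509))
train, test = split_train_test(placeholder_records)
print(len(train), len(test))  # 407 102
```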
Figure 2. Correlation matrix of features.
Seven clinical parameters presented by diarrheic children (n = 509) were used to assess the correlation of each feature with the outcome of rotavirus disease.
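A correlation matrix of this kind can be computed as in the sketch below. It assumes the Pearson coefficient, which the caption does not specify, and the column names and values are hypothetical toy data rather than study data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    # A constant column has zero spread and would need separate handling.
    return covariance / (spread_x * spread_y)


def correlation_matrix(columns):
    """Pairwise correlations for a dict mapping column name to values."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical toy columns; "rotavirus" is the binary outcome.
toy_columns = {
    "fever": [0, 1, 1, 0, 1],
    "temperature": [37.0, 38.5, 39.0, 37.2, 38.8],
    "rotavirus": [0, 1, 1, 0, 1],
}
matrix = correlation_matrix(toy_columns)
for name, row in matrix.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

Entries between two features indicate feature-feature collinearity, while the row or column for the outcome shows how strongly each feature is associated with it.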
Figure 3. Feature selection using the ANOVA F test.
The top six features, excluding dehydration, were selected for performance measurement and evaluation with the seven supervised machine learning algorithms.
Figure 4. Machine learning prediction of rotavirus-associated acute gastroenteritis.
Five supervised ML algorithms (RF, XGB, DT, KNN, and Log_R) showed good performance based on the receiver operating characteristic (ROC) curve. The ROC curve plots the false-positive rate on the horizontal axis against the true-positive rate on the vertical axis. The false-positive rate is the proportion of all actual negative instances that the classifier predicts as positive. The true-positive rate is the proportion of all actual positive instances that the classifier predicts as positive. The area under the curve (AUC) represents the ability of a model to distinguish positive from negative instances.
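The construction of the ROC curve and its AUC can be sketched in a few lines. This is a minimal illustration that sweeps a decision threshold over predicted scores and integrates with the trapezoidal rule; the function names and the toy labels and scores are assumptions, not outputs of the study's models.

```python
def roc_points(y_true, scores):
    """Return (false-positive rate, true-positive rate) pairs for each threshold."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]              # threshold above every score
    for threshold in sorted(set(scores), reverse=True):
        # Predict positive when the score reaches the threshold.
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under a piecewise-linear curve using the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Toy labels and predicted scores (illustrative only).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
curve = roc_points(toy_labels, toy_scores)
print(curve)
print(auc(curve))  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of positive and negative instances.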
