PeerJ. 2022 Mar 21;10:e13124.
doi: 10.7717/peerj.13124. eCollection 2022.

A machine learning approach for identification of gastrointestinal predictors for the risk of COVID-19 related hospitalization

Peter Lipták et al. PeerJ.

Abstract

Background and aim: COVID-19 can present with various gastrointestinal symptoms. Shortly after the pandemic outbreak, several machine learning algorithms were implemented to assess new diagnostic and therapeutic methods for this disease. The aim of this study is to assess gastrointestinal and liver-related predictive factors for the SARS-CoV-2 associated risk of hospitalization.

Methods: Data were collected through a questionnaire administered at the COVID-19 outpatient test center and at the emergency department of the University Hospital, combined with data from the internal hospital information system and from a mobile application used for telemedicine follow-up of patients. For statistical analysis, SARS-CoV-2 negative patients were considered as controls for three different SARS-CoV-2 positive patient groups (divided by disease severity). The data were visualized and analyzed in R version 4.0.5. The Chi-squared or Fisher test was applied to test the null hypothesis of independence between factors, followed, where appropriate, by multiple comparisons with the Benjamini-Hochberg adjustment. The null hypothesis of equality of the population medians of a continuous variable was tested by the Kruskal-Wallis test, followed by the Dunn multiple comparisons test. To assess the power of the gastrointestinal parameters and other measured variables to predict the patient group outcome, the Random Forest machine learning algorithm was trained on the data. Predictive ability was quantified by the ROC curve constructed from the Out-of-Bag data. The Matthews correlation coefficient was used as a one-number summary of the quality of binary classification. The importance of the predictors was measured using the Variable Importance. A 2D representation of the data was obtained by Principal Component Analysis for mixed types of data. Findings with a p-value below 0.05 were considered statistically significant.
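As an illustration of the Benjamini-Hochberg adjustment named in the Methods, the following is a minimal Python sketch. It is not the authors' code (the study used R version 4.0.5); the function name `benjamini_hochberg` and the raw p-values are hypothetical and chosen only to show the step-up procedure.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest, scaling each by
    # m / rank and keeping a running minimum so the adjusted values stay
    # monotone in the rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from four pairwise comparisons (not study data).
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
# Apply the 0.05 significance threshold stated in the Methods.
significant = [p < 0.05 for p in adj]
print(adj, significant)
```

With these made-up inputs, every adjusted value stays below the 0.05 threshold, so all four comparisons would still count as significant after adjustment.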

Results: A total of 710 patients were enrolled in the study. The presence of diarrhea and nausea was significantly higher in the emergency department group than in the COVID-19 outpatient test center group. Among liver enzymes, only aspartate transaminase (AST) was significantly elevated in the hospitalized group compared to patients discharged home. Based on the Random Forest algorithm, AST was identified as the most important predictor, followed by age or diabetes mellitus. Diarrhea and bloating also have predictive importance, although much lower than AST.

Conclusion: SARS-CoV-2 positivity is associated with isolated AST elevation, and the AST level is linked with the severity of the disease. Furthermore, using the machine learning Random Forest algorithm, we have identified elevated AST as the most important predictor of COVID-19 related hospitalization.

Keywords: Artificial intelligence; COVID-19; Hospitalization; Liver; Machine learning; Predictors; Random forest; SARS-CoV-2; Symptoms.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Figure 1. Aspartate and alanine transaminase in hospitalized patients vs discharged home.
AST, Aspartate transaminase; ALT, Alanine transaminase; p < 0.001. AST activity is significantly higher in the hospitalized group than in patients discharged home after a visit to the emergency department. There are no significant differences in ALT activity between these groups of patients.
Figure 2. The ROC curve for general COVID-19 and gastrointestinal symptoms and other measurable data in general clinical settings.
Out-of-bag receiver operating characteristic curve with calculated area under the curve (AUC) = 0.76. The Matthews correlation coefficient was 0.48. The following were considered in the analysis: general COVID-19 symptoms, gastrointestinal symptoms, age, sex, duration of symptoms and comorbidities (diabetes mellitus, arterial hypertension and chronic liver diseases).
Figure 3. The ROC curve for selected parameters.
Out-of-bag receiver operating characteristic curve with calculated area under the curve (AUC) = 0.799. The Matthews correlation coefficient was 0.37. The analysis considered selected parameters that are clinically easy to measure: liver enzymes (AST, ALT), gastrointestinal symptoms (diarrhea and bloating), chronic liver disease, age and diabetes mellitus.
Figure 4. Principal component analysis for mixed type of data to obtain two-dimensional representation of the data.
Patients who were discharged home are marked as black dots and those who were admitted to the hospital are marked as red dots. The first principal component (x axis) explains 14.04% of the variability; the second principal component (y axis) explains 10.44% of the variability in the data. The two groups cannot be completely separated, as there is some overlap of the observations, but the clusters show a clear tendency to shift apart.
Figure 5. Variable importance plot for all measured factors.
A positive importance value indicates that the predictor contributes to the predictive accuracy of the Random Forest algorithm. A negative importance value indicates that omitting the predictor increases the predictive accuracy of the Random Forest algorithm.
Figure 6. Variable importance plot for selected factors.
Variable importance plot for selected factors that are fast and easy to measure in the emergency department setting (liver enzymes: AST and ALT; gastrointestinal symptoms: diarrhea and bloating; age; and presence of chronic liver disease and diabetes mellitus). A positive importance value indicates that the predictor contributes to the predictive accuracy of the Random Forest algorithm. A negative importance value indicates that omitting the predictor increases the predictive accuracy of the Random Forest algorithm.
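The two summary statistics reported in the captions for Figures 2 and 3, the area under the ROC curve and the Matthews correlation coefficient, can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the authors' R code: the labels and scores are invented toy values (1 = hospitalized, 0 = discharged home), the 0.5 cut-off is an arbitrary choice, and the helper names `roc_auc` and `matthews_corrcoef` are hypothetical. In the study the scores came from Out-of-Bag predictions of a Random Forest.

```python
import math


def roc_auc(y_true, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def matthews_corrcoef(y_true, y_pred):
    """Matthews correlation coefficient for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the coefficient is set to 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy data, invented for illustration only (not taken from the study).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
# Arbitrary 0.5 threshold to turn scores into class predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

auc = roc_auc(y_true, scores)
mcc = matthews_corrcoef(y_true, y_pred)
print(auc, mcc)
```

The AUC depends only on how the scores rank the patients, whereas the Matthews correlation coefficient depends on the threshold used to produce class predictions.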
