NPJ Digit Med. 2025 Jul 10;8(1):424.

doi: 10.1038/s41746-025-01826-5.

How machine learning on real world clinical data improves adverse event recording for endoscopy

Stefan Wittlinger^#¹, Isabella C Wiest^#¹,², Mahboubeh Jannesari Ladani³, Jakob Nikolas Kather²,⁴,⁵, Matthias P Ebert¹,⁶,⁷, Fabian Siegel³, Sebastian Belle⁸

Affiliations

¹ Department of Medicine II, University Medical Center Mannheim, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany.
² Else Kroener Fresenius Center for Digital Health, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany.
³ Department of Biomedical Informatics, Mannheim Institute for Intelligent Systems in Medicine (MIISM), Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany.
⁴ Department of Medicine I, Faculty of Medicine and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany.
⁵ Medical Oncology, National Center for Tumor Diseases (NCT), University Hospital Heidelberg, Heidelberg, Germany.
⁶ DKFZ Hector Cancer Institute at the University Medical Center, Mannheim, Germany.
⁷ Molecular Medicine Partnership Unit, European Molecular Biology Laboratory, Heidelberg, Germany.
⁸ Department of Medicine II, University Medical Center Mannheim, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany. Sebastian.belle@umm.de.

^# Contributed equally.

PMID: 40640575
PMCID: PMC12246240
DOI: 10.1038/s41746-025-01826-5


Stefan Wittlinger et al. NPJ Digit Med. 2025.


Abstract

Endoscopic interventions are essential for diagnosing and treating gastrointestinal conditions. Accurate and comprehensive documentation is crucial for enhancing patient safety and optimizing clinical outcomes; however, adverse events remain underreported. This study evaluates a machine learning-based approach for systematically detecting endoscopic adverse events from real-world clinical metadata, including structured hospital data such as ICD codes and procedure timings. Using a random forest classifier to detect the adverse events perforation, bleeding, and readmission, we analysed 2490 inpatient cases and achieved significant improvements over baseline prediction accuracy. The model achieved AUC-ROC/AUC-PR values of 0.9/0.69 for perforation, 0.84/0.64 for bleeding, and 0.96/0.9 for readmissions. The results highlight the importance of combining multiple metadata features for robust predictions. This semi-automated method offers a privacy-preserving tool for identifying documentation discrepancies and enhancing quality control. By integrating metadata analysis, this approach supports better clinical decision-making, quality improvement initiatives, and resource allocation while reducing the risk of missed adverse events in endoscopy.


Conflict of interest statement

Competing interests: S.B. declares consulting services for Olympus. I.W. received honoraria from AstraZeneca. J.K. declares consulting services for Bioptimus, France; Panakeia, UK; AstraZeneca, UK; and MultiplexDx, Slovakia. Furthermore, he holds shares in StratifAI, Germany; Synagen, Germany; and Ignition Lab, Germany; has received an institutional research grant from G.S.K.; and has received honoraria from AstraZeneca, Bayer, Daiichi Sankyo, Eisai, Janssen, Merck, MSD, BMS, Roche, Pfizer, and Fresenius. All other authors declare no competing interests.

Figures

**Fig. 1. Example of data generated during a hospital stay.**
This figure displays an example of data generated during a hospital stay, which includes both unstructured data, primarily in the form of text (e.g., endoscopy reports and discharge letters), and structured data (metadata), such as diagnoses, materials used during endoscopy, and time until discharge. For a comprehensive list of the metadata used, refer to Supplementary Tables 2–3.

**Fig. 2. Training and testing scheme for adverse events perforation and bleeding.**
The training and testing scheme for the adverse events bleeding and perforation is displayed. A combination of LLM-generated and manually generated labels was used. The random forest was trained for two types of adverse events, perforation and bleeding, using a training set of n = 1990 cases. The labels for the training set were obtained by running a large language model (LLM) on the endoscopy reports and discharge letters. The performance metrics were obtained by testing on the remaining n = 500 manually labeled cases, which represent the ground truth. To estimate the stability of the machine learning algorithm, the large language model labels were used for the entire data set (n = 2490), and random subsampling with 100 iterations was performed on them. In each iteration, the data were randomly split into a training set (n = 1990) and a test set (n = 500). From this, the standard deviation of the performance metrics was calculated. Perforation or bleeding that occurred after readmission was not classified as adverse event perforation or bleeding, but rather as adverse event readmission. The listed data are available at discharge, so the detection of adverse events such as bleeding or perforation can be performed at discharge or at any later time.
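The random subsampling described in this caption can be illustrated with a minimal sketch. It uses only the Python standard library and a toy stand-in for the real pipeline. The function name `random_subsampling`, the `evaluate` callback, the synthetic labels with an arbitrary 5% positive rate, and the placeholder metric are all assumptions made for illustration; none of them come from the paper.

```python
import random
import statistics


def random_subsampling(cases, evaluate, n_test, n_iterations=100, seed=0):
    """Repeatedly split `cases` at random into train/test and collect one metric.

    `evaluate(train, test)` is a hypothetical callback that would fit a model
    on `train` and return a single performance number measured on `test`.
    Returns the mean and the sample standard deviation over all iterations.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_iterations):
        shuffled = cases[:]              # copy so the caller's list is untouched
        rng.shuffle(shuffled)
        test, train = shuffled[:n_test], shuffled[n_test:]
        scores.append(evaluate(train, test))
    return statistics.mean(scores), statistics.stdev(scores)


# Synthetic data: 2490 binary labels with an arbitrary ~5% positive rate.
data_rng = random.Random(42)
cases = [1 if data_rng.random() < 0.05 else 0 for _ in range(2490)]


def prevalence_gap(train, test):
    """Placeholder metric (not AUC): gap between train and test event rates."""
    return abs(sum(train) / len(train) - sum(test) / len(test))


mean_gap, sd_gap = random_subsampling(cases, prevalence_gap, n_test=500)
print(f"mean = {mean_gap:.4f}, standard deviation = {sd_gap:.4f}")
```

In a real setting, `evaluate` would train the classifier on the 1990 training cases and return AUC-ROC or AUC-PR on the 500 test cases; the spread of those values across iterations is what the error bars summarise.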

**Fig. 3. Training and testing scheme for adverse event readmission.**
Training and testing scheme for adverse event readmission within 30 days due to adverse events in connection with previously performed EMR. The entire data set, n = 213, consisting of all readmissions within 30 days was manually labeled. Given the limited sample size, the metadata used was restricted to the time until readmission and the ICD codes recorded at readmission. The random forest classifier was trained on n = 163 cases and tested on n = 50 cases. To evaluate the stability of the machine learning algorithm, random subsampling was performed over 100 iterations, with different splits between training and testing sets in each iteration. The listed data is available at readmission, allowing the detection of adverse event readmission to be performed at readmission or any later time.

**Fig. 4. Test results for adverse event readmission.**
Test results (AUC-ROC and AUC-PR) and errors for adverse event readmission within 30 days due to adverse events in connection with previously performed EMR are displayed. The dataset (n = 213) with manually labeled data was randomly split into a training set (n = 163) and a testing set (n = 50). This random subsampling process was repeated 100 times. The AUC-ROC and AUC-PR values were calculated as the mean across all runs, with error bars representing the standard deviation.

**Fig. 5. Test results for adverse events bleeding and perforation.**
**a** The test results for adverse events bleeding and perforation (AUC-ROC and AUC-PR) are displayed. The model was trained on a training set (n = 1990) with labels generated by a large language model and tested on a manually labeled test set (n = 500). Direct error bars cannot be computed for this process, as random subsampling would require manual labels for all cases. **b** Estimated error values using only labels generated by a large language model are shown. Labels generated by a large language model are used for both training (n = 1990) and testing (n = 500). This process is repeated over 100 iterations using random subsampling, with a different split of training and test data in each iteration. Performance metrics (AUC-ROC, AUC-PR, and dummy classifier) are calculated as mean values, with the error bars representing the standard deviations shown in the plot.
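To make the comparison with a dummy classifier concrete, the sketch below computes both metrics by hand on a tiny invented example. It assumes that AUC-PR is estimated as average precision and that the dummy classifier outputs one constant score; the caption does not specify either choice, and the labels and scores are made up. Under those assumptions, an uninformative score gives an AUC-ROC of 0.5 and an AUC-PR equal to the event rate, which is why AUC-PR values for rare events have to be read against prevalence.

```python
from itertools import groupby


def auc_roc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve, one step per distinct score."""
    n_pos = sum(labels)
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    tp = seen = 0
    area = 0.0
    for _, group in groupby(ranked, key=lambda pair: pair[0]):
        group = list(group)
        group_tp = sum(y for _, y in group)
        tp += group_tp
        seen += len(group)
        area += (tp / seen) * (group_tp / n_pos)   # precision times recall gain
    return area


# Invented example: 3 events among 10 cases (event rate 0.3).
labels = [1, 0, 1, 0, 0, 0, 0, 1, 0, 0]
model_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.2, 0.6, 0.1, 0.05]
dummy_scores = [0.5] * len(labels)                 # constant, uninformative score

print("model:", auc_roc(labels, model_scores), average_precision(labels, model_scores))
print("dummy:", auc_roc(labels, dummy_scores), average_precision(labels, dummy_scores))
```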

**Fig. 6. Ten most important features for adverse events perforation, bleeding, and readmission.**
The 10 most important features for **a** perforation, **b** bleeding, and **c** readmission are displayed. SHAP was used to determine feature importance.
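SHAP attributions are based on Shapley values, and a feature ranking like the one in this figure is commonly obtained by averaging their absolute values over cases. The sketch below computes exact Shapley values by brute force for a toy scoring function. It is not the SHAP library and not the authors' model: the feature names, the scoring function, the single baseline vector, and the three cases are all hypothetical, and averaging absolute values is only an assumption about how the ranking was derived.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values of `model` at `x`; absent features take `baseline` values."""
    n = len(x)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = model(current)
        for i in order:
            current[i] = x[i]               # reveal feature i
            value = model(current)
            totals[i] += value - previous   # marginal contribution of feature i
            previous = value
    return [t / len(orderings) for t in totals]


# Hypothetical metadata features and a made-up risk score with one interaction term.
FEATURES = ["length_of_stay_days", "clips_used", "transfusion_coded"]


def toy_model(v):
    stay, clips, transfusion = v
    return 0.02 * stay + 0.05 * clips + 0.30 * transfusion + 0.10 * clips * transfusion


baseline = [2, 0, 0]                         # assumed reference case
cases = [[9, 2, 1], [3, 1, 0], [12, 0, 1]]   # invented example cases

per_case = [shapley_values(toy_model, c, baseline) for c in cases]
importance = [
    sum(abs(row[j]) for row in per_case) / len(per_case)
    for j in range(len(FEATURES))
]
for name, score in sorted(zip(FEATURES, importance), key=lambda p: -p[1]):
    print(f"{name}: {score:.3f}")
```

Brute-force enumeration grows factorially with the number of features, so it is only practical for a handful of them; it is shown here purely to make the attribution idea tangible.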


