Do segmentation metrics reflect clinical reality? A surgeon-centered evaluation in robot-assisted minimally invasive esophagectomy
- PMID: 41073813
- DOI: 10.1007/s00464-025-12266-3
Abstract
Background: Deep learning-based anatomy segmentation holds promise for improving real-time guidance in complex surgeries such as robot-assisted minimally invasive esophagectomy (RAMIE). However, the clinical relevance of commonly used metrics for evaluating segmentation quality remains unclear, as previous assessments have lacked direct input from surgeons. This study aims to assess how well quantitative segmentation metrics reflect surgeons' assessments of anatomical overlay accuracy and clinical usefulness during RAMIE.
Methods: We conducted a survey involving 26 upper gastrointestinal surgeons, including both trainee and attending surgeons, who assessed video clips of RAMIE procedures featuring deep learning-generated anatomical overlays. We correlated the surgeons' qualitative evaluations of annotation accuracy and clinical usefulness with a comprehensive set of quantitative metrics, including overlap, distance, temporal, and error-specific measures. The analysis encompassed over 8000 manually annotated frames from 12 video clips, with overlays generated by two state-of-the-art deep learning models.
Results: Overlap and temporal consistency metrics show the strongest correlation with surgeon assessments. Distance-based and error-specific metrics correlate moderately. Novices show weaker correlations and tend to rate overlays more leniently. Qualitative feedback reveals issues like hallucinations and instability, often missed by current metrics.
Conclusion: Standard quantitative metrics partially reflect surgeon perceptions but should be complemented by surgeon-informed evaluations and task-specific metrics to better capture clinically relevant errors. Aligning metric design with surgical expertise is essential for the safe and effective integration of AI-guided anatomical segmentation in the operating room.
Keywords: Anatomy recognition; Deep learning; Evaluation metrics; Robot-assisted surgery; Semantic segmentation; Survey.
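As an illustration of the kind of analysis the Methods describe, the sketch below computes one overlap metric (the Dice coefficient) for binary masks and then rank-correlates per-clip metric values with per-clip surgeon ratings. It is a minimal sketch under stated assumptions, not the authors' pipeline: the abstract does not say which overlap metric or which correlation coefficient was used, so Dice and Spearman's rank correlation are stand-ins, and all function names and numbers here are hypothetical.

```python
from typing import List, Sequence


def dice(pred: Sequence[Sequence[int]], truth: Sequence[Sequence[int]]) -> float:
    """Dice overlap between two binary masks of equal shape."""
    inter = pred_sum = truth_sum = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            inter += 1 if (p and t) else 0
            pred_sum += 1 if p else 0
            truth_sum += 1 if t else 0
    if pred_sum + truth_sum == 0:
        # Assumption: two empty masks are treated as perfect agreement.
        return 1.0
    return 2.0 * inter / (pred_sum + truth_sum)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


# Hypothetical per-clip values, invented for illustration only:
# mean Dice per clip and mean surgeon accuracy rating per clip.
clip_dice = [0.91, 0.78, 0.85, 0.60, 0.72]
clip_rating = [4.5, 3.0, 4.0, 2.0, 3.5]
rho = spearman(clip_dice, clip_rating)
```

A rank-based coefficient is a natural fit for ordinal survey ratings because it does not assume a linear relationship between the rating scale and the metric. A metric that is high for a clip surgeons rated poorly (for example, a clip with a brief hallucinated structure) would lower `rho`, which is the kind of mismatch the Results point to.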
© 2025. The Author(s).
Conflict of interest statement
Declarations. Disclosures: Ronald de Jong, Gino Kuiper, and Josien Pluim received funding from the Dutch Research Council (NWO), study number KICH1.ST03.21.019, including in-kind contributions from Asensus and Rods & Cones. Yiping Li and Yasmina Al Khalil received funding from Stichting Hanarth Fonds, study number 2022-13. Romy van Jaarsveld received funding from the Dutch Research Council (NWO), study number OCENW.M.21.377. Jelle Ruurda received funding from Stichting Hanarth Fonds, study number 2022-13, and the Dutch Research Council (NWO), study number KICH1.ST03.21.019, including in-kind contributions from Asensus and Rods & Cones. Jelle Ruurda also has a consulting or advisory role for Intuitive Inc. and Medtronic. Richard van Hillegersberg has a consulting or advisory role for Intuitive Inc., Medtronic, and Olympus. Marcel Breeuwer has no conflicts of interest or financial ties to disclose.