Evaluation of Deep Learning Models for Identifying Surgical Actions and Measuring Performance

Shuja Khalid et al. JAMA Netw Open. 2020 Mar 2;3(3):e201664. doi: 10.1001/jamanetworkopen.2020.1664
Abstract

Importance: When evaluating surgeons in the operating room, experienced physicians must rely on live or recorded video to assess the surgeon's technical performance, an approach prone to subjectivity and error. Owing to the large number of surgical procedures performed daily, it is infeasible to review every procedure; therefore, there is a tremendous loss of invaluable performance data that would otherwise be useful for improving surgical safety.

Objective: To evaluate a framework for assessing surgical video clips by categorizing them based on the surgical step being performed and the level of the surgeon's competence.

Design, setting, and participants: This quality improvement study assessed 103 video clips of 8 surgeons of various levels performing knot tying, suturing, and needle passing from the Johns Hopkins University-Intuitive Surgical Gesture and Skill Assessment Working Set. Data were collected before 2015, and data analysis took place from March to July 2019.

Main outcomes and measures: Deep learning models were trained to estimate categorical outputs such as performance level (ie, novice, intermediate, and expert) and surgical actions (ie, knot tying, suturing, and needle passing). The efficacy of these models was measured using precision, recall, and model accuracy.

Results: The proposed architectures achieved accuracy in surgical action and performance classification tasks using only video input. The embedding representation had a mean (root mean square error [RMSE]) precision of 1.00 (0) for suturing, 0.99 (0.01) for knot tying, and 0.91 (0.11) for needle passing, resulting in a mean (RMSE) precision of 0.97 (0.01). Its mean (RMSE) recall was 0.94 (0.08) for suturing, 1.00 (0) for knot tying, and 0.99 (0.01) for needle passing, resulting in a mean (RMSE) recall of 0.98 (0.01). It also estimated scores on the Objective Structured Assessment of Technical Skill Global Rating Scale categories, with a mean (RMSE) precision of 0.85 (0.09) for novice level, 0.67 (0.07) for intermediate level, and 0.79 (0.12) for expert level, resulting in a mean (RMSE) precision of 0.77 (0.04). Its mean (RMSE) recall was 0.85 (0.05) for novice level, 0.69 (0.14) for intermediate level, and 0.80 (0.13) for expert level, resulting in a mean (RMSE) recall of 0.78 (0.03).
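The per-class figures reported above are standard one-vs-rest precision and recall, with the overall numbers obtained by averaging across classes. A minimal pure-Python sketch of that computation follows; the label sequences here are illustrative placeholders, not the study's data:

```python
def precision_recall(y_true, y_pred, labels):
    """Per-class one-vs-rest precision/recall plus macro (class-averaged) values."""
    stats = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        stats[c] = (prec, rec)
    macro_p = sum(p for p, _ in stats.values()) / len(labels)
    macro_r = sum(r for _, r in stats.values()) / len(labels)
    return stats, macro_p, macro_r

# Illustrative labels only (not drawn from the JIGSAWS dataset used in the study)
actions = ["suturing", "knot_tying", "needle_passing"]
y_true = ["suturing", "suturing", "knot_tying", "needle_passing", "knot_tying"]
y_pred = ["suturing", "knot_tying", "knot_tying", "needle_passing", "knot_tying"]
per_class, macro_p, macro_r = precision_recall(y_true, y_pred, actions)
```

The same computation applies to the three skill-level classes (novice, intermediate, expert); only the label set changes.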

Conclusions and relevance: The proposed models and the accompanying results illustrate that deep machine learning can identify associations in surgical video clips. These are the first steps to creating a feedback mechanism for surgeons that would allow them to learn from their experiences and refine their skills.


Conflict of interest statement

Conflict of Interest Disclosures: Mr Khalid and Drs Goldenberg, Grantcharov, Taati, and Rudzicz reported having a patent pending related to measuring surgical performance using deep learning with Surgical Safety Technologies. Mr Khalid reported receiving personal fees from Surgical Safety Technologies during the conduct of the study and outside the submitted work. Dr Goldenberg reported receiving personal fees from Surgical Safety Technologies during the conduct of the study. Dr Taati reported receiving personal fees from Surgical Safety Technologies during the conduct of the study and outside the submitted work. Dr Rudzicz reported receiving salary from Surgical Safety Technologies during the conduct of the study and outside the submitted work.

Figures

Figure 1. Embedding Representation Analysis Architecture
The proposed end-to-end model can be used as a classifier to predict surgical actions and self-reported skill, and as a regression model that predicts Likert-scale scores for each category of the Global Rating Scale (GRS). 2-D indicates two-dimensional; ReLU, rectified linear unit.
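The caption describes a shared embedding feeding two heads: a softmax classifier over surgical actions and a regressor over GRS categories. A minimal pure-Python forward-pass sketch of that dual-head layout follows; the layer sizes, random weights, and six GRS outputs are placeholders, not the paper's actual architecture:

```python
import math
import random

def relu(v):
    return [max(0.0, x) for x in v]

def linear(v, W, b):
    """Dense layer: y = W.v + b."""
    return [sum(w * x for w, x in zip(row, v)) + bj for row, bj in zip(W, b)]

def softmax(v):
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    s = sum(exps)
    return [e / s for e in exps]

def dual_head_forward(embedding, W_cls, b_cls, W_reg, b_reg):
    """Shared embedding -> ReLU -> (action classifier, GRS score regressor)."""
    h = relu(embedding)
    action_probs = softmax(linear(h, W_cls, b_cls))  # probabilities over 3 actions
    grs_scores = linear(h, W_reg, b_reg)             # one score per GRS category
    return action_probs, grs_scores

random.seed(0)
dim, n_actions, n_grs = 8, 3, 6  # illustrative sizes only
emb = [random.uniform(-1, 1) for _ in range(dim)]
W_cls = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(n_actions)]
W_reg = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(n_grs)]
probs, grs = dual_head_forward(emb, W_cls, [0.0] * n_actions, W_reg, [0.0] * n_grs)
```

Sharing the embedding between the two heads is what lets a single model serve both the classification and regression roles described in the caption.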
Figure 2. Predicted Segmentations Overlaid on Original Image

