Evaluation of Deep Learning Models for Identifying Surgical Actions and Measuring Performance

Shuja Khalid et al. JAMA Netw Open. 2020 Mar 2;3(3):e201664. doi: 10.1001/jamanetworkopen.2020.1664
Abstract

Importance: When evaluating surgeons in the operating room, experienced physicians must rely on live or recorded video to assess the surgeon's technical performance, an approach prone to subjectivity and error. Owing to the large number of surgical procedures performed daily, it is infeasible to review every procedure; therefore, there is a tremendous loss of invaluable performance data that would otherwise be useful for improving surgical safety.

Objective: To evaluate a framework for assessing surgical video clips by categorizing them based on the surgical step being performed and the level of the surgeon's competence.

Design, setting, and participants: This quality improvement study assessed 103 video clips of 8 surgeons of various levels performing knot tying, suturing, and needle passing from the Johns Hopkins University-Intuitive Surgical Gesture and Skill Assessment Working Set. Data were collected before 2015, and data analysis took place from March to July 2019.

Main outcomes and measures: Deep learning models were trained to estimate categorical outputs such as performance level (ie, novice, intermediate, and expert) and surgical actions (ie, knot tying, suturing, and needle passing). The efficacy of these models was measured using precision, recall, and model accuracy.

Results: The proposed architectures achieved accuracy in surgical action and performance classification tasks using only video input. The embedding representation had a mean (root mean square error [RMSE]) precision of 1.00 (0) for suturing, 0.99 (0.01) for knot tying, and 0.91 (0.11) for needle passing, resulting in a mean (RMSE) precision of 0.97 (0.01). Its mean (RMSE) recall was 0.94 (0.08) for suturing, 1.00 (0) for knot tying, and 0.99 (0.01) for needle passing, resulting in a mean (RMSE) recall of 0.98 (0.01). It also estimated scores on the Objective Structured Assessment of Technical Skill Global Rating Scale categories, with a mean (RMSE) precision of 0.85 (0.09) for novice level, 0.67 (0.07) for intermediate level, and 0.79 (0.12) for expert level, resulting in a mean (RMSE) precision of 0.77 (0.04). Its mean (RMSE) recall was 0.85 (0.05) for novice level, 0.69 (0.14) for intermediate level, and 0.80 (0.13) for expert level, resulting in a mean (RMSE) recall of 0.78 (0.03).
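The per-class figures reported above are standard one-vs-rest precision and recall, with the overall numbers obtained by averaging across classes. A minimal pure-Python sketch of that computation follows; the label sequences here are illustrative placeholders, not the study's data:

```python
def precision_recall(y_true, y_pred, labels):
    """Per-class one-vs-rest precision/recall plus macro (class-averaged) values."""
    stats = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        stats[c] = (prec, rec)
    macro_p = sum(p for p, _ in stats.values()) / len(labels)
    macro_r = sum(r for _, r in stats.values()) / len(labels)
    return stats, macro_p, macro_r

# Illustrative labels only (not drawn from the JIGSAWS dataset used in the study)
actions = ["suturing", "knot_tying", "needle_passing"]
y_true = ["suturing", "suturing", "knot_tying", "needle_passing", "knot_tying"]
y_pred = ["suturing", "knot_tying", "knot_tying", "needle_passing", "knot_tying"]
per_class, macro_p, macro_r = precision_recall(y_true, y_pred, actions)
```

The same computation applies to the three skill-level classes (novice, intermediate, expert); only the label set changes.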

Conclusions and relevance: The proposed models and the accompanying results illustrate that deep machine learning can identify associations in surgical video clips. These are the first steps to creating a feedback mechanism for surgeons that would allow them to learn from their experiences and refine their skills.


Conflict of interest statement

Conflict of Interest Disclosures: Mr Khalid and Drs Goldenberg, Grantcharov, Taati, and Rudzicz reported having a patent pending related to measuring surgical performance using deep learning with Surgical Safety Technologies. Mr Khalid reported receiving personal fees from Surgical Safety Technologies during the conduct of the study and outside the submitted work. Dr Goldenberg reported receiving personal fees from Surgical Safety Technologies during the conduct of the study. Dr Taati reported receiving personal fees from Surgical Safety Technologies during the conduct of the study and outside the submitted work. Dr Rudzicz reported receiving salary from Surgical Safety Technologies during the conduct of the study and outside the submitted work.

Figures

Figure 1. Embedding Representation Analysis Architecture
The proposed end-to-end model can be used as a classifier to predict surgical actions and self-reported skill, and as a regression model that predicts Likert-scale scores for each category of the Global Rating Scale (GRS). 2-D indicates two-dimensional; ReLU, rectified linear unit.
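The caption describes a shared embedding feeding two heads: a softmax classifier over surgical actions and a regressor over GRS categories. A minimal pure-Python forward-pass sketch of that dual-head layout follows; the layer sizes, random weights, and six GRS outputs are placeholders, not the paper's actual architecture:

```python
import math
import random

def relu(v):
    return [max(0.0, x) for x in v]

def linear(v, W, b):
    """Dense layer: y = W.v + b."""
    return [sum(w * x for w, x in zip(row, v)) + bj for row, bj in zip(W, b)]

def softmax(v):
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    s = sum(exps)
    return [e / s for e in exps]

def dual_head_forward(embedding, W_cls, b_cls, W_reg, b_reg):
    """Shared embedding -> ReLU -> (action classifier, GRS score regressor)."""
    h = relu(embedding)
    action_probs = softmax(linear(h, W_cls, b_cls))  # probabilities over 3 actions
    grs_scores = linear(h, W_reg, b_reg)             # one score per GRS category
    return action_probs, grs_scores

random.seed(0)
dim, n_actions, n_grs = 8, 3, 6  # illustrative sizes only
emb = [random.uniform(-1, 1) for _ in range(dim)]
W_cls = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(n_actions)]
W_reg = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(n_grs)]
probs, grs = dual_head_forward(emb, W_cls, [0.0] * n_actions, W_reg, [0.0] * n_grs)
```

Sharing the embedding between the two heads is what lets a single model serve both the classification and regression roles described in the caption.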
Figure 2. Predicted Segmentations Overlaid on Original Image

