Int J Comput Assist Radiol Surg. 2022 Aug;17(8):1477-1486. doi: 10.1007/s11548-022-02637-9. Epub 2022 May 27.

Robust hand tracking for surgical telestration

Lucas-Raphael Müller et al. Int J Comput Assist Radiol Surg. 2022 Aug.

Erratum in

  • Correction to: Robust hand tracking for surgical telestration. Müller LR, Petersen J, Yamlahi A, Wise P, Adler TJ, Seitel A, Kowalewski KF, Müller B, Kenngott H, Nickel F, Maier-Hein L. Int J Comput Assist Radiol Surg. 2022 Aug;17(8):1487. doi: 10.1007/s11548-022-02702-3. PMID: 35802224. Free PMC article. No abstract available.

Abstract

Purpose: As human failure has been shown to be a primary cause of post-operative death, surgical training is of the utmost socioeconomic importance. In this context, the concept of surgical telestration has been introduced to enable experienced surgeons to mentor trainees efficiently, effectively and intuitively. While previous approaches to telestration have concentrated on overlaying drawings on surgical videos, we explore the augmented reality (AR) visualization of surgical hands to imitate direct interaction with the situs.

Methods: We present a real-time hand tracking pipeline specifically designed for surgical telestration. It comprises three modules, dedicated to (1) the coarse localization of the expert's hand, (2) the subsequent segmentation of the hand for AR visualization in the trainee's field of view and (3) the regression of the keypoints making up the hand's skeleton. This semantic skeleton representation enables structured reporting of the motions performed as part of the teaching.
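
To make the module structure concrete, the following is a minimal sketch of how such a three-stage pipeline could be wired together in PyTorch. The architecture names (YOLOv5s, EfficientNet B3, FPN-EfficientNet B1) follow the paper's description; the input size, pre-processing, function names and wiring are illustrative assumptions, not the authors' implementation, and the task-specific training is not reproduced here.

    import torch
    import timm
    import segmentation_models_pytorch as smp

    # (1) Coarse hand localization: an off-the-shelf YOLOv5s detector.
    detector = torch.hub.load("ultralytics/yolov5", "yolov5s")

    # (2) Skeleton tracking: EfficientNet B3 regressing 21 (x, y) keypoints.
    # num_classes only sizes the output head; the weights would still need
    # task-specific training.
    keypoint_net = timm.create_model("efficientnet_b3", num_classes=21 * 2).eval()

    # (3) Hand segmentation: FPN decoder on an EfficientNet B1 encoder.
    seg_net = smp.FPN(encoder_name="efficientnet-b1", classes=1).eval()

    def track_hand(frame):
        """Detect the hand, then run keypoint regression and segmentation on the crop."""
        # Stage 1: take the highest-confidence detection (YOLOv5 sorts by confidence).
        boxes = detector(frame).xyxy[0]  # (N, 6): x1, y1, x2, y2, conf, class
        if boxes.shape[0] == 0:
            return None  # no hand visible in this frame
        x1, y1, x2, y2 = boxes[0, :4].int().tolist()

        # Stages 2 and 3 operate on the image cropped to the bounding box.
        crop = frame[y1:y2, x1:x2]
        t = torch.from_numpy(crop).permute(2, 0, 1).float().unsqueeze(0) / 255.0
        t = torch.nn.functional.interpolate(t, size=(256, 256))  # assumed input size

        with torch.no_grad():
            keypoints = keypoint_net(t).reshape(21, 2)  # skeleton, crop coordinates
            mask = seg_net(t).sigmoid()[0, 0] > 0.5     # binary mask for the AR overlay
        return keypoints, mask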

Results: According to a comprehensive validation based on a large data set comprising more than 14,000 annotated images with varying application-relevant conditions, our algorithm enables real-time hand tracking and is sufficiently accurate for the task of surgical telestration. In a retrospective validation study, a mean detection accuracy of 98%, a mean keypoint regression accuracy of 10.0 px and a mean Dice Similarity Coefficient of 0.95 were achieved. In a prospective validation study, it showed uncompromised performance when the sensor, operator or gesture varied.
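
Both reported metrics are standard; as a brief illustration (the function names are ours, not the authors'), they could be computed as follows:

    import numpy as np

    def dice_similarity(pred_mask, gt_mask, eps=1e-8):
        """Dice Similarity Coefficient between two binary masks (1.0 = perfect overlap)."""
        intersection = np.logical_and(pred_mask, gt_mask).sum()
        return 2.0 * intersection / (pred_mask.sum() + gt_mask.sum() + eps)

    def mean_regression_distance(pred_kp, gt_kp):
        """Mean Euclidean distance in pixels over the 21 predicted keypoints."""
        return np.linalg.norm(pred_kp - gt_kp, axis=1).mean()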

Conclusion: Due to its high accuracy and fast inference time, our neural network-based approach to hand tracking is well suited for an AR approach to surgical telestration. Future work should be directed to evaluating the clinical value of the approach.

Keywords: Computer vision; Deep learning; Hand tracking; Surgical data science; Telestration.

Conflict of interest statement

LRM, JP, AY, PW, TJA, AS, KFK, BM, HK, FN, LMH declare no conflicts of interest.

Figures

Fig. 1
Our telestration approach compared to the state of the art. (a) Previous approaches to surgical telestration rely on overlaying drawings on laparoscopic videos, while our concept is based on (b) the augmented reality (AR) visualization of the expert surgeon's hand
Fig. 2
Concept overview. Our approach to surgical telestration relies on a camera that continuously captures the hand of the mentor, who observes the operation either on-site or remotely. The camera data are processed by a two-stage neural network, which outputs both the skeleton (represented by 21 keypoints) and the segmented hand. The hand segmentation is overlaid on the surgical screen for intuitive coaching, while the skeleton representation is stored for long-term analysis
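
The abstract does not specify how the overlay is rendered; a minimal alpha-blending sketch, assuming the hand image, mask and surgical frame share the same resolution (all names are illustrative), could look like this:

    import numpy as np

    def overlay_hand(surgical_frame, hand_rgb, mask, alpha=0.6):
        """Alpha-blend the segmented mentor hand onto the surgical video frame."""
        m = mask[..., None].astype(np.float32)  # (H, W, 1) binary hand mask
        out = (1 - alpha * m) * surgical_frame + alpha * m * hand_rgb
        return out.astype(np.uint8)
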
Fig. 3
Overview of the models used for real-time hand tracking. Our approach comprises three core components: (1) a bounding box module using the YOLOv5s architecture, (2) a skeleton tracking module using an EfficientNet B3 and (3) a segmentation module using an FPN-EfficientNet B1. Modules (2) and (3) operate on images cropped to the respective bounding boxes (see the "Real-time hand localization", "Real-time skeleton tracking" and "Real-time hand segmentation" sections)
Fig. 4
Representative results for a diverse set of gestures. The outputs of the three models for bounding box prediction (top) as well as skeleton tracking and segmentation (bottom) are shown
Fig. 5
Skeleton tracking performance for our method (orange) vs. MediaPipe (blue) as the baseline. Fraction of successful localizations (left) and mean regression distance over successful localizations (right), evaluated with respect to the different hand properties. Note that for MediaPipe there are only very few successful localizations for blue gloves and none for green ones
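
The abstract does not define the success criterion underlying this fraction; purely as an illustration, a common convention is to count a frame as successfully localized when the predicted bounding box overlaps the ground truth above an IoU threshold (the threshold and helper names below are assumptions):

    import numpy as np

    def iou(box_a, box_b):
        """Intersection over union of two (x1, y1, x2, y2) boxes."""
        x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
        x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
        inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        return inter / (area_a + area_b - inter)

    def success_fraction(pred_boxes, gt_boxes, thresh=0.5):
        """Fraction of frames with a successful localization (None = no detection)."""
        hits = [p is not None and iou(p, g) >= thresh
                for p, g in zip(pred_boxes, gt_boxes)]
        return float(np.mean(hits))
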
Fig. 6
Representative failure cases of the skeleton extraction model (top row) and the segmentation model (bottom row)
Fig. 7
Results of the prospective validation study. Skeleton tracking performance (upper row), quantified by the mean regression distance, and hand segmentation performance (lower row), quantified by the Dice similarity coefficient (DSC), are shown for the camera used in the training data set (D435i) as well as a previously unseen camera (L515). Each color corresponds to a different mentor. No notable differences were observed for the different gestures
