Int J Comput Assist Radiol Surg. 2023 Jul;18(7):1135-1142.
doi: 10.1007/s11548-023-02918-x. Epub 2023 May 9.

Investigating keypoint descriptors for camera relocalization in endoscopy surgery

Isabela Hernández et al. Int J Comput Assist Radiol Surg. 2023 Jul.

Abstract

Purpose: Recent advances in computer vision and machine learning have resulted in endoscopic video-based solutions for dense reconstruction of the anatomy. To effectively use these systems in surgical navigation, a reliable image-based technique is required to constantly track the endoscopic camera's position within the anatomy, despite frequent removal and re-insertion. In this work, we investigate the use of recent learning-based keypoint descriptors for six degree-of-freedom camera pose estimation in intraoperative endoscopic sequences and under changes in anatomy due to surgical resection.

Methods: Our method employs a dense structure from motion (SfM) reconstruction of the preoperative anatomy, obtained with a state-of-the-art patient-specific learning-based descriptor. During the reconstruction step, each estimated 3D point is associated with a descriptor. This information is employed in the intraoperative sequences to establish 2D-3D correspondences for Perspective-n-Point (PnP) camera pose estimation. We evaluate this method on six intraoperative sequences, obtained from two cadaveric subjects, that include anatomical modifications.
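The 2D-3D correspondence step can be illustrated with a minimal sketch. It assumes that every 3D point carries a single descriptor vector and that matching uses a nearest-neighbor search with a ratio test. The abstract does not state the matching rule, so the function name `match_2d_3d`, the ratio threshold, and the toy data below are hypothetical.

```python
import math

def l2(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_2d_3d(keypoints, query_desc, points_3d, cloud_desc, ratio=0.8):
    """Build a 2D-3D correspondence set by nearest-neighbor descriptor matching.

    Assumed rule (not taken from the paper): keep a match only when the best
    distance is below `ratio` times the second-best distance.
    Requires at least two point-cloud descriptors.
    """
    matches = []
    for kp, qd in zip(keypoints, query_desc):
        # Rank all point-cloud descriptors by distance to the query descriptor.
        ranked = sorted(range(len(cloud_desc)), key=lambda i: l2(qd, cloud_desc[i]))
        best, second = ranked[0], ranked[1]
        if l2(qd, cloud_desc[best]) < ratio * l2(qd, cloud_desc[second]):
            matches.append((kp, points_3d[best]))
    return matches

# Toy data (hypothetical): three 3D points with 3-dimensional descriptors.
points_3d = [(0.0, 0.0, 10.0), (1.0, 0.0, 10.0), (0.0, 1.0, 12.0)]
cloud_desc = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
# Three query keypoints (pixel coordinates) and their descriptors.
keypoints = [(320.0, 240.0), (400.0, 240.0), (100.0, 100.0)]
query_desc = [(0.9, 0.1, 0.0), (0.1, 0.95, 0.0), (0.5, 0.5, 0.0)]

S = match_2d_3d(keypoints, query_desc, points_3d, cloud_desc)
```

In this toy case the third keypoint is equally close to two point-cloud descriptors, so the ratio test discards it as ambiguous and `S` holds two correspondences.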

Results: This approach led to translation and rotation errors of 3.9 mm and 0.2 radians, respectively, with 21.86% of cameras localized, averaged over the six sequences. Compared with an additional learning-based descriptor (HardNet++), the selected descriptor achieves a higher percentage of localized cameras with similar pose estimation performance. We further discuss potential error causes and limitations of the proposed approach.
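The abstract reports these metrics without defining them. The sketch below shows one common convention, which is an assumption here rather than the authors' stated definition: translation error as the Euclidean distance between estimated and ground-truth camera positions, and rotation error as the angle of the relative rotation, computed as arccos((trace(R_est^T R_gt) - 1) / 2). All function names and values are hypothetical.

```python
import math

def translation_error(t_est, t_gt):
    """Euclidean distance between two camera positions (same unit as the input)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(t_est, t_gt)))

def rotation_error(R_est, R_gt):
    """Angle in radians of the relative rotation between two 3x3 rotation matrices."""
    # trace(R_est^T R_gt) equals the sum of element-wise products.
    trace = sum(R_est[i][j] * R_gt[i][j] for i in range(3) for j in range(3))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.acos(cos_theta)

def localized_percentage(n_localized, n_total):
    """Share of query frames for which a pose was accepted, in percent."""
    return 100.0 * n_localized / n_total

def rot_z(angle):
    """Rotation about the z axis, used only to build a toy example."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Toy values (hypothetical, not data from the study).
t_err = translation_error((1.0, 2.0, 2.0), (0.0, 0.0, 0.0))
r_err = rotation_error(rot_z(0.2), identity)
pct = localized_percentage(1, 4)
```

Under this convention a pure rotation of 0.2 radians about any axis yields a rotation error of 0.2 radians, regardless of the axis.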

Conclusion: Patient-specific learning-based descriptors can relocalize images that are well distributed across the inspected anatomy, even where the anatomy is modified. However, camera relocalization in endoscopic sequences remains a persistently challenging problem, and future research is necessary to increase the robustness and accuracy of this technique.

Keywords: Anatomical landmark recognition; Camera relocalization; Learning-based descriptors; Sinus surgery navigation.

Conflict of interest statement

Conflict of interest/Competing interests: The authors have no competing interests to declare that are relevant to the content of this article.

Figures

Fig. 1. Endoscope relocalization pipeline.
The relocalization process comprises three main stages: a) Using preoperative data, we generate a dense point cloud C of the anatomy using a learning-based descriptor extraction and matching process [3] along with SfM. We relate every point in C with a numerical descriptor (depicted as brown blocks). b) The relocalization of a query image Iq occurs by extracting its corresponding descriptor representation, and finding matches with the point cloud descriptors to generate a query 2D-3D correspondence set S. c) The set S is used by a standard PnP solver that integrates RANSAC for correspondence outlier rejection. Best viewed in color.
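Stage (c) relies on RANSAC to reject outlier correspondences. The following is a minimal sketch of the scoring step only, not a full PnP solver: given a candidate pose, it projects each 3D point with a pinhole model and keeps the correspondences whose reprojection error falls below a pixel threshold. The intrinsics, the threshold, and the function names are hypothetical. In practice a library routine such as OpenCV's `cv2.solvePnPRansac` handles both hypothesis generation and this test.

```python
import math

def project(point, R, t, fx, fy, cx, cy):
    """Project a 3D world point to pixel coordinates with a pinhole camera model."""
    # Transform the point into the camera frame: Xc = R * X + t.
    Xc = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    if Xc[2] <= 0:
        return None  # Point lies behind the camera.
    return (fx * Xc[0] / Xc[2] + cx, fy * Xc[1] / Xc[2] + cy)

def inliers(S, R, t, fx, fy, cx, cy, thresh_px=3.0):
    """Return the correspondences in S consistent with the candidate pose (R, t)."""
    keep = []
    for (u, v), X in S:
        p = project(X, R, t, fx, fy, cx, cy)
        if p is not None and math.hypot(p[0] - u, p[1] - v) < thresh_px:
            keep.append(((u, v), X))
    return keep

# Candidate pose and intrinsics (hypothetical values).
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 0.0]
fx, fy, cx, cy = 500.0, 500.0, 320.0, 240.0

# Toy correspondence set: two consistent matches and one gross outlier.
S = [
    ((320.0, 240.0), (0.0, 0.0, 10.0)),
    ((371.0, 240.5), (1.0, 0.0, 10.0)),
    ((100.0, 100.0), (0.0, 1.0, 12.0)),
]

good = inliers(S, R, t, fx, fy, cx, cy)
```

A RANSAC loop would repeat this count for many candidate poses, each estimated from a small random subset of S, and keep the pose with the most inliers.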
Fig. 2. Spatial distribution of valid cameras (red) with respect to the ground-truth trajectory (blue) and the 3D anatomical model. Panels (a) and (b) correspond to the Undisturbed Anatomy and Progression Step #1 sequences of Subject #1, respectively. A comparative frame of the operated region is also presented. Best viewed in color.

References

    1. Mirota DJ, Masaru I, Hager GD: Vision-Based Navigation in Image-Guided Interventions. Annu Rev Biomed Eng (2011)
    2. Yeung BPM, Gourlay T: A technical review of flexible endoscopic multitasking platforms. International Journal of Surgery (2012)
    3. Liu X, Zheng Y, Killeen B, Ishii M, Hager GD, Taylor RH, Unberath M: Extremely Dense Point Correspondences using a Learned Feature Descriptor. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4847–4856 (2020)
    4. Liu X, Stiber M, Huang J, Ishii M, Hager GD, Taylor RH, Unberath M: Reconstructing Sinus Anatomy from Endoscopic Video – Towards a Radiation-Free Approach for Quantitative Longitudinal Assessment. In: Martel AL, Abolmaesumi P, Stoyanov D, Mateus D, Zuluaga MA, Zhou SK, Racoceanu D, Joskowicz L (eds.) Medical Image Computing and Computer Assisted Intervention – MICCAI 2020, pp. 3–13. Springer, Cham (2020)
    5. Liu X, Li Z, Ishii M, Hager GD, Taylor RH, Unberath M: SAGE: SLAM with Appearance and Geometry Prior for Endoscopy. In: ICRA (2022)