Self-supervised Monocular Depth Estimation with 3D Displacement Module for Laparoscopic Images

Chi Xu et al. IEEE Trans Med Robot Bionics. 2022 May;4(2):331-334. doi: 10.1109/TMRB.2022.3170206.

Abstract

We present a novel self-supervised training framework with a 3D displacement (3DD) module for accurately estimating per-pixel depth maps from single laparoscopic images. Several self-supervised monocular depth estimation models have recently achieved good results on the KITTI dataset under the assumption that the camera moves while the objects are stationary; however, this assumption is often reversed in the surgical setting, where the laparoscope is stationary and the surgical instruments and tissues move. Therefore, a 3DD module is proposed to establish the relation between frames instead of estimating ego-motion. In the 3DD module, a convolutional neural network (CNN) analyses the source and target frames to predict the 3D displacement of the 3D point cloud from the target frame to the source frame in camera coordinates. Since it is difficult to constrain the depth displacement from two 2D images, a novel depth consistency module is proposed that maintains consistency between the displacement-updated depth and the model-estimated depth, constraining the 3D displacement effectively. Our proposed method achieves remarkable performance for monocular depth estimation on the Hamlyn surgical dataset and acquired ground truth depth maps, outperforming the monodepth, monodepth2 and packnet models.

Keywords: 3D displacement; CNN; deep learning; monocular depth estimation; self-supervised learning.
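The abstract describes the key mechanism: a CNN in the 3DD module predicts a per-pixel 3D displacement that moves the target frame's back-projected point cloud toward the source frame, and a depth consistency term ties the z-component of the displaced cloud to the depth the model itself estimates. The following Python/PyTorch sketch only illustrates that idea and is not the authors' implementation; the names DisplacementNet, backproject and depth_consistency_loss, the layer sizes, and the tensor shapes are assumptions.

import torch
import torch.nn as nn

class DisplacementNet(nn.Module):
    # Hypothetical CNN: takes the target and source RGB frames (3 channels each)
    # and predicts a per-pixel 3D displacement (dx, dy, dz) in camera coordinates.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 3, 3, padding=1),
        )

    def forward(self, target_rgb, source_rgb):
        return self.net(torch.cat([target_rgb, source_rgb], dim=1))  # (B, 3, H, W)

def backproject(depth, inv_K, pix_homo):
    # Lift per-pixel depth to a 3D point cloud in camera coordinates.
    # depth: (B, 1, H, W); inv_K: (B, 3, 3); pix_homo: (B, 3, H*W) homogeneous pixel grid.
    b = depth.shape[0]
    rays = inv_K @ pix_homo             # per-pixel viewing rays
    return depth.view(b, 1, -1) * rays  # (B, 3, H*W) point cloud

def depth_consistency_loss(displaced_points, est_source_depth):
    # Penalise disagreement between the z-component of the displacement-updated
    # point cloud and the depth the model itself estimates for the source frame.
    z_displaced = displaced_points[:, 2].view_as(est_source_depth)
    return torch.abs(z_displaced - est_source_depth).mean()

# Usage sketch (shapes only, no training loop):
#   disp3d   = DisplacementNet()(target_rgb, source_rgb)    # (B, 3, H, W)
#   points_t = backproject(depth_target, inv_K, pix_homo)   # (B, 3, H*W)
#   points_s = points_t + disp3d.flatten(2)                 # displaced toward the source frame
#   loss_d   = depth_consistency_loss(points_s, depth_source)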


Figures

Fig. 1. Framework architecture. The ResNet-18 [15] is pre-trained. The dark blue arrow indicates bilinear interpolation from the multi-scale outputs to the original-scale outputs. The colored lines indicate the correspondence between output data and loss functions (red for l_ap, blue for l_d, green for l_s).
Fig. 2. The 3DD module architecture. The orange and purple lines represent the inputs and outputs, respectively.
Fig. 3. Qualitative comparison between our method, packnet [12], monodepth2 [11], and monodepth [13]. The first column contains example test images; the other columns show the corresponding disparity maps.
Fig. 4. The acquired ground truth depth maps, obtained via a da Vinci (Intuitive Inc.) stereo laparoscope and a projected Gray-code structured light pattern [20].
Fig. 5. The effect of view-field masking is shown in red boxes.

References

    1. Zhang K. Minimally invasive surgery. Endoscopy. 2002.
    2. Zhang V, Melis M, Amato B, Bianco T, Rocca A, Amato M, Quarto G, Benassai G. Minimally invasive radioguided parathyroid surgery: A literature review. IJS. 2016.
    3. Westebring-van der Putten EP, Goossens RH, Jakimowicz JJ, Dankelman J. Haptics in minimally invasive surgery–a review. Minimally Invasive Therapy & Allied Technologies. 2008.
    4. Zhang L, Li X, Yang S, Ding S, Jolfaei A, Zheng X. Unsupervised learning-based continuous depth and motion estimation with monocular endoscopy for virtual reality minimally invasive surgery. TII. 2020.
    5. Zhang S, Sinha A, Reiter A, Ishii M, Gallia GL, Taylor RH, Hager GD. Evaluation and stability analysis of video-based navigation system for functional endoscopic sinus surgery on in vivo clinical data. TMI. 2018.
