Annu Int Conf IEEE Eng Med Biol Soc. 2019 Jul;2019:7205-7211.
doi: 10.1109/EMBC.2019.8856791.

U-NetPlus: A Modified Encoder-Decoder U-Net Architecture for Semantic and Instance Segmentation of Surgical Instruments from Laparoscopic Images

S M Kamrul Hasan et al. Annu Int Conf IEEE Eng Med Biol Soc. 2019 Jul.

Abstract

With the advent of robot-assisted surgery, medical technology for minimally invasive surgery has undergone a paradigm shift. However, tracking the position of surgical instruments in a surgical scene is very challenging, and accurate detection and identification of surgical tools is paramount. Deep learning-based semantic segmentation of frames from surgery videos has the potential to facilitate this task. In this work, we modify the U-Net architecture by introducing a pre-trained encoder and redesigning the decoder, replacing the transposed convolution operation with an upsampling operation based on nearest-neighbor (NN) interpolation. To further improve performance, we also employ a very fast and flexible data augmentation technique. We trained the framework on 8 × 225-frame sequences of robotic surgical videos available through the MICCAI 2017 EndoVis Challenge dataset and tested it on 8 × 75-frame and 2 × 300-frame videos. Using our U-NetPlus architecture, we report a 90.20% Dice score for binary segmentation, 76.26% Dice for instrument part segmentation, and 46.07% Dice for instrument type (i.e., all instruments) segmentation, outperforming previous techniques implemented and tested on these data.
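A minimal PyTorch sketch of the decoder modification described above, following the composition given in Fig. 1 (NN upsampling with scale factor 2 followed by two convolutions and ReLU); the channel sizes and class name here are illustrative assumptions, not the authors' exact configuration:

    import torch
    import torch.nn as nn
    from torchvision import models

    class DecoderBlockNN(nn.Module):
        """One U-NetPlus-style decoder block: nearest-neighbor upsampling
        (scale factor 2) followed by two 3x3 convolutions with ReLU,
        replacing the transposed convolution of the original U-Net."""
        def __init__(self, in_ch, mid_ch, out_ch):
            super().__init__()
            self.block = nn.Sequential(
                nn.Upsample(scale_factor=2, mode="nearest"),
                nn.Conv2d(in_ch, mid_ch, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(mid_ch, out_ch, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
            )

        def forward(self, x):
            return self.block(x)

    # Pre-trained, batch-normalized VGG11 as the encoder backbone
    # (newer torchvision versions use weights="IMAGENET1K_V1" instead).
    encoder = models.vgg11_bn(pretrained=True).features

    # Example: a 512-channel 14x14 feature map is upsampled to 28x28.
    x = torch.randn(1, 512, 14, 14)
    y = DecoderBlockNN(512, 256, 256)(x)  # -> torch.Size([1, 256, 28, 28])

Because NN upsampling is a fixed (non-learned) operation, the learning burden shifts to the following convolutions, which also avoids the checkerboard artifacts associated with transposed convolutions.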


Figures

Fig. 1.
(a) Modified U-Net with batch-normalized VGG11 as the encoder and upsampling in the decoder. Feature maps are denoted by rectangular boxes; box height indicates feature map resolution, and box width the number of channels. The network consists of a downsampling and an upsampling path. Cyan arrows represent the max-pooling operation, and light-green arrows represent skip connections that transfer information from the encoder to the decoder. Red upward arrows represent the decoder, which consists of nearest-neighbor upsampling with a scale factor of 2 followed by two convolution layers and a ReLU activation function; (b)-(d) working principle of nearest-neighbor interpolation, in which a low-resolution image is resized back to the original resolution.
Fig. 2.
Example images produced by applying both affine and elastic transformations from the albumentations library for data augmentation.
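A minimal sketch of an albumentations pipeline combining affine and elastic transforms in the spirit of Fig. 2; the parameter values here are assumptions for illustration, not the paper's settings:

    import albumentations as A
    import numpy as np

    augment = A.Compose([
        A.ShiftScaleRotate(shift_limit=0.0625, scale_limit=0.1,
                           rotate_limit=15, p=0.5),    # affine transform
        A.ElasticTransform(alpha=1, sigma=50, p=0.5),  # elastic transform
    ])

    image = np.zeros((224, 224, 3), dtype=np.uint8)  # placeholder frame
    mask = np.zeros((224, 224), dtype=np.uint8)      # placeholder label mask
    out = augment(image=image, mask=mask)  # identical transform on both
    aug_image, aug_mask = out["image"], out["mask"]

Passing the image and its segmentation mask through the same call guarantees that geometric distortions stay consistent between input and label.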
Fig. 3.
Quantitative comparison of (a) training accuracy (left), (b) multi-class (three-class) instrument part segmentation accuracy (middle), and (c) multi-task segmentation accuracy (right).
Fig. 4.
Qualitative comparison of binary, instrument part, and instrument type segmentation results and their overlay onto the native endoscopic images of the MICCAI 2017 EndoVis video dataset, as yielded by four different frameworks: U-Net, U-Net+NN, TernausNet, and U-NetPlus.
Fig. 5.
Attention results: U-NetPlus “looks” at a focused target region, whereas U-Net, U-Net+NN and TernausNet appear less “focused”, leading to less accurate segmentation.
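The Dice scores reported in the abstract and compared quantitatively in Figs. 3 and 4 measure the overlap between predicted and ground-truth masks. A minimal NumPy sketch of the metric for the binary case (the example masks are hypothetical):

    import numpy as np

    def dice_score(pred, target, eps=1e-7):
        """Dice coefficient between two binary masks."""
        pred, target = pred.astype(bool), target.astype(bool)
        inter = np.logical_and(pred, target).sum()
        return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

    # Example: two masks with 3 foreground pixels each, overlapping in 2.
    a = np.zeros((4, 4)); a[0, :3] = 1
    b = np.zeros((4, 4)); b[0, 1:4] = 1
    print(dice_score(a, b))  # 2*2/(3+3) ~= 0.667

For the multi-class instrument part and instrument type tasks, the score would be computed per class and averaged.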

References

    1. MICCAI 2017 Endoscopic Vision Challenge: Robotic Instrument Segmentation Sub-Challenge, 2017, https://endovissub2017-roboticinstrumentsegmentation.grand-challenge.org....
    2. Buslaev A, Parinov A, Khvedchenya E, Iglovikov VI, and Kalinin AA, "Albumentations: fast and flexible image augmentations," arXiv e-prints arXiv:1809.06839, 2018.
    3. Chen C, Chen Q, Xu J, and Koltun V, "Learning to see in the dark," arXiv preprint arXiv:1805.01934, 2018.
    4. Dong C, Loy CC, and Tang X, "Accelerating the super-resolution convolutional neural network," in European Conference on Computer Vision. Springer, 2016, pp. 391–407.
    5. Fong RC and Vedaldi A, "Interpretable explanations of black boxes by meaningful perturbation," arXiv preprint arXiv:1704.03296, 2017.
