Annu Int Conf IEEE Eng Med Biol Soc. 2019 Jul;2019:7205-7211.
doi: 10.1109/EMBC.2019.8856791.

U-NetPlus: A Modified Encoder-Decoder U-Net Architecture for Semantic and Instance Segmentation of Surgical Instruments from Laparoscopic Images

S M Kamrul Hasan et al. Annu Int Conf IEEE Eng Med Biol Soc. 2019 Jul.

Abstract

With the advent of robot-assisted surgery, medical technology for minimally invasive surgery has undergone a paradigm shift. However, tracking the position of surgical instruments in a surgical scene is very challenging, and accurate detection and identification of surgical tools is paramount. Deep learning-based semantic segmentation of frames from surgery videos has the potential to facilitate this task. In this work, we modify the U-Net architecture by introducing a pre-trained encoder and redesigning the decoder, replacing the transposed convolution operation with an upsampling operation based on nearest-neighbor (NN) interpolation. To further improve performance, we also employ a very fast and flexible data augmentation technique. We trained the framework on 8 × 225-frame sequences of robotic surgical videos available through the MICCAI 2017 EndoVis Challenge dataset and tested it on 8 × 75-frame and 2 × 300-frame videos. Using our U-NetPlus architecture, we report a 90.20% Dice score for binary segmentation, 76.26% Dice for instrument part segmentation, and 46.07% Dice for instrument type (i.e., all instruments) segmentation, outperforming previous techniques implemented and tested on these data.
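A minimal PyTorch sketch of the decoder modification described above, following the composition given in Fig. 1 (NN upsampling with scale factor 2 followed by two convolutions and ReLU); the channel sizes and class name here are illustrative assumptions, not the authors' exact configuration:

    import torch
    import torch.nn as nn
    from torchvision import models

    class DecoderBlockNN(nn.Module):
        """One U-NetPlus-style decoder block: nearest-neighbor upsampling
        (scale factor 2) followed by two 3x3 convolutions with ReLU,
        replacing the transposed convolution of the original U-Net."""
        def __init__(self, in_ch, mid_ch, out_ch):
            super().__init__()
            self.block = nn.Sequential(
                nn.Upsample(scale_factor=2, mode="nearest"),
                nn.Conv2d(in_ch, mid_ch, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(mid_ch, out_ch, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
            )

        def forward(self, x):
            return self.block(x)

    # Pre-trained, batch-normalized VGG11 as the encoder backbone
    # (newer torchvision versions use weights="IMAGENET1K_V1" instead).
    encoder = models.vgg11_bn(pretrained=True).features

    # Example: a 512-channel 14x14 feature map is upsampled to 28x28.
    x = torch.randn(1, 512, 14, 14)
    y = DecoderBlockNN(512, 256, 256)(x)  # -> torch.Size([1, 256, 28, 28])

Because NN upsampling is a fixed (non-learned) operation, the learning burden shifts to the following convolutions, which also avoids the checkerboard artifacts associated with transposed convolutions.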


Figures

Fig. 1.
(a) Modified U-Net with batch-normalized VGG11 as the encoder and upsampling in the decoder. Feature maps are denoted by rectangular boxes; box height indicates feature map resolution, and box width the number of channels. The network consists of a downsampling and an upsampling path. Cyan arrows represent the max-pooling operation, and light-green arrows represent skip connections that transfer information from the encoder to the decoder. Red upward arrows represent the decoder, which consists of nearest-neighbor upsampling with a scale factor of 2 followed by two convolution layers and a ReLU activation function; (b)-(d) working principle of nearest-neighbor interpolation, in which a low-resolution image is resized back to the original resolution.
Fig. 2.
Example images produced by applying both affine and elastic transformations from the albumentations library for data augmentation.
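A minimal sketch of an albumentations pipeline combining affine and elastic transforms in the spirit of Fig. 2; the parameter values here are assumptions for illustration, not the paper's settings:

    import albumentations as A
    import numpy as np

    augment = A.Compose([
        A.ShiftScaleRotate(shift_limit=0.0625, scale_limit=0.1,
                           rotate_limit=15, p=0.5),    # affine transform
        A.ElasticTransform(alpha=1, sigma=50, p=0.5),  # elastic transform
    ])

    image = np.zeros((224, 224, 3), dtype=np.uint8)  # placeholder frame
    mask = np.zeros((224, 224), dtype=np.uint8)      # placeholder label mask
    out = augment(image=image, mask=mask)  # identical transform on both
    aug_image, aug_mask = out["image"], out["mask"]

Passing the image and its segmentation mask through the same call guarantees that geometric distortions stay consistent between input and label.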
Fig. 3.
Quantitative comparison of (a) training accuracy (left), (b) multi-class (three-class) instrument part segmentation accuracy (middle), and (c) multi-task segmentation accuracy (right).
Fig. 4.
Qualitative comparison of binary, instrument part, and instrument type segmentation results and their overlay onto the native endoscopic images of the MICCAI 2017 EndoVis video dataset, as yielded by four different frameworks: U-Net, U-Net+NN, TernausNet, and U-NetPlus.
Fig. 5.
Attention results: U-NetPlus “looks” at a focused target region, whereas U-Net, U-Net+NN and TernausNet appear less “focused”, leading to less accurate segmentation.
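The Dice scores reported in the abstract and compared quantitatively in Figs. 3 and 4 measure the overlap between predicted and ground-truth masks. A minimal NumPy sketch of the metric for the binary case (the example masks are hypothetical):

    import numpy as np

    def dice_score(pred, target, eps=1e-7):
        """Dice coefficient between two binary masks."""
        pred, target = pred.astype(bool), target.astype(bool)
        inter = np.logical_and(pred, target).sum()
        return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

    # Example: two masks with 3 foreground pixels each, overlapping in 2.
    a = np.zeros((4, 4)); a[0, :3] = 1
    b = np.zeros((4, 4)); b[0, 1:4] = 1
    print(dice_score(a, b))  # 2*2/(3+3) ~= 0.667

For the multi-class instrument part and instrument type tasks, the score would be computed per class and averaged.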

References

    1. MICCAI 2017 Endoscopic Vision Challenge: Robotic Instrument Segmentation Sub-Challenge, 2017, https://endovissub2017-roboticinstrumentsegmentation.grand-challenge.org....
    2. Buslaev A, Parinov A, Khvedchenya E, Iglovikov VI, and Kalinin AA, "Albumentations: fast and flexible image augmentations," arXiv e-prints arXiv:1809.06839, 2018.
    3. Chen C, Chen Q, Xu J, and Koltun V, "Learning to see in the dark," arXiv preprint arXiv:1805.01934, 2018.
    4. Dong C, Loy CC, and Tang X, "Accelerating the super-resolution convolutional neural network," in European Conference on Computer Vision. Springer, 2016, pp. 391–407.
    5. Fong RC and Vedaldi A, "Interpretable explanations of black boxes by meaningful perturbation," arXiv preprint arXiv:1704.03296, 2017.
