FetNet: a recurrent convolutional network for occlusion identification in fetoscopic videos

Sophia Bano et al. Int J Comput Assist Radiol Surg. 2020 May;15(5):791-801.
doi: 10.1007/s11548-020-02169-0. Epub 2020 Apr 29.

Abstract

Purpose: Fetoscopic laser photocoagulation is a minimally invasive surgery for the treatment of twin-to-twin transfusion syndrome (TTTS). Using a lens/fibre-optic scope inserted into the amniotic cavity, the surgeon identifies and ablates the abnormal placental vascular anastomoses to regulate blood flow to both fetuses. A limited field-of-view, occlusions due to the presence of the fetus and low visibility make it difficult to identify all vascular anastomoses. Automatic computer-assisted techniques may provide a better understanding of the anatomical structure during surgery for risk-free laser photocoagulation and may help improve mosaicking of fetoscopic videos.

Methods: We propose FetNet, a combined convolutional neural network (CNN) and long short-term memory (LSTM) recurrent neural network architecture for the spatio-temporal identification of fetoscopic events. We adapt an existing CNN architecture for spatial feature extraction and integrate it with the LSTM network for end-to-end spatio-temporal inference. We introduce differential learning rates during model training to effectively utilise the pre-trained CNN weights. This may support computer-assisted interventions (CAI) during fetoscopic laser photocoagulation.
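The differential learning rates mentioned above amount to updating the pre-trained CNN weights with a smaller step than the newly initialised LSTM/fully connected layers. A minimal sketch of this idea using plain SGD updates; all parameter values and learning rates below are illustrative, not taken from the paper:

```python
# Differential learning rates: pre-trained (backbone) parameters receive a
# smaller update step than freshly initialised (head) parameters, so the
# pre-trained representation is preserved while the new layers adapt quickly.

def sgd_step(params, grads, lr):
    """One plain SGD update: p <- p - lr * g for each parameter."""
    return [p - lr * g for p, g in zip(params, grads)]

# Two parameter groups (toy scalar weights standing in for layer tensors).
backbone = [1.0, 2.0]   # stand-in for pre-trained CNN weights
head = [0.5]            # stand-in for newly initialised LSTM/FC weights
grad_backbone = [0.1, 0.1]
grad_head = [0.1]

# Small learning rate for the pre-trained backbone, larger for the new head.
backbone = sgd_step(backbone, grad_backbone, lr=1e-4)
head = sgd_step(head, grad_head, lr=1e-2)
```

In frameworks such as PyTorch, the same effect is typically achieved by passing separate parameter groups with different `lr` values to a single optimizer.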

Results: We perform a quantitative evaluation of our method using 7 in vivo fetoscopic videos captured from different human TTTS cases. The total duration of these videos was 5551 s (138,780 frames). To test the robustness of the proposed approach, we perform 7-fold cross-validation in which each video in turn is held out as the test set and training is performed on the remaining videos.
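The evaluation protocol above is leave-one-video-out cross-validation over the 7 cases. A short sketch of how the folds are constructed; the video identifiers are illustrative:

```python
# Leave-one-video-out cross-validation: each of the 7 videos serves exactly
# once as the held-out test set, with the other 6 used for training.

videos = [f"video_{i}" for i in range(1, 8)]

folds = []
for held_out in videos:
    train = [v for v in videos if v != held_out]
    folds.append((train, held_out))
```

Splitting by whole video (rather than by frame) ensures that frames from the same case never appear in both the training and the test set, which would otherwise inflate the reported performance.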

Conclusion: FetNet achieved superior performance compared to existing CNN-based methods and provided improved inference owing to its spatio-temporal modelling. Online testing of FetNet, using a Tesla V100-DGXS-32GB GPU, achieved a frame rate of 114 fps. These results show that our method could provide a real-time solution for CAI, automating occlusion and photocoagulation identification during fetoscopic procedures.

Keywords: Computer assisted interventions (CAI); Deep learning; Fetoscopy; Surgical vision; Twin-to-twin transfusion syndrome (TTTS); Video segmentation.

Conflict of interest statement

The authors declare that they have no conflict of interest.

Figures

Fig. 1
Representative cropped images from the seven fetoscopic videos used in our experiments displaying the four multi-label event classes
Fig. 2
An overview of the proposed FetNet for event classification in fetoscopic videos. Spatial representation of each frame is encoded by a CNN (VGG16 architecture) while the temporal representation is encoded using LSTM followed by fully connected layers. Differential learning rate is applied during network training
Fig. 3
Precision-recall curves, with AUCs, for the methods under comparison for the (a) clear view, (b) occlusion, (c) tool and (d) ablation classes. (e) The micro-average precision-recall over all classes
Fig. 4
Performance comparison of different methods. F1-scores and standard deviations: (a) over the 7 folds for each event; (b) over the 4 events for each fold
Fig. 5
A snapshot of the timeline of predictions for video 1. The ground truth (top) and correct predictions from VGG_fine (middle) and FetNet_DL (bottom) are shown in blue; erroneous predictions are shown in red
