Adv Neural Inf Process Syst. 2022 Dec;35:13541-13556.

Local Spatiotemporal Representation Learning for Longitudinally-consistent Neuroimage Analysis


Mengwei Ren et al. Adv Neural Inf Process Syst. 2022 Dec.

Abstract

Recent self-supervised advances in medical computer vision exploit the global and local anatomical self-similarity for pretraining prior to downstream tasks such as segmentation. However, current methods assume i.i.d. image acquisition, which is invalid in clinical study designs where follow-up longitudinal scans track subject-specific temporal changes. Further, existing self-supervised methods for medically-relevant image-to-image architectures exploit only spatial or temporal self-similarity and do so via a loss applied only at a single image-scale, with naive multi-scale spatiotemporal extensions collapsing to degenerate solutions. To these ends, this paper makes two contributions: (1) It presents a local and multi-scale spatiotemporal representation learning method for image-to-image architectures trained on longitudinal images. It exploits the spatiotemporal self-similarity of learned multi-scale intra-subject image features for pretraining and develops several feature-wise regularizations that avoid degenerate representations; (2) During finetuning, it proposes a surprisingly simple self-supervised segmentation consistency regularization to exploit intra-subject correlation. Benchmarked across various segmentation tasks, the proposed framework outperforms both well-tuned randomly-initialized baselines and current self-supervised techniques designed for both i.i.d. and longitudinal datasets. These improvements are demonstrated across both longitudinal neurodegenerative adult MRI and developing infant brain MRI and yield both higher performance and longitudinal consistency.
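To make the pretraining idea in the abstract concrete, below is a minimal PyTorch-style sketch of a multi-scale intra-subject feature-similarity objective. The function names (cosine_patch_loss, multiscale_pretrain_loss), the assumption of registered intra-subject image pairs with per-layer feature lists, and the plain cosine formulation are illustrative assumptions, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def cosine_patch_loss(f_a: torch.Tensor, f_b: torch.Tensor) -> torch.Tensor:
    """Encourage features at corresponding spatial locations of two registered
    scans of the same subject to agree (shape: B x C x spatial dims)."""
    f_a = F.normalize(f_a, dim=1)                  # unit-norm channel vectors
    f_b = F.normalize(f_b, dim=1)
    return 1.0 - (f_a * f_b).sum(dim=1).mean()     # 1 - mean cosine similarity

def multiscale_pretrain_loss(feats_t1, feats_t2):
    """Sum the per-layer similarity terms over the multi-scale feature
    pyramids (lists of tensors, shallow to deep) of two timepoints."""
    return sum(cosine_patch_loss(a, b) for a, b in zip(feats_t1, feats_t2))
```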


Figures

Figure 1:
After pretraining an image-to-image network with per-layer spatiotemporal self-supervision, we visualize the intra-subject multi-scale feature similarity between a query channel-wise feature and all spatial positions within the key feature at a different age. A: Contrastive pretraining with unsupervised negatives [44] yields only positionally-dependent representations. B: Pretraining without negatives [11], using corresponding intra-subject patch locations as positives, leads to semantically implausible, low-diversity representations (e.g., see yellow box) and artifacts (see arrows) in deeper layers. C: Our method attains both positionally and anatomically relevant representations via proper regularization (e.g., see green box). Additional structures are visualized in Suppl. Figure 5.
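For context on what the panels display, the similarity map can be sketched as the cosine similarity between the feature vector at one query location of one timepoint and every spatial position of the corresponding layer's features at another timepoint. The variable names and the 1 x C x D x H x W feature shape below are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def similarity_map(feat_query: torch.Tensor, feat_key: torch.Tensor,
                   z: int, y: int, x: int) -> torch.Tensor:
    """feat_*: 1 x C x D x H x W features from the same layer at two ages.
    Returns a D x H x W map of cosine similarities between the query
    location (z, y, x) of one age and every position of the other."""
    q = F.normalize(feat_query[0, :, z, y, x], dim=0)   # C
    k = F.normalize(feat_key[0], dim=0)                 # C x D x H x W
    return torch.einsum('c,cdhw->dhw', q, k)
```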
Figure 2: Overview of proposed self-supervision.
Given nonlinearly-registered temporal images of a subject, (a) we assume that corresponding spatial locations in various network layers should have similar representations. As U-Net skip connections can cause degenerate decoder embeddings (see App. E), we (b) encourage the decoder bottleneck to be orthogonal to the encoder bottleneck and regularize the concatenated decoder features to have (c) high spatial variance and to be (d) uncorrelated channel-wise. During fine-tuning, we (e) encourage temporal intra-subject consistency of the network outputs.
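Components (b)-(e) of this caption can be written as simple feature-level penalties. The PyTorch-style sketch below is illustrative only: the exact formulations, shapes, and weights are assumptions (the variance and covariance terms are written VICReg-style), not the authors' verbatim losses.

```python
import torch
import torch.nn.functional as F

def orthogonality_loss(enc_bottleneck, dec_bottleneck):
    """(b) Penalize squared per-location cosine similarity between encoder
    and decoder bottleneck features so the decoder does not copy the encoder."""
    e = F.normalize(enc_bottleneck.flatten(2), dim=1)   # B x C x N
    d = F.normalize(dec_bottleneck.flatten(2), dim=1)
    return ((e * d).sum(dim=1) ** 2).mean()

def spatial_variance_loss(feat, eps=1e-4):
    """(c) Keep per-channel spatial standard deviation high (hinged at 1)
    to avoid spatially-constant, degenerate decoder features."""
    std = feat.flatten(2).std(dim=2)                    # B x C
    return F.relu(1.0 - torch.sqrt(std ** 2 + eps)).mean()

def channel_decorrelation_loss(feat):
    """(d) Penalize off-diagonal entries of the channel covariance of the
    flattened decoder features so channels stay uncorrelated."""
    x = feat.flatten(2).transpose(1, 2)                 # B x N x C
    x = x - x.mean(dim=1, keepdim=True)
    cov = torch.einsum('bnc,bnd->bcd', x, x) / (x.shape[1] - 1)
    off_diag = cov - torch.diag_embed(torch.diagonal(cov, dim1=1, dim2=2))
    return (off_diag ** 2).mean()

def temporal_consistency_loss(logits_t1, logits_t2_warped):
    """(e) Fine-tuning: encourage soft segmentations of registered
    intra-subject timepoints to agree."""
    return F.mse_loss(logits_t1.softmax(dim=1), logits_t2_warped.softmax(dim=1))
```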
Figure 3: One-shot segmentation.
Top three rows: Once pretrained on all unlabeled data, all benchmarked methods are finetuned on either a single annotated image (IBIS-wmgm) or a single annotated subject (IBIS-subcort and OASIS3). When deployed on other subjects at different ages, our method yields improved segmentation performance. Bottom row: When finetuned only on a single 36-month-old image, our method generalizes to unseen timepoints by leveraging temporal consistency.
Figure 4:
One-shot segmentation benchmarking, quantifying performance with the Dice coefficient (top) and the spatiotemporal consistency of segmentation (bottom); means and standard deviations are plotted, with median values overlaid at the top of each subfigure (higher is better). Few-shot and fully-supervised results are provided in Suppl. Tabs. 3 and 4, respectively.
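For reference, the two reported quantities can be computed roughly as below. The Dice coefficient is standard; the longitudinal-consistency score shown here (Dice between a subject's registered segmentations at different ages) is an assumed stand-in rather than the paper's exact consistency metric.

```python
import numpy as np

def dice(pred: np.ndarray, target: np.ndarray, label: int) -> float:
    """Dice overlap for one label between predicted and reference masks."""
    p, t = pred == label, target == label
    denom = p.sum() + t.sum()
    return 2.0 * np.logical_and(p, t).sum() / denom if denom > 0 else 1.0

def longitudinal_consistency(seg_t1: np.ndarray, seg_t2_warped: np.ndarray,
                             labels) -> float:
    """Mean Dice between a subject's segmentation at one age and the
    segmentation of a later, nonlinearly-registered scan of the same subject."""
    return float(np.mean([dice(seg_t1, seg_t2_warped, lab) for lab in labels]))
```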

References

    1. Aljabar Paul, Heckemann Rolf A, Hammers Alexander, Hajnal Joseph V, and Rueckert Daniel. Multi-atlas based segmentation of brain images: atlas selection and its effect on accuracy. Neuroimage, 46(3):726–738, 2009.
    2. Alonso Iñigo, Sabater Alberto, Ferstl David, Montesano Luis, and Murillo Ana C. Semi-supervised semantic segmentation with pixel-level contrastive learning from a class-wise memory bank. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 8219–8228, October 2021.
    3. Avants Brian B, Epstein Charles L, Grossman Murray, and Gee James C. Symmetric diffeomorphic image registration with cross-correlation: evaluating automated labeling of elderly and neurodegenerative brain. Medical Image Analysis, 12(1):26–41, 2008.
    4. Avants Brian B, Yushkevich Paul, Pluta John, Minkoff David, Korczykowski Marc, Detre John, and Gee James C. The optimal template effect in hippocampus studies of diseased populations. Neuroimage, 49(3):2457–2466, 2010.
    5. Bai Yutong, Fan Haoqi, Misra Ishan, Venkatesh Ganesh, Lu Yongyi, Zhou Yuyin, Yu Qihang, Chandra Vikas, and Yuille Alan. Can temporal information help with contrastive self-supervised learning? arXiv preprint arXiv:2011.13046, 2020.
