Disentangled representation learning in cardiac image analysis

Agisilaos Chartsias et al. Med Image Anal. 2019 Dec;58:101535. doi: 10.1016/j.media.2019.101535. Epub 2019 Jul 18.

Abstract

Typically, a medical image offers spatial information on the anatomy (and pathology) modulated by imaging-specific characteristics. Many imaging modalities, including Magnetic Resonance Imaging (MRI) and Computed Tomography (CT), can be interpreted in this way. We can venture further and consider that a medical image naturally factors into some spatial factors depicting anatomy and factors that denote the imaging characteristics. Here, we explicitly learn this decomposed (disentangled) representation of imaging data, focusing in particular on cardiac images. We propose the Spatial Decomposition Network (SDNet), which factorises 2D medical images into spatial anatomical factors and non-spatial modality factors. We demonstrate that this high-level representation is ideally suited for several medical image analysis tasks, such as semi-supervised segmentation, multi-task segmentation and regression, and image-to-image synthesis. Specifically, we show that our model can match the performance of fully supervised segmentation models, using only a fraction of the labelled images. Critically, we show that our factorised representation also benefits from supervision obtained either when we use auxiliary tasks to train the model in a multi-task setting (e.g. regressing to known cardiac indices), or when aggregating multimodal data from different sources (e.g. pooling together MRI and CT data). To explore the properties of the learned factorisation, we perform latent-space arithmetic and show that we can synthesise CT from MR and vice versa by swapping the modality factors. We also demonstrate that the factor holding image-specific information can be used to predict the input modality with high accuracy. Code will be made available at https://github.com/agis85/anatomy_modality_decomposition.
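
To make the factorisation concrete, here is a rough, illustrative sketch of the interface in PyTorch. It is not the authors' implementation (their code is at the URL above): the class names, layer sizes, and the simple scale-and-shift conditioning in the toy decoder are assumptions, included only to show how an image splits into a spatial factor s and a modality vector z and is reconstructed from the two.

```python
# Illustrative sketch only: toy stand-ins for the anatomy encoder, modality encoder and
# decoder described in the abstract. Real SDNet uses a U-Net anatomy encoder and a
# FiLM-conditioned decoder; everything below is simplified.
import torch
import torch.nn as nn


class AnatomyEncoder(nn.Module):
    """Image -> C-channel spatial (anatomical) factor s."""
    def __init__(self, channels: int = 8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, channels, 3, padding=1),
        )

    def forward(self, x):
        # Channel-wise softmax encourages each pixel to 'belong' to one channel.
        return torch.softmax(self.net(x), dim=1)


class ModalityEncoder(nn.Module):
    """(image, s) -> low-dimensional, non-spatial modality vector z."""
    def __init__(self, channels: int = 8, nz: int = 8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1 + channels, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, nz),
        )

    def forward(self, x, s):
        return self.net(torch.cat([x, s], dim=1))


class Decoder(nn.Module):
    """Reconstructs the image from s, with z injected as a per-channel scale and shift."""
    def __init__(self, channels: int = 8, nz: int = 8):
        super().__init__()
        self.cond = nn.Linear(nz, 2 * channels)
        self.net = nn.Conv2d(channels, 1, 3, padding=1)

    def forward(self, s, z):
        gamma, beta = self.cond(z).chunk(2, dim=1)
        s = s * gamma[..., None, None] + beta[..., None, None]
        return self.net(s)


x = torch.randn(2, 1, 64, 64)            # toy batch of 2D slices
f_anatomy, f_modality, g = AnatomyEncoder(), ModalityEncoder(), Decoder()
s = f_anatomy(x)                         # spatial anatomical factor
z = f_modality(x, s)                     # non-spatial modality factor
x_rec = g(s, z)                          # reconstruction from the two factors
```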

Keywords: Cardiac magnetic resonance imaging; Disentangled representation learning; Multitask learning; Semi-supervised segmentation.


Figures

Fig. 1:
A schematic overview of the proposed model. An input image is first encoded to a multi-channel spatial representation, the anatomical factor s, using an anatomy encoder f_anatomy. Then s can be used as input to a segmentation network h (or some other task-specific network) to produce a multi-class segmentation mask. The factor s, along with the input image, is used by a modality encoder f_modality to produce a latent vector z representing the imaging modality. The two representations s and z are combined to reconstruct the input image through the decoder network g.
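
As an illustration of the task-specific network h mentioned above, a segmentor operating purely on s could be as small as the following sketch; the layer sizes and the values of C and L are placeholders, not the paper's.

```python
# Toy sketch of a segmentor h that maps the C-channel anatomical factor s to L-class
# per-pixel logits. Purely illustrative; not the SDNet segmentor.
import torch
import torch.nn as nn

C, L = 8, 4  # channels of s and number of segmentation classes (task dependent)

h = nn.Sequential(
    nn.Conv2d(C, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, L, 1),                 # 1x1 conv producing per-pixel class logits
)

s = torch.rand(1, C, 64, 64)             # anatomical factor from f_anatomy
mask_logits = h(s)                       # (1, L, 64, 64); softmax over dim=1 gives the mask
```
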
Fig. 2:
The architectures of the four networks that make up SDNet. The anatomy encoder is a standard U-Net (Ronneberger et al., 2015) that produces a spatial anatomical representation s. The modality encoder is a convolutional network (except for a fully connected final layer) that produces the modality representation z. The segmentor is a small fully convolutional network that, given s, produces the final segmentation prediction as a multi-class mask (with L classes). Finally, the decoder produces a reconstruction of the input image from s, with its output modulated by z through FiLM normalisation (Perez et al., 2018). The bottom of the figure details the components used throughout the four networks. The number of channels C of the anatomical factor, the size n_z of the modality factor, and the number of segmentation classes L depend on the specific task and are detailed in the main text.
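
For readers unfamiliar with FiLM, a minimal feature-wise linear modulation layer in the spirit of Perez et al. (2018) is sketched below; where exactly such blocks sit inside the SDNet decoder follows the paper, not this sketch.

```python
# Minimal FiLM layer: the conditioning vector z is mapped to one (gamma, beta) pair per
# feature channel, which scales and shifts the decoder features. Sketch only.
import torch
import torch.nn as nn


class FiLM(nn.Module):
    def __init__(self, nz: int, num_features: int):
        super().__init__()
        self.proj = nn.Linear(nz, 2 * num_features)

    def forward(self, feats, z):
        gamma, beta = self.proj(z).chunk(2, dim=1)
        return feats * gamma[..., None, None] + beta[..., None, None]


feats = torch.randn(2, 32, 64, 64)       # intermediate decoder features
z = torch.randn(2, 8)                    # modality factor
modulated = FiLM(nz=8, num_features=32)(feats, z)
```
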
Fig. 3:
(a) Example of a spatial representation, expressed as a multi-channel binary map. Some channels represent defined anatomical parts such as the myocardium or the left ventricle, and others the remaining anatomy required to describe the input image on the left. Observe how sparse most of the informative channels are. (b) Spatial representation with no thresholding applied. Each channel of the spatial map also captures the intensity signal in different gray-level variations and is not sparse, in contrast to Fig. 3a. This may hinder anatomical separation. Note that no specific channel ordering is imposed, so the anatomical parts can appear in a different order in the anatomical representations across experiments.
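
For illustration, the thresholding contrasted in (a) and (b) could be implemented as a simple rounding with a straight-through gradient; this is an assumption made for the sketch, not necessarily the exact operator used in the paper.

```python
# Round each channel of the soft spatial factor to {0, 1} in the forward pass while letting
# gradients flow through unchanged (straight-through estimator). Illustrative sketch only.
import torch


def binarise(s: torch.Tensor) -> torch.Tensor:
    hard = (s > 0.5).float()
    return s + (hard - s).detach()


s = torch.rand(2, 8, 64, 64)             # soft spatial factor in [0, 1]
s_bin = binarise(s)                      # near-binary channel maps, differentiable w.r.t. s
```
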
Fig. 4:
Segmentation example for different numbers of labelled images from the ACDC dataset. Blue, green, and red show the model's predictions for MYO, LV, and RV, respectively.
Fig. 5:
Example of anatomical representations from one MR and two CT images. Green boxes mark common spatial information captured in the same channels, whereas red boxes mark information present in one modality but not the other.
Fig. 6:
Modality transformation between MR and CT, when a fixed anatomy is combined with a modality vector derived from each imaging modality. Specifically, let x_mr and x_ct be MR and CT images, respectively. The left panel of the figure shows the original MR image x_mr and a ‘reconstruction’ of x_mr using the modality component derived from x_ct, i.e. g(f_anatomy(x_mr), f_modality(x_ct, f_anatomy(x_ct))). The right panel shows the original CT image x_ct and a ‘reconstruction’ of x_ct using the modality component derived from x_mr, i.e. g(f_anatomy(x_ct), f_modality(x_mr, f_anatomy(x_mr))).
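
The swap described in the caption can be written as a small helper taking the trained networks as callables; in the usage below, the lambdas are trivial placeholders standing in for trained models, included only so the snippet runs.

```python
# Latent-space swap from Fig. 6: combine the anatomy of x_src with the modality vector of
# x_tgt. The lambdas are placeholder 'networks' so the example is self-contained.
import torch


def swap_modality(x_src, x_tgt, f_anatomy, f_modality, g):
    """g(f_anatomy(x_src), f_modality(x_tgt, f_anatomy(x_tgt)))"""
    s_src = f_anatomy(x_src)
    z_tgt = f_modality(x_tgt, f_anatomy(x_tgt))
    return g(s_src, z_tgt)


x_mr = torch.randn(1, 1, 64, 64)
x_ct = torch.randn(1, 1, 64, 64)
f_anatomy = lambda x: x.repeat(1, 8, 1, 1)                        # placeholder anatomy encoder
f_modality = lambda x, s: s.mean(dim=(2, 3))                      # placeholder modality encoder
g = lambda s, z: s.mean(1, keepdim=True) + z.mean(1)[:, None, None, None]  # placeholder decoder
x_mr_as_ct = swap_modality(x_mr, x_ct, f_anatomy, f_modality, g)  # MR anatomy, CT appearance
```
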
Fig. 7:
Reconstructions of an input image when re-arranging the channels of the spatial representation. The images from left to right are: the input, the original reconstruction, the reconstruction when moving the MYO to the LV channel, the reconstruction when exchanging the contents of the MYO and LV channels, and finally a reconstruction obtained after a random permutation of the channels.
Fig. 8:
Reconstructions when interpolating between z vectors. Each row corresponds to images obtained by changing the values of a single z dimension. The final two columns (correlation and Δimage) indicate the areas of the image most affected by this change in z.
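
The traversal behind this figure amounts to keeping s fixed and sweeping one coordinate of z before decoding; a minimal sketch (with a placeholder decoder so it runs standalone) follows.

```python
# Decode images obtained by setting one dimension of z to each value in `values`
# while the anatomical factor s stays fixed. Illustrative sketch only.
import torch


def traverse_dimension(g, s, z, dim, values):
    images = []
    for v in values:
        z_mod = z.clone()
        z_mod[:, dim] = v
        images.append(g(s, z_mod))
    return torch.stack(images)


s = torch.rand(1, 8, 64, 64)             # fixed anatomical factor
z = torch.zeros(1, 8)                    # base modality vector
g = lambda s, z: s.mean(1, keepdim=True) + z.mean(1)[:, None, None, None]  # placeholder decoder
frames = traverse_dimension(g, s, z, dim=0, values=torch.linspace(-3, 3, 7))
```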

References

    1. Almahairi A, Rajeswar S, Sordoni A, Bachman P, Courville AC, 2018. Augmented CycleGAN: Learning many-to-many mappings from unpaired data, in: International Conference on Machine Learning.
    2. Azadi S, Fisher M, Kim V, Wang Z, Shechtman E, Darrell T, 2018. Multi-content GAN for few-shot font style transfer, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, p. 13.
    3. Bai W, Oktay O, Sinclair M, Suzuki H, Rajchl M, Tarroni G, Glocker B, King A, Matthews PM, Rueckert D, 2017. Semi-supervised learning for network-based cardiac MR image segmentation, in: Medical Image Computing and Computer-Assisted Intervention, Springer International Publishing, Cham, pp. 253–260.
    4. Bai W, Sinclair M, Tarroni G, Oktay O, Rajchl M, Vaillant G, Lee AM, Aung N, Lukaschuk E, Sanghvi MM, Zemrak F, Fung K, Paiva JM, Carapella V, Kim YJ, Suzuki H, Kainz B, Matthews PM, Petersen SE, Piechnik SK, Neubauer S, Glocker B, Rueckert D, 2018a. Automated cardiovascular magnetic resonance image analysis with fully convolutional networks. Journal of Cardiovascular Magnetic Resonance 20, 65. doi:10.1186/s12968-018-0471-x.
    5. Bai W, Suzuki H, Qin C, Tarroni G, Oktay O, Matthews PM, Rueckert D, 2018b. Recurrent neural networks for aortic image sequence segmentation with sparse annotations, in: Frangi AF, Schnabel JA, Davatzikos C, Alberola-López C, Fichtinger G (Eds.), Medical Image Computing and Computer Assisted Intervention, Springer International Publishing, Cham, pp. 586–594.
