Disentangled representation learning in cardiac image analysis

Agisilaos Chartsias et al. Med Image Anal. 2019 Dec;58:101535. doi: 10.1016/j.media.2019.101535. Epub 2019 Jul 18.

Abstract

Typically, a medical image offers spatial information on the anatomy (and pathology) modulated by imaging-specific characteristics. Many imaging modalities, including Magnetic Resonance Imaging (MRI) and Computed Tomography (CT), can be interpreted in this way. We can venture further and consider that a medical image naturally factors into some spatial factors depicting anatomy and factors that denote the imaging characteristics. Here, we explicitly learn this decomposed (disentangled) representation of imaging data, focusing in particular on cardiac images. We propose the Spatial Decomposition Network (SDNet), which factorises 2D medical images into spatial anatomical factors and non-spatial modality factors. We demonstrate that this high-level representation is ideally suited for several medical image analysis tasks, such as semi-supervised segmentation, multi-task segmentation and regression, and image-to-image synthesis. Specifically, we show that our model can match the performance of fully supervised segmentation models, using only a fraction of the labelled images. Critically, we show that our factorised representation also benefits from supervision obtained either when we use auxiliary tasks to train the model in a multi-task setting (e.g. regressing to known cardiac indices), or when aggregating multimodal data from different sources (e.g. pooling together MRI and CT data). To explore the properties of the learned factorisation, we perform latent-space arithmetic and show that we can synthesise CT from MR and vice versa by swapping the modality factors. We also demonstrate that the factor holding image-specific information can be used to predict the input modality with high accuracy. Code will be made available at https://github.com/agis85/anatomy_modality_decomposition.
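
To make the factorisation concrete, here is a rough, illustrative sketch of the interface in PyTorch. It is not the authors' implementation (their code is at the URL above): the class names, layer sizes, and the simple scale-and-shift conditioning in the toy decoder are assumptions, included only to show how an image splits into a spatial factor s and a modality vector z and is reconstructed from the two.

```python
# Illustrative sketch only: toy stand-ins for the anatomy encoder, modality encoder and
# decoder described in the abstract. Real SDNet uses a U-Net anatomy encoder and a
# FiLM-conditioned decoder; everything below is simplified.
import torch
import torch.nn as nn


class AnatomyEncoder(nn.Module):
    """Image -> C-channel spatial (anatomical) factor s."""
    def __init__(self, channels: int = 8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, channels, 3, padding=1),
        )

    def forward(self, x):
        # Channel-wise softmax encourages each pixel to 'belong' to one channel.
        return torch.softmax(self.net(x), dim=1)


class ModalityEncoder(nn.Module):
    """(image, s) -> low-dimensional, non-spatial modality vector z."""
    def __init__(self, channels: int = 8, nz: int = 8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1 + channels, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, nz),
        )

    def forward(self, x, s):
        return self.net(torch.cat([x, s], dim=1))


class Decoder(nn.Module):
    """Reconstructs the image from s, with z injected as a per-channel scale and shift."""
    def __init__(self, channels: int = 8, nz: int = 8):
        super().__init__()
        self.cond = nn.Linear(nz, 2 * channels)
        self.net = nn.Conv2d(channels, 1, 3, padding=1)

    def forward(self, s, z):
        gamma, beta = self.cond(z).chunk(2, dim=1)
        s = s * gamma[..., None, None] + beta[..., None, None]
        return self.net(s)


x = torch.randn(2, 1, 64, 64)            # toy batch of 2D slices
f_anatomy, f_modality, g = AnatomyEncoder(), ModalityEncoder(), Decoder()
s = f_anatomy(x)                         # spatial anatomical factor
z = f_modality(x, s)                     # non-spatial modality factor
x_rec = g(s, z)                          # reconstruction from the two factors
```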

Keywords: Cardiac magnetic resonance imaging; Disentangled representation learning; Multitask learning; Semi-supervised segmentation.


Figures

Fig. 1:
A schematic overview of the proposed model. An input image is first encoded to a multi-channel spatial representation, the anatomical factor s, using an anatomy encoder f_anatomy. Then s can be used as input to a segmentation network h (or some other task-specific network) to produce a multi-class segmentation mask. The factor s, along with the input image, is used by a modality encoder f_modality to produce a latent vector z representing the imaging modality. The two representations s and z are combined to reconstruct the input image through the decoder network g.
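
As an illustration of the task-specific network h mentioned above, a segmentor operating purely on s could be as small as the following sketch; the layer sizes and the values of C and L are placeholders, not the paper's.

```python
# Toy sketch of a segmentor h that maps the C-channel anatomical factor s to L-class
# per-pixel logits. Purely illustrative; not the SDNet segmentor.
import torch
import torch.nn as nn

C, L = 8, 4  # channels of s and number of segmentation classes (task dependent)

h = nn.Sequential(
    nn.Conv2d(C, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, L, 1),                 # 1x1 conv producing per-pixel class logits
)

s = torch.rand(1, C, 64, 64)             # anatomical factor from f_anatomy
mask_logits = h(s)                       # (1, L, 64, 64); softmax over dim=1 gives the mask
```
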
Fig. 2:
The architectures of the four networks that make up SDNet. The anatomy encoder is a standard U-Net (Ronneberger et al., 2015) that produces a spatial anatomical representation s. The modality encoder is a convolutional network (except for a fully connected final layer) that produces the modality representation z. The segmentor is a small fully convolutional network that, given s, produces the final segmentation prediction as a multi-class mask (with L classes). Finally, the decoder produces a reconstruction of the input image from s, with its output modulated by z through FiLM normalisation (Perez et al., 2018). The bottom of the figure details the components used throughout the four networks. The number of channels C of the anatomical factor, the size n_z of the modality factor, and the number of segmentation classes L depend on the specific task and are detailed in the main text.
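
For readers unfamiliar with FiLM, a minimal feature-wise linear modulation layer in the spirit of Perez et al. (2018) is sketched below; where exactly such blocks sit inside the SDNet decoder follows the paper, not this sketch.

```python
# Minimal FiLM layer: the conditioning vector z is mapped to one (gamma, beta) pair per
# feature channel, which scales and shifts the decoder features. Sketch only.
import torch
import torch.nn as nn


class FiLM(nn.Module):
    def __init__(self, nz: int, num_features: int):
        super().__init__()
        self.proj = nn.Linear(nz, 2 * num_features)

    def forward(self, feats, z):
        gamma, beta = self.proj(z).chunk(2, dim=1)
        return feats * gamma[..., None, None] + beta[..., None, None]


feats = torch.randn(2, 32, 64, 64)       # intermediate decoder features
z = torch.randn(2, 8)                    # modality factor
modulated = FiLM(nz=8, num_features=32)(feats, z)
```
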
Fig. 3:
(a) Example of a spatial representation, expressed as a multi-channel binary map. Some channels represent defined anatomical parts such as the myocardium or the left ventricle, and others the remaining anatomy required to describe the input image on the left. Observe how sparse most of the informative channels are. (b) Spatial representation with no thresholding applied. Each channel of the spatial map also captures the intensity signal in different gray-level variations and is not sparse, in contrast to Fig. 3a. This may hinder anatomical separation. Note that no specific channel ordering is imposed, so the anatomical parts can appear in a different order in the anatomical representations across experiments.
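
For illustration, the thresholding contrasted in (a) and (b) could be implemented as a simple rounding with a straight-through gradient; this is an assumption made for the sketch, not necessarily the exact operator used in the paper.

```python
# Round each channel of the soft spatial factor to {0, 1} in the forward pass while letting
# gradients flow through unchanged (straight-through estimator). Illustrative sketch only.
import torch


def binarise(s: torch.Tensor) -> torch.Tensor:
    hard = (s > 0.5).float()
    return s + (hard - s).detach()


s = torch.rand(2, 8, 64, 64)             # soft spatial factor in [0, 1]
s_bin = binarise(s)                      # near-binary channel maps, differentiable w.r.t. s
```
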
Fig. 4:
Segmentation example for different numbers of labelled images from the ACDC dataset. Blue, green, and red show the model's predictions for MYO, LV, and RV, respectively.
Fig. 5:
Example of anatomical representations from one MR and two CT images. Green boxes mark common spatial information captured in the same channels, whereas red boxes mark information present in one modality but not the other.
Fig. 6:
Modality transformation between MR and CT, when a fixed anatomy is combined with a modality vector derived from each imaging modality. Specifically, let x_mr and x_ct be MR and CT images, respectively. The left panel of the figure shows the original MR image x_mr and a ‘reconstruction’ of x_mr using the modality component derived from x_ct, i.e. g(f_anatomy(x_mr), f_modality(x_ct, f_anatomy(x_ct))). The right panel shows the original CT image x_ct and a ‘reconstruction’ of x_ct using the modality component derived from x_mr, i.e. g(f_anatomy(x_ct), f_modality(x_mr, f_anatomy(x_mr))).
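
The swap described in the caption can be written as a small helper taking the trained networks as callables; in the usage below, the lambdas are trivial placeholders standing in for trained models, included only so the snippet runs.

```python
# Latent-space swap from Fig. 6: combine the anatomy of x_src with the modality vector of
# x_tgt. The lambdas are placeholder 'networks' so the example is self-contained.
import torch


def swap_modality(x_src, x_tgt, f_anatomy, f_modality, g):
    """g(f_anatomy(x_src), f_modality(x_tgt, f_anatomy(x_tgt)))"""
    s_src = f_anatomy(x_src)
    z_tgt = f_modality(x_tgt, f_anatomy(x_tgt))
    return g(s_src, z_tgt)


x_mr = torch.randn(1, 1, 64, 64)
x_ct = torch.randn(1, 1, 64, 64)
f_anatomy = lambda x: x.repeat(1, 8, 1, 1)                        # placeholder anatomy encoder
f_modality = lambda x, s: s.mean(dim=(2, 3))                      # placeholder modality encoder
g = lambda s, z: s.mean(1, keepdim=True) + z.mean(1)[:, None, None, None]  # placeholder decoder
x_mr_as_ct = swap_modality(x_mr, x_ct, f_anatomy, f_modality, g)  # MR anatomy, CT appearance
```
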
Fig. 7:
Reconstructions of an input image when re-arranging the channels of the spatial representation. The images from left to right are: the input, the original reconstruction, the reconstruction when moving the MYO to the LV channel, the reconstruction when exchanging the contents of the MYO and LV channels, and finally a reconstruction obtained after a random permutation of the channels.
Fig. 8:
Reconstructions when interpolating between z vectors. Each row corresponds to images obtained by changing the values of a single z dimension. The final two columns (correlation and Δimage) indicate the areas of the image most affected by this change in z.
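
The traversal behind this figure amounts to keeping s fixed and sweeping one coordinate of z before decoding; a minimal sketch (with a placeholder decoder so it runs standalone) follows.

```python
# Decode images obtained by setting one dimension of z to each value in `values`
# while the anatomical factor s stays fixed. Illustrative sketch only.
import torch


def traverse_dimension(g, s, z, dim, values):
    images = []
    for v in values:
        z_mod = z.clone()
        z_mod[:, dim] = v
        images.append(g(s, z_mod))
    return torch.stack(images)


s = torch.rand(1, 8, 64, 64)             # fixed anatomical factor
z = torch.zeros(1, 8)                    # base modality vector
g = lambda s, z: s.mean(1, keepdim=True) + z.mean(1)[:, None, None, None]  # placeholder decoder
frames = traverse_dimension(g, s, z, dim=0, values=torch.linspace(-3, 3, 7))
```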

References

    1. Almahairi A, Rajeswar S, Sordoni A, Bachman P, Courville AC, 2018. Augmented CycleGAN: Learning many-to-many mappings from unpaired data, in: International Conference on Machine Learning.
    2. Azadi S, Fisher M, Kim V, Wang Z, Shechtman E, Darrell T, 2018. Multi-content GAN for few-shot font style transfer, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, p. 13.
    3. Bai W, Oktay O, Sinclair M, Suzuki H, Rajchl M, Tarroni G, Glocker B, King A, Matthews PM, Rueckert D, 2017. Semi-supervised learning for network-based cardiac MR image segmentation, in: Medical Image Computing and Computer-Assisted Intervention, Springer International Publishing, Cham, pp. 253–260.
    4. Bai W, Sinclair M, Tarroni G, Oktay O, Rajchl M, Vaillant G, Lee AM, Aung N, Lukaschuk E, Sanghvi MM, Zemrak F, Fung K, Paiva JM, Carapella V, Kim YJ, Suzuki H, Kainz B, Matthews PM, Petersen SE, Piechnik SK, Neubauer S, Glocker B, Rueckert D, 2018a. Automated cardiovascular magnetic resonance image analysis with fully convolutional networks. Journal of Cardiovascular Magnetic Resonance 20, 65. doi:10.1186/s12968-018-0471-x.
    5. Bai W, Suzuki H, Qin C, Tarroni G, Oktay O, Matthews PM, Rueckert D, 2018b. Recurrent neural networks for aortic image sequence segmentation with sparse annotations, in: Frangi AF, Schnabel JA, Davatzikos C, Alberola-López C, Fichtinger G (Eds.), Medical Image Computing and Computer Assisted Intervention, Springer International Publishing, Cham, pp. 586–594.
