IEEE Trans Med Imaging. 2022 Oct;41(10):2925-2940.
doi: 10.1109/TMI.2022.3174827. Epub 2022 Sep 30.

PTNet3D: A 3D High-Resolution Longitudinal Infant Brain MRI Synthesizer Based on Transformers

Xuzhe Zhang et al. IEEE Trans Med Imaging. 2022 Oct.

Abstract

Interest in longitudinal neurodevelopment during the first few years after birth has grown in recent years. Noninvasive magnetic resonance imaging (MRI) can provide crucial information about the development of brain structures in the early months of life. Despite the success of MRI collection and analysis in adults, it remains a challenge to collect high-quality multimodal MRIs from developing infant brains because of their irregular sleep patterns, limited attention, and inability to follow instructions to stay still during scanning. In addition, limited analytic approaches are available. These challenges often lead to a significant reduction in usable MRI scans and pose a problem for modeling neurodevelopmental trajectories. Researchers have explored solving this problem by synthesizing realistic MRIs to replace corrupted ones. Among synthesis methods, convolutional neural network-based (CNN-based) generative adversarial networks (GANs) have demonstrated promising performance. In this study, we introduce a novel 3D MRI synthesis framework, the pyramid transformer network (PTNet3D), which relies on attention mechanisms through transformer and performer layers. We conducted extensive experiments on the high-resolution Developing Human Connectome Project (dHCP) and longitudinal Baby Connectome Project (BCP) datasets. Compared with CNN-based GANs, PTNet3D consistently shows superior synthesis accuracy and superior generalization on these two independent, large-scale infant brain MRI datasets. Notably, PTNet3D synthesized more realistic scans than CNN-based models when the input comes from multi-age subjects. Potential applications of PTNet3D include synthesizing corrupted or missing images. By replacing corrupted scans with synthesized ones, we observed significant improvement in infant whole-brain segmentation.

Figures

Fig. 12.
Age distribution of the BCP dataset that is used for training, validation, and testing.
Fig. 13.
Boxplots for T1w-to-T2w synthesis (a) and T2w-to-T1w synthesis (b) on the multi-age BCP dataset.
Fig. 14.
Boxplots for regional Dice score on synthesized dHCP scans by PTNet3D and pix2pixHD-Local.
Fig. 15.
Visualization of original T2w scans (left) and generated motion-corrupted T2w scans (right). From top to bottom: sagittal, coronal, and axial. The number at the bottom left identifies the subject.
Fig. 1.
Self-attention mechanism used in the transformer and a basic transformer block. Head count (H) is the number of scaled dot-product attention heads used in multi-head attention. N is the number of successively stacked transformer blocks.
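As context for Fig. 1, the scaled dot-product self-attention at the core of a transformer block can be sketched in a few lines of NumPy. This is a hedged, single-head illustration with made-up dimensions, not the paper's implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over N tokens X of shape (N, d)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])  # (N, N) pairwise similarities
    return softmax(scores) @ V               # each token: weighted sum of values

rng = np.random.default_rng(0)
N, d = 8, 16                                 # illustrative: 8 tokens, 16-dim embeddings
X = rng.standard_normal((N, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)                             # (8, 16)
```

Multi-head attention, as in Fig. 1, runs H such maps in parallel on lower-dimensional projections and concatenates the results.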
Fig. 2.
Difference between transformer and performer models. Upper panel: Transformer block as explained in Eq (4). Lower panel: Performer block as explained in Eq (5, 6). The red dashed block is first computed to reduce complexity. The entire green solid block is proposed to approximate the full-rank self-attention in the upper panel.
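The reordering Fig. 2 describes can be sketched as follows: with a positive random-feature map approximating the softmax kernel (in the spirit of the performer's FAVOR+ mechanism), attention factorizes so that the key-value product (the red dashed block) is computed first, dropping the cost from quadratic to linear in the token count N. The feature map below is a simplified stand-in, not the exact performer construction:

```python
import numpy as np

def positive_features(X, proj):
    """Simplified positive random features approximating the softmax kernel."""
    norm = (X ** 2).sum(axis=-1, keepdims=True) / 2.0
    return np.exp(X @ proj - norm) / np.sqrt(proj.shape[1])

def performer_attention(Q, K, V, proj):
    Qp, Kp = positive_features(Q, proj), positive_features(K, proj)
    KV = Kp.T @ V                  # (m, d): computed first -> linear in N
    Z = Qp @ Kp.sum(axis=0)        # per-query normalizer
    return (Qp @ KV) / Z[:, None]

rng = np.random.default_rng(0)
N, d, m = 64, 16, 128              # illustrative: 64 tokens, 16-dim heads, 128 features
Q, K, V = (rng.standard_normal((N, d)) for _ in range(3))
proj = rng.standard_normal((d, m))
out = performer_attention(Q, K, V, proj)
print(out.shape)                   # (64, 16)
```

Exact attention costs O(N²d) because it materializes the (N, N) score matrix; the factorized form above costs O(Nmd), which matters for the long token sequences produced by 3D volumes.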
Fig. 3.
Overview of the proposed 3D Pyramid Transformer Net (PTNet3D) model. We follow the classic U-shape structure and inherit its skip connections. We parallelize the conversion at two distinct resolutions and concatenate them before feeding into the transformer bottleneck. The detailed structure of each component is illustrated in Fig. 4. The spatial projection is a fully-connected layer that reduces the channel dimension to the number of output channels.
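The data flow in Fig. 3 follows the familiar U-shape pattern: encode to coarser resolutions, then decode back up, merging a skip connection at each level. A schematic 2D NumPy sketch, with simple pooling/upsampling stand-ins for the performer stages and addition standing in for concatenation:

```python
import numpy as np

def down(x):
    """Stride-2 average pooling: stand-in for a performer encoder stage."""
    return 0.25 * (x[::2, ::2] + x[1::2, ::2] + x[::2, 1::2] + x[1::2, 1::2])

def up(x):
    """Nearest-neighbor upsampling: stand-in for a performer decoder stage."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def u_shape(x):
    e1 = down(x)       # encoder, level 1
    e2 = down(e1)      # encoder, level 2 (bottleneck input)
    d2 = up(e2) + e1   # decoder + skip connection from level 1
    d1 = up(d2) + x    # decoder + skip connection from the input
    return d1

x = np.arange(256, dtype=float).reshape(16, 16)
print(u_shape(x).shape)   # (16, 16): the decoding path restores the resolution
```

The skip connections carry fine spatial detail past the bottleneck, which is why the synthesized scans retain high-resolution structure.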
Fig. 4.
Proposed performer encoder (a), performer decoder (b), and transformer bottleneck (c). (a): The performer encoder first unfolds the feature maps into tokens. The channel dimension after unfolding is determined by the input channel Cin and the unfold kernel size n. Unfolded tokens are then fed into a performer layer. The resulting tokens are finally transposed and reshaped into a feature map that has been downsampled by a scale of s (stride). In the encoding path, the unfold kernel size n is usually set to 3, and the unfold stride s is usually set to 2. (b): The performer decoder first upsamples the input feature maps by a factor of s. The upsampled feature maps are then processed as in the performer encoder, but without stride, so the upsampled feature size remains unchanged. In the decoding path, the upsample factor s is usually set to 2, and the unfold kernel size n and unfold stride are usually set to 1. Cin and Cout change at different levels of the network. (c): The transformer bottleneck uses the same unfold operation as the performer encoder, with additional position encoding and a linear projection applied before the transformer blocks. The output of M transformer blocks is then transposed, reshaped, and fed into the decoding path.
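The unfold-based tokenization described in Fig. 4 can be illustrated in 2D (the model itself works in 3D): each stride-s window of kernel size n becomes one token of length n·n·Cin, so n=3, s=2 halves the spatial resolution while tokenizing. A hedged NumPy sketch; the padding scheme and token ordering here are illustrative choices, not the paper's:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def unfold_tokens(fmap, n=3, s=2):
    """Unfold an (H, W, Cin) feature map into tokens of length n*n*Cin,
    one per stride-s window; s=2 halves the spatial resolution."""
    H, W, C = fmap.shape
    pad = n // 2
    padded = np.pad(fmap, ((pad, pad), (pad, pad), (0, 0)))
    win = sliding_window_view(padded, (n, n), axis=(0, 1))  # (H, W, C, n, n)
    win = win[::s, ::s]                                     # apply the stride
    h, w = win.shape[:2]
    return win.reshape(h * w, C * n * n), (h, w)

fmap = np.ones((8, 8, 4))                # Cin = 4
tokens, (h, w) = unfold_tokens(fmap)
print(tokens.shape, (h, w))              # (16, 36) (4, 4): 4*3*3 = 36 per token
```

The reverse step in the decoder is the transpose/reshape mentioned in the caption: tokens are laid back out on the (h, w) grid to reform a feature map.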
Fig. 5.
Visualizations and absolute error maps for existing synthesis models and our PTNet3D model. From left to right columns: real scan, synthetic results from pix2pix, pix2pixHD-Global, pix2pixHD-Local, StarGAN, and the proposed PTNet3D. From top to bottom rows: sagittal, coronal, and axial orientations. Other models yield more extensive error maps than the proposed PTNet3D. Red arrows indicate regions in which our PTNet3D generated more accurate results.
Fig. 6.
Boxplots for T1w-to-T2w synthesis (a, c, and e) and T2w-to-T1w synthesis (b, d, and f) on the multi-age BCP dataset.
Fig. 7.
An example from a 3-month-old subject. The middle and right columns are synthesized outputs from PTNet3D and pix2pixHD-Local. The bottom row is a zoomed view of the region highlighted by the red box.
Fig. 8.
Segmentation maps from distinct inputs. Left to right: real/true scans, synthesized scans by PTNet3D, synthesized scans by pix2pixHD-Local. From top to bottom: axial view, sagittal view, coronal view. Red circles indicate regions where synthesized scans by PTNet3D yield segmentation results that are closer to those from real scans.
Fig. 9.
Feature maps from the decoding path of different models: a) PTNet_2D, b) PTNet3D, c) pix2pixHD-Global. *The checkerboard artifact in panel b) is caused by stitching and does not indicate failures.
Fig. 10.
Using PTNet3D in a real-world application. Two concatenated inputs (good-quality T1w + corrupted T2w, good-quality T1w + synthesized T2w) are fed into a dual-channel 3D UNet. The bottom panel visualizes the segmentation maps from different inputs. From left to right: segmentation from corrupted scans, ground truth released by the dHCP study, and segmentation from synthesized scans.
Fig. 11.
Examples of data inclusion and exclusion. Scans (a-d) were included during model development, and (e-h) were excluded. Scan (a) has the best quality, while (b) and (c) are slightly worse than (a). Scan (d) has minor artifacts (circled region) but was not excluded since its quality is acceptable. Scans (e-h) were excluded because of their unacceptable quality.
