IEEE Trans Med Imaging. 2022 Oct;41(10):2925-2940.
doi: 10.1109/TMI.2022.3174827. Epub 2022 Sep 30.

PTNet3D: A 3D High-Resolution Longitudinal Infant Brain MRI Synthesizer Based on Transformers

Xuzhe Zhang et al. IEEE Trans Med Imaging. 2022 Oct.

Abstract

Interest in longitudinal neurodevelopment during the first few years after birth has grown in recent years. Noninvasive magnetic resonance imaging (MRI) can provide crucial information about the development of brain structures in the early months of life. Despite the success of MRI collection and analysis in adults, it remains a challenge to collect high-quality multimodal MRIs from developing infant brains because of their irregular sleep patterns, limited attention, and inability to follow instructions to stay still during scanning. In addition, limited analytic approaches are available. These challenges often lead to a significant reduction in usable MRI scans and pose a problem for modeling neurodevelopmental trajectories. Researchers have explored solving this problem by synthesizing realistic MRIs to replace corrupted ones. Among synthesis methods, convolutional neural network-based (CNN-based) generative adversarial networks (GANs) have demonstrated promising performance. In this study, we introduce a novel 3D MRI synthesis framework, the pyramid transformer network (PTNet3D), which relies on attention mechanisms through transformer and performer layers. We conducted extensive experiments on the high-resolution Developing Human Connectome Project (dHCP) and longitudinal Baby Connectome Project (BCP) datasets. Compared with CNN-based GANs, PTNet3D consistently shows superior synthesis accuracy and superior generalization on these two independent, large-scale infant brain MRI datasets. Notably, PTNet3D synthesized more realistic scans than CNN-based models when the input comes from multi-age subjects. Potential applications of PTNet3D include synthesizing corrupted or missing images. By replacing corrupted scans with synthesized ones, we observed significant improvement in infant whole-brain segmentation.

Figures

Fig. 12.
Age distribution of the BCP dataset that is used for training, validation, and testing.
Fig. 13.
Boxplots for T1w-to-T2w synthesis (a) and T2w-to-T1w synthesis (b) on the multi-age BCP dataset.
Fig. 14.
Boxplots for regional Dice score on synthesized dHCP scans by PTNet3D and pix2pixHD-Local.
Fig. 15.
Visualization of original T2w scans (left) and generated motion-corrupted T2w scans (right). From top to bottom: sagittal, coronal, and axial. The number at the bottom left identifies the subject.
Fig. 1.
Self-attention mechanism used in the transformer and a basic transformer block. Head count (H) is the number of scaled dot-product attention heads used in multi-head attention. N is the number of successively stacked transformer blocks.
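As context for Fig. 1, the scaled dot-product self-attention at the core of a transformer block can be sketched in a few lines of NumPy. This is a hedged, single-head illustration with made-up dimensions, not the paper's implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over N tokens X of shape (N, d)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])  # (N, N) pairwise similarities
    return softmax(scores) @ V               # each token: weighted sum of values

rng = np.random.default_rng(0)
N, d = 8, 16                                 # illustrative: 8 tokens, 16-dim embeddings
X = rng.standard_normal((N, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)                             # (8, 16)
```

Multi-head attention, as in Fig. 1, runs H such maps in parallel on lower-dimensional projections and concatenates the results.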
Fig. 2.
Difference between transformer and performer models. Upper panel: Transformer block as explained in Eq (4). Lower panel: Performer block as explained in Eq (5, 6). The red dashed block is first computed to reduce complexity. The entire green solid block is proposed to approximate the full-rank self-attention in the upper panel.
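The reordering Fig. 2 describes can be sketched as follows: with a positive random-feature map approximating the softmax kernel (in the spirit of the performer's FAVOR+ mechanism), attention factorizes so that the key-value product (the red dashed block) is computed first, dropping the cost from quadratic to linear in the token count N. The feature map below is a simplified stand-in, not the exact performer construction:

```python
import numpy as np

def positive_features(X, proj):
    """Simplified positive random features approximating the softmax kernel."""
    norm = (X ** 2).sum(axis=-1, keepdims=True) / 2.0
    return np.exp(X @ proj - norm) / np.sqrt(proj.shape[1])

def performer_attention(Q, K, V, proj):
    Qp, Kp = positive_features(Q, proj), positive_features(K, proj)
    KV = Kp.T @ V                  # (m, d): computed first -> linear in N
    Z = Qp @ Kp.sum(axis=0)        # per-query normalizer
    return (Qp @ KV) / Z[:, None]

rng = np.random.default_rng(0)
N, d, m = 64, 16, 128              # illustrative: 64 tokens, 16-dim heads, 128 features
Q, K, V = (rng.standard_normal((N, d)) for _ in range(3))
proj = rng.standard_normal((d, m))
out = performer_attention(Q, K, V, proj)
print(out.shape)                   # (64, 16)
```

Exact attention costs O(N²d) because it materializes the (N, N) score matrix; the factorized form above costs O(Nmd), which matters for the long token sequences produced by 3D volumes.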
Fig. 3.
Overview of the proposed 3D Pyramid Transformer Net (PTNet3D) model. We follow the classic U-shape structure and inherit its skip connections. We parallelize the conversion at two distinct resolutions and concatenate them before feeding into the transformer bottleneck. The detailed structure of each component is illustrated in Fig. 4. The spatial projection is a fully-connected layer that reduces the channel dimension to the number of output channels.
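The data flow in Fig. 3 follows the familiar U-shape pattern: encode to coarser resolutions, then decode back up, merging a skip connection at each level. A schematic 2D NumPy sketch, with simple pooling/upsampling stand-ins for the performer stages and addition standing in for concatenation:

```python
import numpy as np

def down(x):
    """Stride-2 average pooling: stand-in for a performer encoder stage."""
    return 0.25 * (x[::2, ::2] + x[1::2, ::2] + x[::2, 1::2] + x[1::2, 1::2])

def up(x):
    """Nearest-neighbor upsampling: stand-in for a performer decoder stage."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def u_shape(x):
    e1 = down(x)       # encoder, level 1
    e2 = down(e1)      # encoder, level 2 (bottleneck input)
    d2 = up(e2) + e1   # decoder + skip connection from level 1
    d1 = up(d2) + x    # decoder + skip connection from the input
    return d1

x = np.arange(256, dtype=float).reshape(16, 16)
print(u_shape(x).shape)   # (16, 16): the decoding path restores the resolution
```

The skip connections carry fine spatial detail past the bottleneck, which is why the synthesized scans retain high-resolution structure.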
Fig. 4.
Proposed performer encoder (a), performer decoder (b), and transformer bottleneck (c). (a): The performer encoder first unfolds the feature maps into tokens. The channel dimension after unfolding is determined by the input channel Cin and the unfold kernel size n. Unfolded tokens are then fed into a performer layer. The resulting tokens are finally transposed and reshaped into a feature map that has been downsampled by a scale of s (stride). In the encoding path, the unfold kernel size n is usually set to 3, and the unfold stride s is usually set to 2. (b): The performer decoder first upsamples the input feature maps by a factor of s. The upsampled feature maps are then processed as in the performer encoder, but without stride, so the upsampled feature size remains unchanged. In the decoding path, the upsample factor s is usually set to 2, and the unfold kernel size n and unfold stride are usually set to 1. Cin and Cout change at different levels of the network. (c): The transformer bottleneck uses the same unfold operation as the performer encoder, with additional position encoding and a linear projection applied before the transformer blocks. The output of M transformer blocks is then transposed, reshaped, and fed into the decoding path.
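The unfold-based tokenization described in Fig. 4 can be illustrated in 2D (the model itself works in 3D): each stride-s window of kernel size n becomes one token of length n·n·Cin, so n=3, s=2 halves the spatial resolution while tokenizing. A hedged NumPy sketch; the padding scheme and token ordering here are illustrative choices, not the paper's:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def unfold_tokens(fmap, n=3, s=2):
    """Unfold an (H, W, Cin) feature map into tokens of length n*n*Cin,
    one per stride-s window; s=2 halves the spatial resolution."""
    H, W, C = fmap.shape
    pad = n // 2
    padded = np.pad(fmap, ((pad, pad), (pad, pad), (0, 0)))
    win = sliding_window_view(padded, (n, n), axis=(0, 1))  # (H, W, C, n, n)
    win = win[::s, ::s]                                     # apply the stride
    h, w = win.shape[:2]
    return win.reshape(h * w, C * n * n), (h, w)

fmap = np.ones((8, 8, 4))                # Cin = 4
tokens, (h, w) = unfold_tokens(fmap)
print(tokens.shape, (h, w))              # (16, 36) (4, 4): 4*3*3 = 36 per token
```

The reverse step in the decoder is the transpose/reshape mentioned in the caption: tokens are laid back out on the (h, w) grid to reform a feature map.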
Fig. 5.
Visualizations and absolute error maps for existing synthesis models and our PTNet3D model. From left to right columns: real scan, synthetic results from pix2pix, pix2pixHD-Global, pix2pixHD-Local, StarGAN, and the proposed PTNet3D. From top to bottom rows: sagittal, coronal, and axial orientations. Other models yield more extensive error maps than the proposed PTNet3D. Red arrows indicate regions in which our PTNet3D generated more accurate results.
Fig. 6.
Boxplots for T1w-to-T2w synthesis (a, c, and e) and T2w-to-T1w synthesis (b, d, and f) on the multi-age BCP dataset.
Fig. 7.
An example from a 3-month-old subject. The middle and right columns are synthesized outputs from PTNet3D and pix2pixHD-Local. The bottom row is a zoomed view of the region highlighted by the red box.
Fig. 8.
Segmentation maps from distinct inputs. Left to right: real/true scans, synthesized scans by PTNet3D, synthesized scans by pix2pixHD-Local. From top to bottom: axial view, sagittal view, coronal view. Red circles indicate regions where synthesized scans by PTNet3D yield segmentation results that are closer to those from real scans.
Fig. 9.
Feature maps from the decoding path of different models: a) PTNet_2D, b) PTNet3D, c) pix2pixHD-Global. *The checkerboard artifact in panel b) is caused by stitching and does not indicate failures.
Fig. 10.
Using PTNet3D in a real-world application. Two concatenated inputs (good-quality T1w + corrupted T2w, good-quality T1w + synthesized T2w) are fed into a dual-channel 3D UNet. The bottom panel visualizes the segmentation maps from different inputs. From left to right: segmentation from corrupted scans, ground truth released by the dHCP study, and segmentation from synthesized scans.
Fig. 11.
Examples of data inclusion and exclusion. Scans (a-d) were included during model development, and (e-h) were excluded. Scan (a) has the best quality, while (b) and (c) are slightly worse than (a). Scan (d) has minor artifacts (circled region) but was not excluded since its quality is acceptable. Scans (e-h) were excluded because of their unacceptable quality.
