Do Transformers and CNNs Learn Different Concepts of Brain Age?

Nys Tjade Siegel¹, Dagmar Kainmueller²,³,⁴, Fatma Deniz⁵,⁶, Kerstin Ritter¹,⁵,⁷, Marc-Andre Schulz¹,⁵,⁷

Affiliations

¹ Department of Psychiatry and Neurosciences, Charité - Universitätsmedizin Berlin (Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health), Berlin, Germany.
² Max-Delbrueck-Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany.
³ Helmholtz Imaging, Berlin, Germany.
⁴ Digital Engineering Faculty of the University of Potsdam, Potsdam, Germany.
⁵ Bernstein Center for Computational Neuroscience, Berlin, Germany.
⁶ Faculty of Electrical Engineering and Computer Science, Technische Universität Berlin, Berlin, Germany.
⁷ Hertie Institute for AI in Brain Health, University of Tübingen, Tübingen, Germany.

PMID: 40489428
PMCID: PMC12147945
DOI: 10.1002/hbm.70243

Nys Tjade Siegel et al. Hum Brain Mapp. 2025 Jun 1;46(8):e70243. doi: 10.1002/hbm.70243.

Abstract

"Predicted brain age" refers to a biomarker of structural brain health derived from machine learning analysis of T1-weighted brain magnetic resonance (MR) images. A range of machine learning methods have been used to predict brain age, with convolutional neural networks (CNNs) currently yielding state-of-the-art accuracies. Recent advances in deep learning have introduced transformers, which are conceptually distinct from CNNs, and appear to set new benchmarks in various domains of computer vision. Given that transformers are not yet established in brain age prediction, we present three key contributions to this field: First, we examine whether transformers outperform CNNs in predicting brain age. Second, we identify that different deep learning model architectures potentially capture different (sub-)sets of brain aging effects, reflecting divergent "concepts of brain age". Third, we analyze whether such differences manifest in practice. To investigate these questions, we adapted a Simple Vision Transformer (sViT) and a shifted window transformer (SwinT) to predict brain age, and compared both models with a ResNet50 on 46,381 T1-weighted structural MR images from the UK Biobank. We found that SwinT and ResNet performed on par, though SwinT is likely to surpass ResNet in prediction accuracy with additional training data. Furthermore, to assess whether sViT, SwinT, and ResNet capture different concepts of brain age, we systematically analyzed variations in their predictions and clinical utility for indicating deviations in neurological and psychiatric disorders. Reassuringly, we observed no substantial differences in the structure of brain age predictions across the model architectures. Our findings suggest that the choice of deep learning model architecture does not appear to have a confounding effect on brain age studies.

Figures

**FIGURE 1**
Overview of workflow and results: (a) We used 46,381 structural magnetic resonance imaging (sMRI) brain scans from the UK Biobank (UKBB) to train and evaluate a convolutional neural network (CNN; 3D ResNet50) and two transformers (3D simple vision transformer, sViT; 3D shifted window transformer, SwinT) for brain age prediction. Mean absolute errors (MAEs) for held‐out healthy subjects were nearly identical for ResNet (2.66 years) and SwinT (2.67 years). We define the term “concept of brain age” as the distinct brain aging effects identified by a brain age model and the way these aging effects are synthesized into scalar predictions. (b) Effect sizes between prediction errors (brain age gaps; BAGs) of patients and matched controls were similar for CNN and transformers across neurological and psychiatric diseases, yielding no indication that different model architectures rely on meaningfully different concepts of brain age for their predictions.
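The two quantities named in this caption, the brain age gap (BAG) and the mean absolute error (MAE), can be illustrated with a minimal sketch. The sign convention (predicted minus chronological age) is an assumption here, as are the function names and the toy ages, which are hypothetical and not taken from the study.

```python
import numpy as np

def brain_age_gap(predicted_age, chronological_age):
    """Per-subject BAG in years, assumed here to be predicted minus chronological age."""
    predicted = np.asarray(predicted_age, dtype=float)
    chronological = np.asarray(chronological_age, dtype=float)
    return predicted - chronological

def mean_absolute_error(predicted_age, chronological_age):
    """MAE in years: the mean of the absolute brain age gaps."""
    return float(np.mean(np.abs(brain_age_gap(predicted_age, chronological_age))))

# Hypothetical toy values for four subjects (not data from the paper).
chronological = np.array([55.0, 62.0, 70.0, 48.0])
predicted = np.array([57.5, 60.0, 73.0, 47.0])

gaps = brain_age_gap(predicted, chronological)
mae = mean_absolute_error(predicted, chronological)
```

A positive gap means the model judges the brain to look older than the subject's chronological age.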

**FIGURE 2**
SwinT will likely outperform ResNet with additional training samples. We trained multiple instances of each model architecture with gradually decreasing numbers of training samples and found that the accuracies of the shifted window transformer (SwinT) and the simple vision transformer (sViT) decline more strongly than that of the ResNet. Extrapolating each model architecture's accuracy using power laws (Schulz, Bzdok, et al. 2024) indicates that SwinT would surpass ResNet's accuracy given additional training samples. Uncertainty estimates refer to the SD across model instances.
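As a rough illustration of extrapolating learning curves with power laws, the sketch below fits error ≈ a · n^(−b) by linear regression in log-log space and solves for the sample size at which two fitted curves cross. The functional form without an offset term, the function names, and the synthetic learning curves are all assumptions made for illustration; they are not the fitting procedure or the numbers reported in the study.

```python
import numpy as np

def fit_power_law(n_samples, errors):
    """Fit errors ~ a * n**(-b) by least squares in log-log space; returns (a, b)."""
    log_n = np.log(np.asarray(n_samples, dtype=float))
    log_e = np.log(np.asarray(errors, dtype=float))
    slope, intercept = np.polyfit(log_n, log_e, deg=1)
    return float(np.exp(intercept)), float(-slope)

def predict_error(n, a, b):
    """Extrapolated error at training set size n."""
    return a * np.asarray(n, dtype=float) ** (-b)

def crossover_size(a1, b1, a2, b2):
    """Sample size at which a1*n**(-b1) equals a2*n**(-b2) (requires b1 != b2)."""
    return (a2 / a1) ** (1.0 / (b2 - b1))

# Hypothetical, noise-free learning curves (not results from the paper):
# the "transformer" starts worse but improves faster with more data.
sizes = np.array([2000.0, 4000.0, 8000.0, 16000.0, 32000.0])
mae_cnn = 8.0 * sizes ** -0.10
mae_transformer = 22.0 * sizes ** -0.19

a_cnn, b_cnn = fit_power_law(sizes, mae_cnn)
a_tr, b_tr = fit_power_law(sizes, mae_transformer)
n_cross = crossover_size(a_cnn, b_cnn, a_tr, b_tr)
```

In this toy setting the steeper exponent of the second curve places the crossover beyond the largest observed training set, which is the kind of pattern the caption describes.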

**FIGURE 3**
Different brain age model architectures encode similar disease patterns. The figure shows effect sizes (Cohen's $d$) measured between BAGs of patients and matched controls. Effect sizes of the different model architectures were within one $\sigma$ of each other for every disease, giving no indication of differences. Error bars indicate the standard error of the mean estimate derived by bootstrapping patient‐control pairs.
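The effect size and its bootstrap uncertainty can be sketched as follows. This is a minimal example under stated assumptions: Cohen's d is computed with a pooled standard deviation, matched patient-control pairs are resampled jointly, and the simulated BAG values, function names, and number of bootstrap draws are hypothetical choices rather than the study's settings.

```python
import numpy as np

def cohens_d(patients, controls):
    """Cohen's d between two groups, using the pooled standard deviation."""
    patients = np.asarray(patients, dtype=float)
    controls = np.asarray(controls, dtype=float)
    n1, n2 = len(patients), len(controls)
    pooled_var = ((n1 - 1) * patients.var(ddof=1)
                  + (n2 - 1) * controls.var(ddof=1)) / (n1 + n2 - 2)
    return float((patients.mean() - controls.mean()) / np.sqrt(pooled_var))

def bootstrap_se(patients, controls, n_boot=1000, seed=0):
    """Standard error of d, resampling matched patient-control pairs together."""
    patients = np.asarray(patients, dtype=float)
    controls = np.asarray(controls, dtype=float)
    rng = np.random.default_rng(seed)
    n_pairs = len(patients)
    estimates = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n_pairs, size=n_pairs)  # same indices keep pairs intact
        estimates[i] = cohens_d(patients[idx], controls[idx])
    return float(estimates.std(ddof=1))

# Simulated BAGs in years (hypothetical, not data from the paper).
rng = np.random.default_rng(42)
bag_controls = rng.normal(loc=0.0, scale=3.0, size=200)
bag_patients = rng.normal(loc=1.0, scale=3.0, size=200)

d = cohens_d(bag_patients, bag_controls)
se = bootstrap_se(bag_patients, bag_controls)
```

Comparing architectures then amounts to checking whether their d values for the same disease lie within roughly one bootstrap standard error of each other.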

**FIGURE 4**
Association of BAG and cognitive, lifestyle and biomedical phenotypes seems not to depend on the model architecture. We fitted linear models from BAG and confounds to phenotype and report the t‐statistic for whether the BAG is a significant predictor. Error bars indicate the t‐statistic's standard error of the mean estimate, derived by bootstrapping. BAGs of different model architectures were similarly predictive for the analyzed phenotypes.
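A minimal sketch of the association test described in this caption is given below: an ordinary least squares model predicts a phenotype from BAG plus confounds, and the t-statistic of the BAG coefficient is reported. The choice of confounds (age and sex), the simulated data, and the function name are assumptions for illustration only; the study's exact confound set and model are not reproduced here.

```python
import numpy as np

def bag_t_statistic(phenotype, bag, confounds):
    """t-statistic of the BAG coefficient in an OLS fit: phenotype ~ 1 + BAG + confounds."""
    y = np.asarray(phenotype, dtype=float)
    # Design matrix columns: intercept, BAG, then one column per confound.
    X = np.column_stack([np.ones(len(y)), bag, confounds])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = residuals @ residuals / dof           # residual variance
    cov_beta = sigma2 * np.linalg.inv(X.T @ X)     # coefficient covariance
    return float(beta[1] / np.sqrt(cov_beta[1, 1]))

# Simulated data (hypothetical, not from the paper).
rng = np.random.default_rng(0)
n = 500
age = rng.uniform(45, 80, size=n)
sex = rng.integers(0, 2, size=n).astype(float)
bag = rng.normal(0.0, 3.0, size=n)
confounds = np.column_stack([age, sex])

# One phenotype that depends on BAG and one that does not.
phenotype_related = 0.5 * bag + 0.1 * age + rng.normal(0.0, 1.0, size=n)
phenotype_unrelated = 0.1 * age + rng.normal(0.0, 1.0, size=n)

t_related = bag_t_statistic(phenotype_related, bag, confounds)
t_unrelated = bag_t_statistic(phenotype_unrelated, bag, confounds)
```

Running the same function on BAGs from each architecture and comparing the resulting t-statistics mirrors the comparison shown in the figure.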

**FIGURE 5**
Similar brain features appear to be relevant for age predictions across different model architectures. Using Input $\times$ Gradient (IxG; Shrikumar et al. 2016), we generated feature‐relevance heatmaps for each held‐out healthy subject across ResNet, SwinT, and sViT. These heatmaps, averaged across random model architecture initializations and visualized at group‐level using a color scale (dark red = low relevance, white = high), revealed highly consistent brain regions across architectures, suggesting they capture comparable features of brain aging. Slight variations in the heatmaps likely stem from interactions between the model architectures and IxG, rather than reflecting meaningful differences in the underlying relevant features. The consistency in highlighted brain regions across ResNet, SwinT, and sViT reinforces our conclusion that different model architectures are unlikely to learn different concepts of brain age. Notably, brain regions such as the cerebellum, basal ganglia, and brain stem, which were consistently identified as important, are well‐documented for their roles in aging processes (Walhovd et al. 2011), further validating their relevance as predictors of age.
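Input × Gradient assigns each input feature a relevance equal to its value multiplied by the gradient of the model output with respect to that feature. The sketch below shows the idea on a tiny one-hidden-layer network with a hand-derived gradient; the network, its random weights, and the eight-element input are hypothetical stand-ins for a 3D brain age model and a flattened image, not the models used in the study.

```python
import numpy as np

def tiny_model(x, W1, w2):
    """Scalar output of a toy network: w2 . tanh(W1 x)."""
    return float(w2 @ np.tanh(W1 @ x))

def input_times_gradient(x, W1, w2):
    """Relevance per input feature: x * d(output)/dx for the toy network."""
    hidden = np.tanh(W1 @ x)
    # Chain rule: d/dx [w2 . tanh(W1 x)] = W1^T (w2 * (1 - tanh^2(W1 x)))
    grad = W1.T @ (w2 * (1.0 - hidden ** 2))
    return x * grad

# Hypothetical toy input and weights.
rng = np.random.default_rng(1)
x = rng.normal(size=8)            # stands in for a flattened image
W1 = rng.normal(size=(4, 8)) * 0.5
w2 = rng.normal(size=4)

relevance = input_times_gradient(x, W1, w2)
```

For a real 3D network the gradient would come from an automatic differentiation framework, and the resulting relevance volume would be reshaped to the image dimensions before averaging across subjects and model instances.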
