IEEE J Biomed Health Inform. 2025 Feb;29(2):840-847.
doi: 10.1109/JBHI.2024.3520156. Epub 2025 Feb 10.

Scaling Synthetic Brain Data Generation

Mike Doan et al. IEEE J Biomed Health Inform. 2025 Feb.

Abstract

The limited availability of diverse, high-quality datasets is a significant challenge in applying deep learning to neuroimaging research. Although synthetic data generation can potentially address this issue, on-the-fly generation is computationally demanding, while training on pre-generated data is inflexible and may incur high storage costs. We introduce Wirehead, a scalable in-memory data pipeline that significantly improves the performance of on-the-fly synthetic data generation for deep learning in neuroimaging. Wirehead's architecture decouples data generation from training by running multiple generators in independent parallel processes, yielding near-linear performance gains in the number of generators used. It efficiently handles terabytes of data using MongoDB, greatly reducing storage costs that would otherwise be prohibitive. The robust, modular design enables flexible pipeline configurations and fault-tolerant operation. We evaluated Wirehead with SynthSeg, a tool that generates synthetic brain segmentation data and with which training a model takes 7 days. When deployed in parallel, Wirehead achieved a near-linear 15.7x increase in throughput with 16 generators. With 20 generators, we can train a model in 9 hours instead of 7 days. This demonstrates Wirehead's ability to greatly accelerate experimentation cycles. While Wirehead represents a substantial step forward, it also reveals opportunities for future research in optimizing the balance between generation and training and in resource allocation. Its ability to facilitate distributed deep learning has significant implications for enabling more ambitious neuroimaging research.
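To make the "near-linear" claims concrete, the short sketch below turns the figures quoted in the abstract into speedup and parallel-efficiency values. The function name `parallel_efficiency` is hypothetical and is not part of Wirehead. The training-time calculation assumes the 7-day figure is the baseline against which the 20-generator run is compared, which the abstract implies but does not state outright.

```python
def parallel_efficiency(speedup: float, workers: int) -> float:
    """Fraction of ideal linear scaling achieved (1.0 means perfectly linear)."""
    return speedup / workers


# Throughput figure quoted in the abstract: 15.7x with 16 generators.
throughput_eff = parallel_efficiency(15.7, 16)

# Training-time figure quoted in the abstract: 7 days reduced to 9 hours
# with 20 generators. Assumption: the 7-day run is the comparison baseline.
baseline_hours = 7 * 24          # 168 hours
train_speedup = baseline_hours / 9
train_eff = parallel_efficiency(train_speedup, 20)

print(f"throughput efficiency: {throughput_eff:.1%}")
print(f"training speedup:      {train_speedup:.2f}x")
print(f"training efficiency:   {train_eff:.1%}")
```

Under these assumptions, both figures sit above 90% of ideal linear scaling, which is consistent with the abstract's description of the gains as near-linear.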
