AMIA Jt Summits Transl Sci Proc. 2025 Jun 10:2025:471-480.
eCollection 2025.

From Scanner to Science: Reusing Clinically Acquired Medical Images for Research

Jenna M Schabdach et al. AMIA Jt Summits Transl Sci Proc.

Abstract

Growth in medical imaging research has revealed a need for larger and more varied datasets. This need could be met with curated, clinically acquired data, but the process of getting these data from the scanners to the scientists is complex and lengthy. We present Locutus, a manifest-driven, modular Extract, Transform, and Load (ETL) process designed to handle the difficulties of reusing clinically acquired medical imaging data. The design of Locutus rests on four foundational assumptions about medical data, research data, and communication. All parts of a workflow must communicate with each other and be adaptable to unique data delivery requests. In addition, the workflow must be robust to possible errors and uncertainties in clinically acquired data, which may require human intervention to resolve. With these assumptions in mind, Locutus presents a five-phase workflow for downloading, deidentifying, and delivering unique requests for imaging data. The phases are initialization, data preparation, extraction of data from the research server to a pre-deidentification data warehouse, transformation into deidentified space, and loading into a post-deidentification data warehouse. To date, this workflow has been used to process 32,962 imaging accessions for research use. This number is expected to grow as technical challenges are addressed, and the role of humans is expected to shift from frequent intervention to regular monitoring.
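
The five-phase, manifest-driven flow described in the abstract can be illustrated with a minimal sketch. This is not the Locutus implementation: the `Manifest` class, the phase functions, and the record fields are all hypothetical names chosen for illustration, and each phase body is a stand-in for the real work. The one design point it tries to show is that a failure on a single accession is set aside for human review instead of stopping the whole delivery.

```python
from dataclasses import dataclass, field


@dataclass
class Manifest:
    """Hypothetical delivery request: which accessions to process and how."""
    request_id: str
    accessions: list
    settings: dict = field(default_factory=dict)


def initialize(manifest):
    # Phase 1: load the delivery-specific settings from the manifest.
    return {"request_id": manifest.request_id, **manifest.settings}


def prepare(accession, config):
    # Phase 2: validate the request before pulling anything.
    if not accession.strip():
        raise ValueError("empty accession identifier")
    return {"accession": accession, "stage": "prepared"}


def extract(record, config):
    # Phase 3: stand-in for copying the scan into the identified warehouse.
    return {**record, "stage": "extracted", "patient_name": "PLACEHOLDER"}


def transform(record, config):
    # Phase 4: stand-in for deidentification; drop the identifying field.
    clean = {k: v for k, v in record.items() if k != "patient_name"}
    return {**clean, "stage": "deidentified"}


def load(record, config):
    # Phase 5: stand-in for writing to the deidentified warehouse.
    return {**record, "stage": "delivered", "request_id": config["request_id"]}


def run(manifest):
    """Run every accession through phases 2-5; collect failures for review."""
    config = initialize(manifest)
    delivered, needs_review = [], []
    for accession in manifest.accessions:
        try:
            record = prepare(accession, config)
            for phase in (extract, transform, load):
                record = phase(record, config)
            delivered.append(record)
        except Exception as err:  # one bad accession must not stop the batch
            needs_review.append((accession, str(err)))
    return delivered, needs_review
```

Calling `run(Manifest("REQ-1", ["A1", " ", "A2"]))` under these assumptions delivers the two valid accessions and returns the blank one in the review list, which mirrors the human-intervention step the abstract mentions.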


Figures

Figure 1.
Overview of the Locutus workflow. The purpose of the workflow is to deliver to the researcher a set of deidentified scans requested from the radiology department's picture archiving and communication system (PACS). Coordination between the research team, the radiology team, and the Locutus team enables Locutus to process a set of scans that are first copied from the Clinical PACS to a Research PACS. Locutus loads the delivery-specific system settings from the manifest in Phase 1, prepares to pull the requested scans from the Research PACS in Phase 2, extracts the requested scans to an identified DICOM dataset in Phase 3, transforms the identified data into a deidentified space in Phase 4, and loads the deidentified data into a deidentified DICOM dataset in Phase 5. Locutus itself can be conceptualized as a water wheel, where the data flowing through it is a collection of individual water molecules. The buckets next to Phases 3-5 represent the interim versions of each DICOM file as it flows through Locutus.
Figure 2.
Schematic of Locutus Phase 4: Transform data into deidentified space. The process of removing patient health information from DICOM files requires several levels of deidentification. In addition to metadata filtering using the DICOM tag remove and keep lists, one of several pixel-level deidentification software modules may be used to redact burned-in pixel-based PHI.
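
The metadata-filtering step in this caption can be sketched as follows. The caption does not say how the keep and remove lists interact, so this sketch makes two assumptions: a tag survives only if it is on the keep list, and the remove list wins when a tag appears on both. The function name `filter_metadata` and the list contents are hypothetical, and a plain dictionary keyed by DICOM attribute keyword stands in for a real DICOM header. Pixel-level redaction of burned-in PHI is a separate step and is not shown.

```python
def filter_metadata(tags, keep, remove):
    """Return a copy of `tags` holding only keep-listed, non-removed entries."""
    keep, remove = set(keep), set(remove)
    return {
        name: value
        for name, value in tags.items()
        if name in keep and name not in remove
    }


# Illustrative header values; none of these come from the paper.
header = {
    "PatientName": "DOE^JANE",
    "PatientID": "12345",
    "Modality": "MR",
    "StudyDescription": "BRAIN W/O CONTRAST",
    "InstitutionName": "Example Hospital",
}
KEEP = ["Modality", "StudyDescription", "InstitutionName"]
REMOVE = ["PatientName", "PatientID", "InstitutionName"]

print(filter_metadata(header, KEEP, REMOVE))
# prints {'Modality': 'MR', 'StudyDescription': 'BRAIN W/O CONTRAST'}
```

Dropping any tag that is not explicitly kept is the conservative choice here: an attribute nobody anticipated is removed by default instead of leaking through.
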
Figure 3.
The number of patient scan sessions processed via Locutus at the time of submission of the present paper. Accessions are identifiers that correspond to patient scan sessions. A) The bar plot shows the number of accessions processed by Locutus on each date. B) The blue line shows the cumulative number of accessions processed by Locutus. As Locutus has increased in scale, it has facilitated the deidentification and delivery of 32,962 clinically acquired scan sessions.
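
The relationship between the two panels is a running sum: each point on the cumulative line in panel B is the total of all per-date bars in panel A up to that date. A small sketch, using invented dates and counts that are not taken from the paper:

```python
from itertools import accumulate

# Hypothetical per-date accession counts (panel A); values are invented.
daily = [("2024-01-05", 120), ("2024-02-10", 45), ("2024-03-01", 300)]

dates = [date for date, _ in daily]
# Running total across dates (panel B).
cumulative = list(accumulate(count for _, count in daily))

for date, total in zip(dates, cumulative):
    print(date, total)
```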

