Radiology. 2020 Apr;295(1):4-15.
doi: 10.1148/radiol.2020192224. Epub 2020 Feb 18.

Preparing Medical Imaging Data for Machine Learning

Martin J Willemink et al. Radiology. 2020 Apr.

Abstract

Artificial intelligence (AI) continues to garner substantial interest in medical imaging. The potential applications are vast and include the entirety of the medical imaging life cycle from image creation to diagnosis to outcome prediction. The chief obstacles to development and clinical implementation of AI algorithms include availability of sufficiently large, curated, and representative training data that includes expert labeling (eg, annotations). Current supervised AI methods require a curation process for data to optimally train, validate, and test algorithms. Currently, most research groups and industry have limited data access based on small sample sizes from small geographic areas. In addition, the preparation of data is a costly and time-intensive process, the results of which are algorithms with limited utility and poor generalization. In this article, the authors describe fundamental steps for preparing medical imaging data in AI algorithm development, explain current limitations to data curation, and explore new approaches to address the problem of data availability.
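As an illustration of the train, validate, and test partitioning mentioned in the abstract, the sketch below assigns imaging studies to splits at the patient level, so that studies from one patient never land in more than one split. This is a minimal sketch under stated assumptions, not the authors' procedure: the function names, the hash-based assignment, and the 80/10/10 proportions are all hypothetical choices made for the example.

```python
import hashlib


def assign_split(patient_id, val_frac=0.1, test_frac=0.1):
    """Deterministically map a patient ID to 'train', 'val', or 'test'.

    Hypothetical helper: hashing the ID gives a stable pseudo-random bucket,
    so the same patient always receives the same split.
    """
    digest = hashlib.sha256(patient_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # value in [0, 1)
    if bucket < test_frac:
        return "test"
    if bucket < test_frac + val_frac:
        return "val"
    return "train"


def split_studies(studies):
    """Group (patient_id, study_id) pairs into splits by patient."""
    splits = {"train": [], "val": [], "test": []}
    for patient_id, study_id in studies:
        splits[assign_split(patient_id)].append((patient_id, study_id))
    return splits
```

Splitting by patient rather than by image is one common way to avoid leakage between training and test data; the fractions are parameters and would be chosen per project.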

Figures

Figure 1:
Diagram shows process of medical image data handling.
Figure 2:
Diagram shows value hierarchy of imaging annotation. Most useful but least abundant is ground truth (pathologic, genomic, or clinical outcome data). Prospective annotation is incredibly valuable due to availability of contemporaneous information (clinical and/or laboratory data). By comparison, retrospective annotations are least valuable.
Figure 3:
Image in posterior-anterior direction shows nonspecific abnormality on chest radiograph. Application of most accurate label for nonspecific finding such as opacity in left lung (circle) is challenging in absence of other clinical and laboratory data.
Figure 4:
Axial images show medical image segmentations performed by experts. (a) CT examination of patient with lung nodule. (b) Nodule is independently and blindly segmented by three medical experts with free open-source software package (Horos, version 3.3.5; Nimble d/b/a Purview, Annapolis, Md). (c) Magnified image of segmentations. There are differences between segmentations; however, these differences are small and not clinically relevant.
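Differences between expert segmentations such as those in Figure 4 can be quantified with an overlap measure. The sketch below computes the Dice similarity coefficient for every pair of readers. It is an illustrative assumption, not the method used in the article: the caption does not say how agreement was measured, and the reader names and toy pixel masks are made up for the example.

```python
from itertools import combinations


def dice(mask_a, mask_b):
    """Dice coefficient between two masks given as sets of (row, col) pixels."""
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        return 1.0  # two empty masks agree perfectly by convention
    return 2 * len(a & b) / (len(a) + len(b))


def pairwise_dice(masks):
    """Return {(reader_i, reader_j): dice} for every pair of readers."""
    return {
        (i, j): dice(masks[i], masks[j])
        for i, j in combinations(sorted(masks), 2)
    }


# Toy masks (hypothetical): a 4x4 square, the same square shifted by one
# column, and the square with one extra pixel.
square = {(r, c) for r in range(2, 6) for c in range(2, 6)}
masks = {
    "reader_1": square,
    "reader_2": {(r, c + 1) for r, c in square},
    "reader_3": square | {(6, 2)},
}
scores = pairwise_dice(masks)
print(scores[("reader_1", "reader_2")])  # 0.75
```

A Dice value of 1.0 means identical masks and 0.0 means no overlap; what counts as a clinically irrelevant difference depends on the task and is not defined by this sketch.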
Figure 5:
Diagram shows centralized versus federated learning. (a) Current artificial intelligence (AI) model development is through centralized model, in which de-identified data are transferred to centralized data storage system where AI algorithm can be developed. (b) In the future, federated learning may be used, in which data stays in each hospital. With federated learning, instead of transferring data outside each hospital, data stays in hospitals and AI model is sent to and trained in hospitals.
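The federated setup in Figure 5b can be sketched with a toy example: a shared model is sent to each hospital, trained on local data, and only the updated parameters are returned and averaged. The code below uses federated averaging, which is one common aggregation scheme and an assumption here, since the caption does not name a specific algorithm. The linear model, learning rate, and hospital datasets are all hypothetical.

```python
def local_train(weights, data, lr=0.1, epochs=5):
    """Run a few gradient-descent steps on one hospital's data.

    The model is y = w * x + b with mean squared error; the raw data
    never leaves this function, only the updated (w, b) is returned.
    """
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * ((w * x + b) - y) * x for x, y in data) / n
        grad_b = sum(2 * ((w * x + b) - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_round(global_weights, hospital_datasets):
    """Send the model to every hospital and average the returned weights,
    weighting each hospital by its number of samples."""
    total = sum(len(data) for data in hospital_datasets)
    new_w = new_b = 0.0
    for data in hospital_datasets:
        w, b = local_train(global_weights, data)
        share = len(data) / total
        new_w += share * w
        new_b += share * b
    return (new_w, new_b)


# Toy data (hypothetical): both hospitals observe the relation y = 2x + 1.
hospital_a = [(x, 2 * x + 1) for x in (-1.0, 0.0, 1.0, 2.0)]
hospital_b = [(x, 2 * x + 1) for x in (0.0, 1.0, 3.0)]

weights = (0.0, 0.0)
for _ in range(50):
    weights = federated_round(weights, [hospital_a, hospital_b])
```

In this sketch the server only ever sees the pair (w, b) from each site. A real deployment would also need secure communication and handling of sites whose data follow different distributions, neither of which is shown here.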
