Sci Data. 2022 Jul 20;9(1):429.
doi: 10.1038/s41597-022-01498-w.

VinDr-CXR: An open dataset of chest X-rays with radiologist's annotations

Ha Q Nguyen et al. Sci Data. 2022.

Abstract

Most of the existing chest X-ray datasets include labels from a list of findings without specifying their locations on the radiographs. This limits the development of machine learning algorithms for the detection and localization of chest abnormalities. In this work, we describe a dataset of more than 100,000 chest X-ray scans that were retrospectively collected from two major hospitals in Vietnam. Out of this raw data, we release 18,000 images that were manually annotated by a total of 17 experienced radiologists with 22 local labels of rectangles surrounding abnormalities and 6 global labels of suspected diseases. The released dataset is divided into a training set of 15,000 and a test set of 3,000. Each scan in the training set was independently labeled by 3 radiologists, while each scan in the test set was labeled by the consensus of 5 radiologists. We designed and built a labeling platform for DICOM images to facilitate these annotation procedures. All images are made publicly available in DICOM format along with the labels of both the training set and the test set.
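
The labeling scheme described above (rectangular local labels, with three independent readers per training image) can be illustrated with a small sketch. The record layout, the field names, the placeholder label "finding_A", and the use of intersection-over-union (IoU) to compare two readers' rectangles are all assumptions made for illustration; the abstract does not specify a file schema or an agreement measure.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class BoxAnnotation:
    """One local label: a rectangle drawn by one reader (hypothetical schema)."""
    image_id: str
    reader_id: str
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def iou(a: BoxAnnotation, b: BoxAnnotation) -> float:
    """Intersection-over-union of two axis-aligned rectangles."""
    inter_w = max(0.0, min(a.x_max, b.x_max) - max(a.x_min, b.x_min))
    inter_h = max(0.0, min(a.y_max, b.y_max) - max(a.y_min, b.y_min))
    inter = inter_w * inter_h
    area_a = (a.x_max - a.x_min) * (a.y_max - a.y_min)
    area_b = (b.x_max - b.x_min) * (b.y_max - b.y_min)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def boxes_by_reader(annotations):
    """Group one image's rectangles by the reader who drew them."""
    grouped = defaultdict(list)
    for ann in annotations:
        grouped[ann.reader_id].append(ann)
    return dict(grouped)


# Made-up example: two readers mark the same finding on one training image.
example = [
    BoxAnnotation("img_0001", "R1", "finding_A", 100, 100, 200, 200),
    BoxAnnotation("img_0001", "R2", "finding_A", 150, 100, 250, 200),
]
overlap = iou(example[0], example[1])
per_reader = boxes_by_reader(example)
```

Here the two rectangles share half of each box, so `overlap` comes out to one third. A threshold on this value is one possible way to decide whether two readers marked the same abnormality.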

Conflict of interest statement

This work was funded by the Vingroup JSC. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Figures

Fig. 1
The workflow for creating the VinDr-CXR dataset: (1) raw images in DICOM format were collected retrospectively from the hospital’s PACS and de-identified to protect patient privacy; (2) invalid files, such as images of other modalities or other body parts, low-quality images, and images with incorrect orientation, were automatically filtered out by a CNN-based classifier; (3) a web-based labeling tool, VinDr Lab, was developed to store, manage, and remotely annotate DICOM data. Each image in the training set of 15,000 images was independently labeled by a group of 3 radiologists, and each image in the test set of 3,000 images was labeled by the consensus of 5 radiologists.
Fig. 2
Examples of valid (left) and invalid (right) CXR scans. A CNN-based classifier was trained and used to automatically filter outliers; only valid PA-view CXRs of adults were retained for labeling.
Fig. 3
Examples of CXRs with radiologists’ annotations. Abnormal findings (local labels) marked by radiologists are plotted on the original images for visualization purposes. The global labels are in bold and listed at the bottom of each example. Best viewed on a computer, zoomed in for detail.
Fig. 4
Distribution of findings and pathologies on the training set of VinDr-CXR.
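
A distribution like the one summarized in Fig. 4 could be tallied along the following lines. This is only a sketch: the `(image_id, label)` record format and the placeholder labels are hypothetical, and the choice to count each image once per label (rather than once per reader annotation) is an assumption, since the caption does not say which unit is counted.

```python
from collections import Counter


def label_distribution(records):
    """Count, for each label, the number of distinct images carrying it.

    `records` is an iterable of (image_id, label) pairs. An image given
    the same label by several readers is counted once for that label.
    """
    unique_pairs = set(records)
    return Counter(label for _, label in unique_pairs)


# Made-up records: img_a receives "finding_A" from two readers.
records = [
    ("img_a", "finding_A"),
    ("img_a", "finding_A"),
    ("img_a", "finding_B"),
    ("img_b", "finding_A"),
]
distribution = label_distribution(records)
```

Counting per annotation instead would only require dropping the `set(...)` step.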
