VinDr-CXR: An open dataset of chest X-rays with radiologist's annotations

Ha Q Nguyen^#¹,², Khanh Lam^#³, Linh T Le^#⁴, Hieu H Pham⁵,⁶,⁷,⁸, Dat Q Tran¹, Dung B Nguyen¹, Dung D Le^#³, Chi M Pham^#³, Hang T T Tong^#³, Diep H Dinh^#³, Cuong D Do^#³, Luu T Doan^#⁴, Cuong N Nguyen^#⁴, Binh T Nguyen^#⁴, Que V Nguyen^#⁴, Au D Hoang^#⁴, Hien N Phan^#⁴, Anh T Nguyen^#⁴, Phuong H Ho^#⁹, Dat T Ngo², Nghia T Nguyen², Nhan T Nguyen², Minh Dao¹, Van Vu¹,¹⁰

Affiliations

¹ Vingroup Big Data Institute, Hanoi, Vietnam.
² Smart Health Center, VinBigData JSC, Hanoi, Vietnam.
³ Hospital 108, Department of Radiology, Hanoi, Vietnam.
⁴ Hanoi Medical University Hospital, Department of Radiology, Hanoi, Vietnam.
⁵ Vingroup Big Data Institute, Hanoi, Vietnam. hieu.ph@vinuni.edu.vn.
⁶ Smart Health Center, VinBigData JSC, Hanoi, Vietnam. hieu.ph@vinuni.edu.vn.
⁷ College of Engineering and Computer Science, VinUniversity, Hanoi, Vietnam. hieu.ph@vinuni.edu.vn.
⁸ VinUni-Illinois Smart Health Center, VinUniversity, Hanoi, Vietnam. hieu.ph@vinuni.edu.vn.
⁹ Tam Anh General Hospital, Department of Radiology, Ho Chi Minh City, Vietnam.
¹⁰ Yale University, Department of Mathematics, New Haven, CT, 06511, USA.

^# Contributed equally.

PMID: 35858929
PMCID: PMC9300612
DOI: 10.1038/s41597-022-01498-w


Ha Q Nguyen et al. Sci Data. 2022 Jul 20;9(1):429. doi: 10.1038/s41597-022-01498-w.


Abstract

Most existing chest X-ray datasets include labels from a list of findings without specifying their locations on the radiographs. This limits the development of machine learning algorithms for the detection and localization of chest abnormalities. In this work, we describe a dataset of more than 100,000 chest X-ray scans that were retrospectively collected from two major hospitals in Vietnam. From this raw data, we release 18,000 images that were manually annotated by a total of 17 experienced radiologists with 22 local labels (rectangles surrounding abnormalities) and 6 global labels (suspected diseases). The released dataset is divided into a training set of 15,000 scans and a test set of 3,000 scans. Each scan in the training set was independently labeled by 3 radiologists, while each scan in the test set was labeled by the consensus of 5 radiologists. We designed and built a labeling platform for DICOM images to facilitate these annotation procedures. All images are made publicly available in DICOM format along with the labels of both the training set and the test set.
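The labeling scheme above has three parts. Local labels are rectangles, global labels apply to the whole image, and each training scan has 3 independent readers while each test scan has one consensus label set. This structure fits naturally into a long table with one row per annotation per reader. The Python sketch below assumes such a CSV layout. The column names `image_id`, `rad_id`, `class_name`, `x_min`, `y_min`, `x_max`, `y_max`, the helper functions, and the toy rows are all hypothetical illustrations, not the published file format. The sketch parses boxes and flags training images that do not carry exactly 3 distinct readers.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy data in an ASSUMED long format: one row per annotation per reader.
# Column names and values are illustrative only, not the official release schema.
SAMPLE_CSV = """image_id,rad_id,class_name,x_min,y_min,x_max,y_max
img_001,R1,Cardiomegaly,690,1350,1650,1800
img_001,R2,Cardiomegaly,700,1340,1660,1790
img_001,R3,No finding,,,,
img_002,R1,No finding,,,,
img_002,R2,No finding,,,,
"""

Box = Tuple[float, float, float, float]


def parse_box(row: Dict[str, str]) -> Optional[Box]:
    """Return (x_min, y_min, x_max, y_max) as floats, or None for an image-level row."""
    coords = [row[k] for k in ("x_min", "y_min", "x_max", "y_max")]
    if any(c == "" for c in coords):
        return None
    x_min, y_min, x_max, y_max = (float(c) for c in coords)
    return (x_min, y_min, x_max, y_max)


def readers_per_image(rows: List[Dict[str, str]]) -> Dict[str, Set[str]]:
    """Map each image_id to the set of distinct reader IDs that annotated it."""
    readers: Dict[str, Set[str]] = defaultdict(set)
    for row in rows:
        readers[row["image_id"]].add(row["rad_id"])
    return dict(readers)


def images_with_wrong_reader_count(readers: Dict[str, Set[str]], expected: int) -> List[str]:
    """List image_ids whose number of distinct readers differs from `expected`."""
    return sorted(img for img, rads in readers.items() if len(rads) != expected)


if __name__ == "__main__":
    rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
    readers = readers_per_image(rows)
    # Training images are expected to carry labels from 3 independent readers.
    print(images_with_wrong_reader_count(readers, expected=3))  # ['img_002']
    print(parse_box(rows[0]))
```

Keeping one row per reader, rather than merging boxes up front, preserves the inter-reader disagreement that independent labeling is meant to capture. Any fusion of the 3 readers' boxes would be a separate step chosen by the user.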


Conflict of interest statement

This work was funded by the Vingroup JSC. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Figures

**Fig. 1**
The flow of creating the VinDr-CXR dataset: (1) raw images in DICOM format were retrospectively collected from the hospitals’ PACS and de-identified to protect patient privacy; (2) invalid files, such as images of other modalities, of other body parts, of low quality, or with incorrect orientation, were automatically filtered out by a CNN-based classifier; (3) a web-based labeling tool, VinDr Lab, was developed to store, manage, and remotely annotate DICOM data: each image in the training set of 15,000 images was independently labeled by a group of 3 radiologists, and each image in the test set of 3,000 images was labeled by the consensus of 5 radiologists.
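Step (1) of the caption, de-identification, can be illustrated with a minimal Python sketch. It assumes the DICOM header has already been parsed into a plain dict keyed by DICOM attribute keywords. The attribute list, the `deidentify_header` helper, and the HMAC-based pseudonym for `PatientID` are illustrative assumptions, not the procedure used for VinDr-CXR. A production pipeline would work on the DICOM files themselves with a DICOM library and follow a complete confidentiality profile.

```python
import hashlib
import hmac
from typing import Any, Dict

# Illustrative subset of identifying DICOM attributes, keyed by DICOM keyword.
# This is NOT a complete de-identification profile (see DICOM PS3.15 Annex E).
PHI_KEYWORDS = frozenset({
    "PatientName",
    "PatientID",
    "PatientBirthDate",
    "PatientAddress",
    "OtherPatientIDs",
    "ReferringPhysicianName",
    "InstitutionName",
    "InstitutionAddress",
    "AccessionNumber",
})


def deidentify_header(header: Dict[str, Any], secret_key: bytes) -> Dict[str, Any]:
    """Return a copy of `header` without the attributes in PHI_KEYWORDS.

    If a PatientID is present, it is replaced by a keyed HMAC-SHA256 pseudonym so
    that images of the same patient can still be grouped (an assumed design choice).
    The input dict is not modified.
    """
    cleaned = {k: v for k, v in header.items() if k not in PHI_KEYWORDS}
    patient_id = header.get("PatientID")
    if patient_id is not None:
        mac = hmac.new(secret_key, str(patient_id).encode("utf-8"), hashlib.sha256)
        cleaned["PatientID"] = mac.hexdigest()[:16]
    return cleaned


if __name__ == "__main__":
    # Hypothetical header; a real pipeline would read DICOM files with a DICOM library.
    header = {
        "PatientName": "DOE^JOHN",
        "PatientID": "12345",
        "Modality": "DX",
        "BodyPartExamined": "CHEST",
        "ViewPosition": "PA",
    }
    print(deidentify_header(header, secret_key=b"replace-with-a-real-secret"))
```

Using a keyed hash rather than a plain hash means the pseudonyms cannot be reversed by simply hashing candidate IDs, as long as the key stays secret.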

**Fig. 2**
Examples of valid (**left**) and invalid (**right**) CXR scans. A CNN-based classifier was trained and used to automatically filter outliers; only valid PA-view CXRs of adults were retained for labeling.

**Fig. 3**
Examples of CXRs with radiologists’ annotations. Abnormal findings (local labels) marked by radiologists are plotted on the original images for visualization purposes. The global labels are in bold and listed at the bottom of each example. Best viewed on a computer and zoomed in for detail.

**Fig. 4**
Distribution of findings and pathologies in the training set of VinDr-CXR.
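Each training image carries labels from 3 readers, so a distribution like the one in Fig. 4 depends on what is counted. The caption does not say which convention the figure uses. The Python sketch below contrasts counting raw annotations with counting distinct images per class. It assumes the same hypothetical one-row-per-reader-annotation layout, and the rows in `TOY_ROWS` and the helper names are invented for illustration, not real data.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical toy annotations in an ASSUMED format (one dict per reader annotation).
TOY_ROWS: List[Dict[str, str]] = [
    {"image_id": "img_001", "rad_id": "R1", "class_name": "Cardiomegaly"},
    {"image_id": "img_001", "rad_id": "R2", "class_name": "Cardiomegaly"},
    {"image_id": "img_001", "rad_id": "R3", "class_name": "No finding"},
    {"image_id": "img_002", "rad_id": "R1", "class_name": "Pleural effusion"},
    {"image_id": "img_002", "rad_id": "R1", "class_name": "Pleural effusion"},
    {"image_id": "img_002", "rad_id": "R2", "class_name": "No finding"},
]


def annotation_counts(rows: Iterable[Dict[str, str]]) -> Counter:
    """Count raw annotation rows per class (every reader's every row counts once)."""
    return Counter(r["class_name"] for r in rows)


def image_counts(rows: Iterable[Dict[str, str]]) -> Counter:
    """Count distinct images per class in which at least one reader used that class."""
    pairs = {(r["image_id"], r["class_name"]) for r in rows}
    return Counter(class_name for _, class_name in pairs)


if __name__ == "__main__":
    print(annotation_counts(TOY_ROWS))
    print(image_counts(TOY_ROWS))
```

On the toy rows, "Cardiomegaly" appears in 2 annotations but only 1 image. This is exactly the gap that multi-reader labeling introduces.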


