Bioengineering (Basel). 2021 Feb 16;8(2):26.
doi: 10.3390/bioengineering8020026.

An Open-Source COVID-19 CT Dataset with Automatic Lung Tissue Classification for Radiomics


Paolo Zaffino et al. Bioengineering (Basel).

Abstract

The coronavirus disease 19 (COVID-19) pandemic is having a dramatic impact on society and healthcare systems. In this complex scenario, lung computed tomography (CT) may play an important prognostic role. However, the datasets released so far have limitations that hamper the development of tools for quantitative analysis. In this paper, we present an open-source lung CT dataset comprising data from 50 COVID-19-positive patients. The CT volumes are provided along with (i) an automatic threshold-based annotation obtained with a Gaussian mixture model (GMM) and (ii) a score assigned by an expert radiologist. This score was found to correlate significantly with the presence of ground glass opacities and consolidations detected by the GMM. The dataset is freely available in an ITK-based file format under the CC BY-NC 4.0 license, and the code for GMM fitting is publicly available as well. We believe that our dataset will provide a unique opportunity for researchers working in medical image analysis, and we hope that its release will lay the foundations for the successful implementation of algorithms that support clinicians in facing the COVID-19 pandemic.
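The abstract reports a significant correlation between the radiologist's score and the GMM findings but does not name the statistic. Because the score is ordinal, a rank correlation is a natural candidate, so the sketch below assumes Spearman's ρ (Pearson correlation computed on average ranks). The function names are hypothetical, and the sketch returns only the coefficient, not the p-value that a significance claim would also need.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(vx * vy)
```

In use, `x` would hold one score S per patient and `y` a per-patient GMM-derived quantity (for example, the fraction of lung voxels labeled as GGO); both inputs are illustrative, not values from the dataset.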

Keywords: COVID-19; free CT dataset; medical imaging; radiomics.


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Exemplary 3D rendering of coronavirus disease 19 (COVID-19)-affected lungs. The clinical score S for the depicted patient is 3. Opacities due to COVID-19 are visible in the inferior left lobe.
Figure 2
Proposed labeling pipeline, split into lung segmentation (panel (a)) and tissue labeling (panel (b)). After lung segmentation (performed with the workflow in (a)), the lung voxels of the different computed tomography (CT) scans were pooled into a single one-dimensional array, which was used to fit a five-component Gaussian mixture model (GMM). Once the algorithm had converged, each CT image was labeled using the estimated parameters (θ).
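The fitting step in panel (b) can be sketched with a plain expectation-maximization (EM) loop for a one-dimensional mixture. This is not the authors' released code: it is a dependency-free stand-in that assumes each volume and its lung mask are flat, equally long sequences (HU values and booleans), and `pool_lung_voxels` and `fit_gmm_1d` are hypothetical names. The quantile-based initialization and the variance floor are illustrative choices.

```python
import math


def pool_lung_voxels(volumes, masks):
    """Concatenate the intensities of lung voxels from several CT volumes.

    Assumption: each volume is a flat sequence of HU values and each mask a
    flat sequence of booleans of the same length (True = lung voxel).
    """
    pooled = []
    for volume, mask in zip(volumes, masks):
        pooled.extend(v for v, inside in zip(volume, mask) if inside)
    return pooled


def normal_pdf(x, mean, std):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def fit_gmm_1d(values, k=5, max_iter=200, tol=1e-8):
    """Fit a k-component 1D Gaussian mixture with EM; returns (weights, means, stds) sorted by mean."""
    n = len(values)
    ordered = sorted(values)
    # Initialization: means at evenly spaced quantiles, a common std, equal weights.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    mu = sum(values) / n
    spread = math.sqrt(sum((x - mu) ** 2 for x in values) / n) or 1.0
    stds = [spread] * k
    weights = [1.0 / k] * k
    prev_ll = -math.inf
    for _ in range(max_iter):
        # E-step: responsibility of each component for each value.
        resp, ll = [], 0.0
        for x in values:
            dens = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
            total = sum(dens) or 1e-300  # guard against underflow far from all components
            ll += math.log(total)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and stds from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, values)) / nj
            stds[j] = math.sqrt(max(var, 1e-6))  # variance floor prevents collapse
        # Stop once the log-likelihood has stabilized.
        if abs(ll - prev_ll) <= tol * abs(ll):
            break
        prev_ll = ll
    order = sorted(range(k), key=lambda j: means[j])
    return ([weights[j] for j in order], [means[j] for j in order], [stds[j] for j in order])
```

Pure Python is far too slow for millions of pooled voxels. In practice, a vectorized implementation such as scikit-learn's `GaussianMixture` would be the usual choice. The explicit loop only makes the E- and M-steps visible.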
Figure 3
Exemplary axial view of an anatomical image (panel (a)) and labeled volume (panel (b)) extracted from patient 17 (S = 1). The green label represents air, the yellow label marks healthy lungs, the light blue label indicates ground glass opacity (GGO), brown voxels are consolidations, and orange clusters are other denser tissue.
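Labeling a volume "using the estimated parameters (θ)" can be read as assigning each voxel to the mixture component with the highest posterior probability. The sketch below does this in the log domain to avoid underflow. It assumes, hypothetically, that the five components sorted by increasing mean HU map onto the five classes listed in this caption, in the order air, healthy lung, GGO, consolidation, and other denser tissue. The function names and the example parameters in the test are illustrative, not values from the paper.

```python
import math

# Assumed mapping (hypothetical): mixture components sorted by increasing mean HU
# correspond to these classes, taken from the label list in Figure 3.
TISSUE_CLASSES = ("air", "healthy lung", "GGO", "consolidation", "other denser tissue")


def log_weighted_density(x, weight, mean, std):
    """log(weight * N(x | mean, std)), dropping a constant shared by all components."""
    z = (x - mean) / std
    return math.log(weight) - math.log(std) - 0.5 * z * z


def label_voxels(values, weights, means, stds, classes=TISSUE_CLASSES):
    """Give each value the class of the component with the highest posterior probability."""
    if len(classes) != len(means):
        raise ValueError("need exactly one class name per mixture component")
    # rank[j] is the position of component j when components are sorted by mean.
    order = sorted(range(len(means)), key=lambda j: means[j])
    rank = {j: r for r, j in enumerate(order)}
    labels = []
    for x in values:
        best = max(range(len(means)),
                   key=lambda j: log_weighted_density(x, weights[j], means[j], stds[j]))
        labels.append(classes[rank[best]])
    return labels
```

Because the posterior shares the same denominator for every component, comparing weighted densities is enough to pick the most probable class.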
Figure 4
Estimated Gaussians for each fold of the robustness test. Multiple lines of the same color show the results obtained from different groups of the same fold size.
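The caption does not spell out the robustness procedure. One plausible reading is that, for each fold size, the patients were split into disjoint groups of that size and the GMM was refitted on each group, so that the estimated Gaussians could be compared across groups. The sketch below covers only the splitting and one simple summary, the range of each component's mean across groups. The function names, the fixed seed, and the choice to drop leftover patients are assumptions.

```python
import random


def split_into_groups(patient_ids, group_size, seed=0):
    """Shuffle patient IDs and cut them into disjoint groups of group_size.

    Assumption: leftover patients that cannot fill a complete group are dropped.
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)  # fixed seed keeps the split reproducible
    n_groups = len(ids) // group_size
    return [ids[g * group_size:(g + 1) * group_size] for g in range(n_groups)]


def component_spread(per_group_means):
    """Return, for each component, max - min of its fitted mean across groups.

    per_group_means[g][j] is the mean of component j (components sorted by HU)
    fitted on group g; a small spread suggests the fit is stable.
    """
    return [max(col) - min(col) for col in zip(*per_group_means)]
```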
Figure 5
Box plot of S vs. lung involvement (LI). For each value of S, the median, the quartiles, and the min–max range of LI are reported.
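The caption does not define LI. The sketch below assumes it is the fraction of lung voxels labeled as GGO or consolidation, then groups the per-patient values by score S and computes the five numbers drawn in a box plot. The function names and the LI definition are assumptions. Quartiles use the standard library's `statistics.quantiles` with the "inclusive" method, and other plotting tools may interpolate quartiles differently.

```python
import statistics
from collections import defaultdict

# Assumption: these labels count as "involved" lung tissue when computing LI.
INVOLVED = {"GGO", "consolidation"}


def lung_involvement(labels, involved=INVOLVED):
    """Fraction of lung voxels whose label is one of the involved classes."""
    if not labels:
        raise ValueError("no lung voxels")
    return sum(1 for lab in labels if lab in involved) / len(labels)


def box_stats_by_score(scores, li_values):
    """Map each score S to (min, Q1, median, Q3, max) of the LI values with that score."""
    groups = defaultdict(list)
    for s, li in zip(scores, li_values):
        groups[s].append(li)
    summary = {}
    for s in sorted(groups):
        vals = sorted(groups[s])
        if len(vals) >= 2:
            q1, med, q3 = statistics.quantiles(vals, n=4, method="inclusive")
        else:
            q1 = med = q3 = vals[0]  # a single patient: every statistic is that value
        summary[s] = (vals[0], q1, med, q3, vals[-1])
    return summary
```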

