Comput Methods Programs Biomed. 2023 Dec;242:107839.
doi: 10.1016/j.cmpb.2023.107839. Epub 2023 Oct 2.

The NCI Imaging Data Commons as a platform for reproducible research in computational pathology

Daniela P Schacherer et al. Comput Methods Programs Biomed. 2023 Dec.

Abstract

Background and objectives: Reproducibility is a major challenge in developing machine learning (ML)-based solutions in computational pathology (CompPath). The NCI Imaging Data Commons (IDC) provides more than 120 cancer image collections in accordance with the FAIR principles and is designed to be used with cloud ML services. Here, we explore its potential to facilitate reproducibility in CompPath research.

Methods: Using the IDC, we implemented two experiments in which a representative ML-based method for classifying lung tumor tissue was trained and/or evaluated on different datasets. To assess reproducibility, the experiments were run multiple times with separate but identically configured instances of common ML services.

Results: The results of different runs of the same experiment were reproducible to a large extent. However, we observed occasional, small variations in AUC values, indicating a practical limit to reproducibility.

Conclusions: We conclude that the IDC facilitates approaching the reproducibility limit of CompPath research (i) by enabling researchers to reuse exactly the same datasets and (ii) by integrating with cloud ML services so that experiments can be run in identically configured computing environments.

Keywords: Artificial intelligence; Cloud computing; Computational pathology; FAIR; Machine learning; Reproducibility.

Conflict of interest statement

The authors declare no conflicts of interest.

Figures

Figure 1: Overview of the workflows of both experiments and their interactions with the IDC.
Figure 2: Example tiles of the three classes considered, taken from the TCGA and CPTAC datasets. The width of each tile is 256 μm. The black boxes marked with arrows in the whole-slide images at the top show the boundaries of the upper-left tiles of the TCGA dataset.
Figure 3: Illustration of the CompPath analysis method. Slides were subdivided into non-overlapping rectangular tiles, and tiles containing more background than tissue were discarded. A neural network performing multi-class classification assigned class probabilities to each remaining tile. Slide-level class values were then determined by aggregating the tile-level results.
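The tiling and aggregation steps in Figure 3 can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the grayscale background threshold (220), the use of the mean as the aggregation rule, and the `classify_tile` callable (a hypothetical stand-in for the neural network) are all assumptions made for illustration. Only the "discard tiles with more background than tissue" rule comes from the caption.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

# Grayscale slide region: 0 = dark (stained tissue), 255 = white (glass).
Image = List[List[int]]


def tile_grid(image: Image, tile_size: int) -> Iterator[Tuple[int, int, Image]]:
    """Yield (row, col, tile) for non-overlapping square tiles.

    Incomplete tiles at the right and bottom edges are skipped.
    """
    height, width = len(image), len(image[0])
    for r in range(0, height - tile_size + 1, tile_size):
        for c in range(0, width - tile_size + 1, tile_size):
            yield r, c, [row[c:c + tile_size] for row in image[r:r + tile_size]]


def tissue_fraction(tile: Image, background_threshold: int = 220) -> float:
    """Fraction of pixels darker than the (assumed) background threshold."""
    pixels = [p for row in tile for p in row]
    return sum(p < background_threshold for p in pixels) / len(pixels)


def classify_slide(
    image: Image,
    tile_size: int,
    classify_tile: Callable[[Image], Sequence[float]],
) -> List[float]:
    """Average per-tile class probabilities over all tissue tiles (assumed rule)."""
    tile_probs = []
    for _, _, tile in tile_grid(image, tile_size):
        # Discard tiles with more background than tissue.
        if tissue_fraction(tile) < 0.5:
            continue
        tile_probs.append(classify_tile(tile))
    if not tile_probs:
        raise ValueError("slide contains no tissue tiles")
    n_classes = len(tile_probs[0])
    return [sum(p[k] for p in tile_probs) / len(tile_probs) for k in range(n_classes)]


if __name__ == "__main__":
    # Toy 4x4 "slide": two of the four 2x2 tiles contain enough tissue.
    toy = [[100, 100, 255, 100], [100, 100, 255, 100], [255] * 4, [255] * 4]

    def fake_classifier(tile: Image) -> List[float]:
        # Hypothetical classifier: fully covered tiles vote for class 0.
        return [1.0, 0.0, 0.0] if tissue_fraction(tile) == 1.0 else [0.0, 1.0, 0.0]

    print(classify_slide(toy, 2, fake_classifier))  # [0.5, 0.5, 0.0]
```

Averaging is only one plausible aggregation rule. Majority voting over tile-level argmax labels would plug into the same structure by replacing the final list comprehension.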
Figure 4: Generic example of a BigQuery SQL statement for compiling slide metadata. The result set is limited to slide microscopy images, as indicated by the value “SM” of the DICOM attribute “Modality”, from the collections “TCGA-LUAD” and “TCGA-LUSC”.
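A query of the kind described in Figure 4 can be sketched in Python. The sketch below only assembles the SQL text and does not reproduce the statement shown in the paper. The table name `bigquery-public-data.idc_current.dicom_all`, the selected column names, and the lowercase collection IDs `tcga_luad` and `tcga_lusc` are assumptions about IDC's public BigQuery schema and should be checked against the current IDC documentation. The helper `build_slide_query` is hypothetical.

```python
from typing import Sequence

# Assumed IDC public BigQuery table holding DICOM metadata.
IDC_TABLE = "bigquery-public-data.idc_current.dicom_all"


def build_slide_query(collection_ids: Sequence[str], modality: str = "SM") -> str:
    """Return SQL listing slide microscopy series from the given collections.

    Hypothetical helper; the column names are assumptions about the IDC schema.
    """
    # Reject anything that is not a plain identifier to avoid SQL injection.
    if not modality.isalnum():
        raise ValueError(f"unexpected modality: {modality!r}")
    for cid in collection_ids:
        if not cid.replace("_", "").replace("-", "").isalnum():
            raise ValueError(f"unexpected collection id: {cid!r}")
    id_list = ", ".join(f"'{cid}'" for cid in collection_ids)
    return (
        "SELECT DISTINCT\n"
        "  PatientID,\n"
        "  StudyInstanceUID,\n"
        "  SeriesInstanceUID,\n"
        "  collection_id\n"
        f"FROM `{IDC_TABLE}`\n"
        f"WHERE Modality = '{modality}'\n"
        f"  AND collection_id IN ({id_list})"
    )


if __name__ == "__main__":
    print(build_slide_query(["tcga_luad", "tcga_lusc"]))
```

Running the resulting statement requires a Google Cloud project, for example through the `google-cloud-bigquery` client's `Client.query` method, which is not shown here. For untrusted input, BigQuery query parameters would be preferable to string assembly.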
Figure 5: One-vs-rest ROC curves for the multi-class classification as obtained in (a) the first run of Experiment 1 using Vertex AI and (b) the second run of Experiment 2 using Colaboratory (T4).
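The one-vs-rest AUC values summarized by curves like those in Figure 5 can be computed with a short sketch. For each class, that class is treated as positive and all others as negative. The area under the ROC curve then equals the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half (the Mann-Whitney formulation). This is an illustration only. The paper does not state which library it used, and in practice a function such as scikit-learn's `roc_auc_score` would typically be used instead of this O(n²) loop.

```python
from typing import List, Sequence


def binary_auc(scores: Sequence[float], is_positive: Sequence[bool]) -> float:
    """ROC AUC via pairwise comparison of positive and negative scores."""
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count half
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(
    probabilities: Sequence[Sequence[float]], labels: Sequence[int], n_classes: int
) -> List[float]:
    """Per-class AUC, treating each class in turn as positive vs. the rest."""
    return [
        binary_auc([row[k] for row in probabilities], [y == k for y in labels])
        for k in range(n_classes)
    ]


if __name__ == "__main__":
    # Toy slide-level probabilities for three classes (not data from the paper).
    probs = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7], [0.6, 0.3, 0.1]]
    print(one_vs_rest_auc(probs, [0, 1, 2, 0], 3))  # [1.0, 1.0, 1.0]
```

Because the AUC depends only on the ranking of scores, small numerical differences between runs can change it only when they reorder positive and negative samples.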