Histopathology. 2024 Jun;84(7):1139-1153.

doi: 10.1111/his.15159. Epub 2024 Feb 26.

Automated curation of large-scale cancer histopathology image datasets using deep learning

Lars Hilgers¹ ², Narmin Ghaffari Laleh¹ ², Nicholas P West³, Alice Westwood³, Katherine J Hewitt¹ ², Philip Quirke³, Heike I Grabsch³ ⁴, Zunamys I Carrero², Emylou Matthaei², Chiara M L Loeffler², Titus J Brinker⁵, Tanwei Yuan⁶, Hermann Brenner⁶ ⁷ ⁸, Alexander Brobeil⁹ ¹⁰, Michael Hoffmeister⁶, Jakob Nikolas Kather² ³ ¹¹

Affiliations

¹ Department of Medicine III, University Hospital RWTH Aachen, Aachen, Germany.
² Else Kroener Fresenius Center for Digital Health, Medical Faculty Carl Gustav Carus, Technical University Dresden, Dresden, Germany.
³ Pathology & Data Analytics, Leeds Institute of Medical Research at St James's, University of Leeds, Leeds, UK.
⁴ Department of Pathology, GROW - Research Institute for Oncology and Reproduction, Maastricht University Medical Center+, Maastricht, The Netherlands.
⁵ Digital Biomarkers for Oncology Group, German Cancer Research Center (DKFZ), Heidelberg, Germany.
⁶ Division of Clinical Epidemiology and Aging Research, German Cancer Research Center (DKFZ), Heidelberg, Germany.
⁷ Division of Preventive Oncology, German Cancer Research Center (DKFZ) and National Center for Tumor Diseases (NCT), Heidelberg, Germany.
⁸ German Cancer Consortium (DKTK), German Cancer Research Center (DKFZ), Heidelberg, Germany.
⁹ Institute of Pathology, University Hospital Heidelberg, Heidelberg, Germany.
¹⁰ Tissue Bank, National Center for Tumor Diseases (NCT), University Hospital Heidelberg, Heidelberg, Germany.
¹¹ Medical Oncology, National Center for Tumor Diseases (NCT), University Hospital Heidelberg, Heidelberg, Germany.

PMID: 38409878
DOI: 10.1111/his.15159

Lars Hilgers et al. Histopathology. 2024 Jun.

Abstract

Background: Artificial intelligence (AI) has numerous applications in pathology, supporting diagnosis and prognostication in cancer. However, most AI models are trained on highly selected data, typically one tissue slide per patient. In reality, especially for large surgical resection specimens, dozens of slides can be available for each patient. Manually sorting and labelling whole-slide images (WSIs) is a very time-consuming process, hindering the direct application of AI on the collected tissue samples from large cohorts. In this study we addressed this issue by developing a deep-learning (DL)-based method for automatic curation of large pathology datasets with several slides per patient.

Methods: We collected multiple large multicentric datasets of colorectal cancer histopathological slides from the United Kingdom (FOXTROT, N = 21,384 slides; CR07, N = 7985 slides) and Germany (DACHS, N = 3606 slides). These datasets contained multiple types of tissue slides, including bowel resection specimens, endoscopic biopsies, lymph node resections, immunohistochemistry-stained slides, and tissue microarrays. We developed, trained, and tested a deep convolutional neural network model to predict the type of slide from the slide overview (thumbnail) image. The primary statistical endpoint was the macro-averaged area under the receiver operating characteristic curve (AUROC) for detection of the type of slide.
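As an illustration of the primary endpoint, the sketch below computes a macro-averaged one-vs-rest AUROC in plain Python. It is not the authors' evaluation code: the function names, the three-class setup, and the toy probabilities are all hypothetical, and the pairwise-comparison formula is just one standard way to obtain the AUROC.

```python
from typing import Sequence


def binary_auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def macro_auroc(
    y_true: Sequence[int], probs: Sequence[Sequence[float]], n_classes: int
) -> float:
    """Unweighted mean of the one-vs-rest AUROC of every slide type."""
    per_class = []
    for c in range(n_classes):
        labels = [1 if y == c else 0 for y in y_true]  # class c vs. the rest
        scores = [row[c] for row in probs]  # predicted probability of class c
        per_class.append(binary_auroc(labels, scores))
    return sum(per_class) / n_classes


# Hypothetical toy data: 3 slide types, 6 thumbnails (not from the study).
y_true = [0, 0, 1, 1, 2, 2]
probs = [
    [0.8, 0.1, 0.1],
    [0.4, 0.5, 0.1],
    [0.2, 0.7, 0.1],
    [0.5, 0.4, 0.1],
    [0.1, 0.2, 0.7],
    [0.2, 0.2, 0.6],
]
macro = macro_auroc(y_true, probs, n_classes=3)
print(f"macro-averaged AUROC: {macro:.3f}")
```

Macro-averaging gives every slide type the same weight regardless of how many slides it contains, so rare slide types influence the endpoint as much as common ones.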

Results: In the primary dataset (FOXTROT), the algorithm achieved an AUROC of 0.995 (95% confidence interval [CI]: 0.994-0.996), accurately predicting the type of slide from the thumbnail image alone. In the two external test cohorts (CR07, DACHS), AUROCs of 0.982 (95% CI: 0.979-0.985) and 0.875 (95% CI: 0.864-0.887) were observed, which indicates the generalizability of the trained model to unseen datasets. With a confidence threshold of 0.95, the model reached an accuracy of 94.6% (7331 classified cases) in CR07 and 85.1% (2752 classified cases) in DACHS.
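The confidence-threshold result can be read as selective classification: only slides whose prediction is confident enough are labelled, and accuracy is reported on those. The sketch below shows one way this could work. It assumes that confidence is the highest predicted class probability and that a slide counts as classified when this value is at least the threshold; the abstract does not specify either detail, and the function name and toy data are hypothetical.

```python
from typing import Sequence, Tuple


def accuracy_at_confidence(
    y_true: Sequence[int],
    probs: Sequence[Sequence[float]],
    threshold: float = 0.95,
) -> Tuple[float, int]:
    """Return (accuracy on confidently classified slides, number of such slides)."""
    correct = 0
    classified = 0
    for label, row in zip(y_true, probs):
        confidence = max(row)  # assumption: confidence = top class probability
        if confidence < threshold:
            continue  # slide stays unclassified, e.g. left for manual review
        classified += 1
        predicted = list(row).index(confidence)  # index of the top class
        if predicted == label:
            correct += 1
    if classified == 0:
        raise ValueError("no prediction reached the confidence threshold")
    return correct / classified, classified


# Hypothetical toy data: 5 thumbnails, 3 slide types (not from the study).
y_true = [0, 0, 1, 2, 2]
probs = [
    [0.97, 0.02, 0.01],  # confident and correct
    [0.60, 0.30, 0.10],  # below threshold, skipped
    [0.01, 0.98, 0.01],  # confident and correct
    [0.96, 0.03, 0.01],  # confident but wrong
    [0.02, 0.02, 0.96],  # confident and correct
]
accuracy, n_classified = accuracy_at_confidence(y_true, probs, threshold=0.95)
print(f"accuracy {accuracy:.2f} on {n_classified} classified slides")
```

Raising the threshold generally trades coverage for accuracy: fewer slides are classified automatically, but those that are tend to be labelled more reliably.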

Conclusion: Our findings show that the low-resolution thumbnail image is sufficient to accurately classify the type of slide in digital pathology. This can help researchers make the vast resource of existing pathology archives accessible to modern AI models with only minimal manual annotation.

Keywords: colorectal cancer; deep learning; digital pathology; quality control.

Cited by

Beyond Biomarkers: Machine Learning-Driven Multiomics for Personalized Medicine in Gastric Cancer.
Ma D, Fan C, Sano T, Kawabata K, Nishikubo H, Imanishi D, Sakuma T, Maruo K, Yamamoto Y, Matsuoka T, Yashiro M. J Pers Med. 2025 Apr 24;15(5):166. doi: 10.3390/jpm15050166. PMID: 40423038. Free PMC article. Review.
Applications of artificial intelligence in digital pathology for gastric cancer.
Chen S, Ding P, Guo H, Meng L, Zhao Q, Li C. Front Oncol. 2024 Oct 28;14:1437252. doi: 10.3389/fonc.2024.1437252. eCollection 2024. PMID: 39529836. Free PMC article. Review.

Grants and funding

HORIZON EUROPE European Research Council
