J Pathol Inform. 2023 Feb 27;14:100301. doi: 10.1016/j.jpi.2023.100301. eCollection 2023.

Pan-tumor T-lymphocyte detection using deep neural networks: Recommendations for transfer learning in immunohistochemistry


Frauke Wilm et al. J Pathol Inform. 2023.


Abstract

The success of immuno-oncology treatments promises long-term cancer remission for an increasing number of patients. The response to checkpoint inhibitor drugs has shown a correlation with the presence of immune cells in the tumor and tumor microenvironment. An in-depth understanding of the spatial localization of immune cells is therefore critical for understanding the tumor's immune landscape and predicting drug response. Computer-aided systems are well suited for efficiently quantifying immune cells in their spatial context. Conventional image analysis approaches are often based on color features and therefore require a high level of manual interaction. More robust image analysis methods based on deep learning are expected to decrease this reliance on human interaction and improve the reproducibility of immune cell scoring. However, these methods require sufficient training data, and previous work has reported low robustness of these algorithms when they are tested on out-of-distribution data from different pathology labs or samples from different organs. In this work, we used a new image analysis pipeline to explicitly evaluate the robustness of marker-labeled lymphocyte quantification algorithms as a function of the number of training samples, before and after transfer to a new tumor indication. For these experiments, we adapted the RetinaNet architecture for the task of T-lymphocyte detection and employed transfer learning to bridge the domain gap between tumor indications and reduce the annotation costs for unseen domains. On our test set, we achieved human-level performance for almost all tumor indications, with an average precision of 0.74 in-domain and 0.72-0.74 cross-domain. From our results, we derive recommendations for model development regarding annotation extent, training sample selection, and label extraction for the development of robust algorithms for immune cell scoring. By extending the task of marker-labeled lymphocyte quantification to a multi-class detection task, the prerequisite for subsequent analyses, e.g., distinguishing lymphocytes in the tumor stroma from tumor-infiltrating lymphocytes, is met.
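The average precision (AP) reported above can be sketched for point-based cell detections as follows. This is a minimal illustration, not the paper's exact evaluation protocol: the greedy score-ordered matching and the pixel matching radius are assumptions made for the example.

```python
def average_precision(dets, gts, radius=6.0):
    """Compute AP for point detections.

    dets: list of (x, y, score) detections.
    gts:  list of (x, y) ground-truth cell centers.
    Each detection (highest score first) is greedily matched to the
    nearest unmatched ground truth within `radius` pixels.
    """
    if not gts:
        return 0.0
    dets = sorted(dets, key=lambda d: -d[2])
    matched = [False] * len(gts)
    hits = []
    for x, y, _ in dets:
        best, best_d = -1, radius
        for i, (gx, gy) in enumerate(gts):
            if matched[i]:
                continue
            d = ((x - gx) ** 2 + (y - gy) ** 2) ** 0.5
            if d <= best_d:
                best, best_d = i, d
        if best >= 0:
            matched[best] = True
            hits.append(1)  # true positive
        else:
            hits.append(0)  # false positive
    # Accumulate area under the precision-recall curve.
    ap, tp, prev_recall = 0.0, 0, 0.0
    for k, hit in enumerate(hits, start=1):
        tp += hit
        recall = tp / len(gts)
        precision = tp / k
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap
```

With this definition, a detector that finds every cell with no false positives scores an AP of 1.0, while a high-confidence false positive ranked above the true detections lowers the AP even at identical recall.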

Keywords: Deep learning; Domain adaptation; Immuno-oncology; Immunohistochemistry; Transfer learning; Tumor-infiltrating lymphocytes.

PubMed Disclaimer

Figures

Graphical abstract
Fig. 1. Annotation and training pipeline. ROIs sized 2 mm² are annotated using commercially available image analysis software.
Fig. 2. Kappa variants per tumor indication. Annotators A, B, and C are compared to each other (hollow symbols) and to semi-automatic annotations created with the image analysis software (SW) (filled symbols). The pathologist consensus is also compared to the software annotations (star).
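As a rough illustration of the agreement metric used for the annotator comparisons, Cohen's kappa for two raters can be computed as below. This is a minimal sketch for two raters over categorical cell labels; the specific kappa variants evaluated in the paper may differ.

```python
def cohens_kappa(a, b):
    """Cohen's kappa for two raters.

    a, b: equal-length lists of categorical labels (e.g. cell classes)
    assigned by two annotators to the same cells.
    """
    assert len(a) == len(b) and len(a) > 0
    n = len(a)
    categories = sorted(set(a) | set(b))
    # Observed agreement: fraction of cells labeled identically.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Expected chance agreement from each rater's label frequencies.
    p_e = sum((a.count(c) / n) * (b.count(c) / n) for c in categories)
    if p_e == 1.0:  # both raters used a single identical label
        return 1.0
    return (p_o - p_e) / (1 - p_e)
```

Identical label sequences yield kappa = 1.0, while agreement at chance level yields kappa = 0.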
Fig. 3. Examples of consensus annotations (orange: tumor cells, green: CD3+ cells, blue: non-specified cells, purple: cells without agreement). HNSCC: head and neck squamous cell carcinoma, NSCLC: non-small cell lung cancer, TNBC: triple-negative breast cancer, GC: gastric cancer.
Fig. 4. Average precision (AP) when deploying the models on test data from the source domain (HNSCC) and unseen target domains. The models were trained on fields of view from an increasing number of whole slide images. The error bars visualize the standard deviation of the 5 repetitions, and the curve fits a logistic regression. HNSCC: head and neck squamous cell carcinoma, NSCLC: non-small cell lung cancer, TNBC: triple-negative breast cancer, GC: gastric cancer.
Fig. 5. Kappa variants per tumor indication. Annotators A, B, and C are compared to each other (hollow symbols) and to the detection results of RetinaNet22 (asterisks).
Fig. 6. Comparison of consensus annotations (upper row) vs. network predictions (bottom row) (orange: tumor cells, green: CD3+ cells, blue: non-specified cells, purple: cells without agreement). HNSCC: head and neck squamous cell carcinoma, NSCLC: non-small cell lung cancer, TNBC: triple-negative breast cancer, GC: gastric cancer.
Fig. 7. Examples where the models had difficulty differentiating tumor and non-tumor cells (orange: tumor cells, green: CD3+ cells, blue: non-specified cells).
Fig. 8. Difference in average precision (AP) for model deployment vs. fine-tuning with 1 target domain image. The error bars visualize the minimum to maximum range of the performance difference. HNSCC: head and neck squamous cell carcinoma, NSCLC: non-small cell lung cancer, TNBC: triple-negative breast cancer, GC: gastric cancer.
Fig. 9. Average performance of benchmark models when tested on all tumor indications. Matrix entry m_{i,j} is the average precision (AP) obtained when training on the indication in row i and testing on the indication in column j. Diagonal elements indicate in-domain performance, whereas off-diagonal elements represent cross-domain performance. HNSCC: head and neck squamous cell carcinoma, NSCLC: non-small cell lung cancer, TNBC: triple-negative breast cancer, GC: gastric cancer.
Fig. B.1. Detection results for common tissue and staining artifacts and morphologically challenging regions (orange: tumor cells, green: CD3+ cells, blue: non-specified cells).

