LLM-powered breast cancer staging from PET/CT reports: a comparative performance study

Daniel Spitzl¹, Markus Mergen², Rickmer Braren³, Lukas Endrös², Matthias Eiber⁴, Lisa Steinhelfer⁵

Affiliations

¹ Department of Diagnostic and Interventional Radiology, TUM University Hospital, School of Medicine, Technical University of Munich, Munich, Germany. Electronic address: danieljan.spitzl@mri.tum.de.
² Department of Diagnostic and Interventional Radiology, TUM University Hospital, School of Medicine, Technical University of Munich, Munich, Germany.
³ Department of Diagnostic and Interventional Radiology, TUM University Hospital, School of Medicine, Technical University of Munich, Munich, Germany; German Cancer Consortium (DKTK), Partner-site Munich, a Partnership between DKFZ and Klinikum rechts der Isar, Munich, Germany; Bavarian Cancer Research Center (BZKF), Munich, Germany.
⁴ Department of Nuclear Medicine, TUM University Hospital, School of Medicine and Health, Technical University Munich, Munich, Germany; German Cancer Consortium (DKTK), Partner-site Munich, a Partnership between DKFZ and Klinikum rechts der Isar, Munich, Germany; Bavarian Cancer Research Center (BZKF), Munich, Germany.
⁵ Department of Nuclear Medicine, TUM University Hospital, School of Medicine and Health, Technical University Munich, Munich, Germany; Technical University of Munich, School of Medicine and Health, Department of Diagnostic and Interventional Neuroradiology, TUM University Hospital, Munich, Germany.

PMID: 40706196
DOI: 10.1016/j.ijmedinf.2025.106053

Free article

Comparative Study

Daniel Spitzl et al. Int J Med Inform. 2025 Dec.

Int J Med Inform. 2025 Dec:204:106053.

doi: 10.1016/j.ijmedinf.2025.106053. Epub 2025 Jul 19.


Abstract

Purpose: Imaging reports are crucial in breast cancer management, and the tumor-node-metastasis (TNM) classification is widely used to assess disease severity, guide treatment decisions, and predict patient outcomes. Large language models (LLMs) could extract standardized UICC TNM classifications and the corresponding UICC stage directly from existing PET/CT reports. This approach may enhance staging accuracy, streamline multidisciplinary discussions, and improve patient outcomes.

Methods: Here, we evaluated four LLMs (ChatGPT-4o, DeepSeek V3, Claude 3.5 Sonnet, and Gemini 2.0 Flash) for their capacity to determine TNM staging based on UICC/AJCC breast cancer guidelines. A total of 111 fictitious PET/CT reports were analyzed, and each model's outputs were measured against expert-generated TNM classifications and stage categorizations.
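Comparing a model's output against an expert reference requires first turning the free-text answer into discrete T, N, and M categories. The abstract does not describe how this was done in the study; the following is only a minimal sketch of one possible approach. The regular expression, the function names `parse_tnm` and `component_matches`, and the example strings are all hypothetical.

```python
import re
from typing import Optional

# Hypothetical pattern (not from the study): matches strings such as
# "T2 N1 M0", "cT1c N0 M0", or "Tis N0 M0", with optional c/p/y prefixes.
TNM_PATTERN = re.compile(
    r"\b[cpy]*T(?P<t>is|[0-4Xx][a-d]?)\s*"
    r"[cp]?N(?P<n>[0-3Xx][a-c]?)\s*"
    r"[cp]?M(?P<m>[01Xx])\b"
)


def parse_tnm(text: str) -> Optional[dict]:
    """Return the first TNM triple found in text, or None if there is none."""
    match = TNM_PATTERN.search(text)
    if match is None:
        return None
    return {
        "T": "T" + match["t"],
        "N": "N" + match["n"],
        "M": "M" + match["m"],
    }


def component_matches(predicted: Optional[dict], reference: dict) -> dict:
    """Compare a parsed prediction with the expert reference, per component."""
    if predicted is None:
        # An unparseable answer counts as wrong for every component.
        return {key: False for key in reference}
    return {key: predicted[key] == reference[key] for key in reference}


if __name__ == "__main__":
    answer = "Final staging: cT2 N1 M0"          # hypothetical model output
    expert = {"T": "T2", "N": "N0", "M": "M0"}   # hypothetical reference
    print(component_matches(parse_tnm(answer), expert))
```

Scoring each component separately mirrors the way the results are reported, with one score each for T, N, and M.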

Results: Among the tested models, Claude 3.5 Sonnet achieved the highest F1 scores: 0.95, 0.95, 1.00, and 0.92 for T, N, and M classification and UICC stage classification, respectively.
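For readers unfamiliar with the metric, the F1 score for one class is the harmonic mean of precision and recall and ranges from 0 to 1. The abstract does not say how per-class scores were combined into a single number for each component; the sketch below assumes unweighted (macro) averaging purely for illustration. The function names and the toy labels are hypothetical and are not data from the study.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Compute the F1 score of every label seen in y_true or y_pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label was wrong
            fn[truth] += 1  # true label was missed
    scores = {}
    for label in set(y_true) | set(y_pred):
        # F1 = 2*TP / (2*TP + FP + FN), equivalent to the harmonic mean
        # of precision and recall.
        denominator = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denominator if denominator else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (an assumed averaging scheme)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    expert = ["M0", "M0", "M1", "M1"]  # toy reference labels
    model = ["M0", "M0", "M1", "M0"]   # toy model predictions
    print(per_class_f1(expert, model))
    print(round(macro_f1(expert, model), 3))
```

Macro averaging gives rare categories the same weight as common ones; a support-weighted average would be an equally plausible choice and could yield different numbers on imbalanced data.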

Conclusions: These findings underscore the ability of advanced natural language processing (NLP) technologies to support reliable cancer staging, potentially aiding clinicians. Despite the encouraging performance, prospective clinical trials and validation across diverse practice settings remain critical to confirming these preliminary outcomes. Nonetheless, this study highlights the promise of LLM-based systems in reinforcing the accuracy of oncologic workflows and lays the groundwork for broader adoption of AI-driven tools in breast cancer management.

Keywords: Artificial intelligence; Breast cancer; Clinical decision support; Diagnostics.

Conflict of interest statement

Declaration of competing interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
