LLM-powered TNM staging of neuroendocrine tumors from PET/CT reports
- PMID: 41437331
- PMCID: PMC12838453
- DOI: 10.1186/s12880-025-02092-3
Abstract
Purpose: Imaging reports are essential for the diagnostic evaluation, treatment planning, and follow-up of patients with neuroendocrine tumors (NETs) of the gastroenteropancreatic (GEP) system. The tumor-node-metastasis (TNM) classification is a common model for assessing the prognosis of tumor patients. However, the traditional free-text format of imaging reports varies in structure, detail, and clarity, leading to inconsistencies and potential omissions of critical information necessary for optimal patient management. Recent advancements in large language models (LLMs) have created new opportunities for automating complex medical assessments, including the extraction of UICC and ENETS staging classifications from imaging reports. This approach aims to improve standardization, enhance clarity, and ensure consistency, ultimately facilitating more effective multidisciplinary clinical decision-making. This study evaluates whether LLMs can infer UICC and ENETS TNM stage for GEP-NETs from PET/CT free-text reports that contain descriptive findings only (no explicit TNM labels).
Methods: We evaluated several models, including ChatGPT-4o, DeepSeek V3, Claude 3.5 Sonnet, and Gemini 2.0 Flash, on a physician-generated fictitious dataset of 108 PET/CT reports with expert-annotated TNM classifications according to UICC and ENETS criteria. Model performance was assessed through F1-scores, comparing LLM-generated classifications against human expert benchmarks.
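As an illustration of the metric named above, the following is a minimal sketch of a micro-averaged F1 computation for one staging component. The abstract does not describe the authors' evaluation code, so the function name `micro_f1` and the example labels are assumptions made purely for illustration; the labels are not taken from the study dataset.

```python
from collections import Counter


def micro_f1(y_true, y_pred):
    """Micro-averaged F1 for single-label, multi-class predictions.

    Hypothetical helper: counts true positives, false positives and
    false negatives per class, pools them, then applies
    F1 = 2*TP / (2*TP + FP + FN).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    total_tp = sum(tp.values())
    total_fp = sum(fp.values())
    total_fn = sum(fn.values())
    denom = 2 * total_tp + total_fp + total_fn
    return 2 * total_tp / denom if denom else 0.0


# Illustrative (made-up) T-category labels: expert annotation vs. model output.
expert_t = ["T1", "T2", "T3", "T2"]
model_t = ["T1", "T2", "T2", "T2"]
print(micro_f1(expert_t, model_t))  # 0.75
```

When every report receives exactly one label per component, each misclassification contributes one false positive and one false negative, so the micro-averaged F1 equals plain accuracy (here 3 of 4 correct).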
Results: Among the tested models, ChatGPT-4o demonstrated the highest accuracy, achieving micro-F1 scores of 0.79, 0.99, and 0.99 for T, N, and M, respectively, according to UICC, and 0.84, 1.00, and 0.99, respectively, according to ENETS. These results indicate that LLMs have the potential to assist in oncologic staging of NETs, in particular by supporting non-specialists in clinical decision-making. However, before integration into routine practice, further prospective validation and rigorous evaluation in real-world settings are necessary.
Conclusion: This study underscores the promise of LLMs in oncologic workflows while highlighting the importance of robust benchmarking and clinical validation.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12880-025-02092-3.
Keywords: Clinical decision support; Large language models; Neuroendocrine tumors; PET/CT; TNM staging.
Conflict of interest statement
Declarations. Ethics approval and consent to participate: This retrospective study was approved by the Ethics Committee of the Technical University of Munich (Approval ID: 2024-590-S-CB). The requirement for individual informed consent was waived by the ethics committee due to the retrospective design and use of fully anonymized clinical data. The study was conducted in accordance with the Declaration of Helsinki and institutional guidelines. Consent to publication: Not applicable. Competing interests: The authors declare no competing interests.
