Developing a named entity framework for thyroid cancer staging and risk level classification using large language models

Matrix M H Fung^#¹, Eric H M Tang^#^{2,3}, Tingting Wu^#², Yan Luk¹, Ivan C H Au⁴, Xiaodong Liu^{1,2}, Victor H F Lee⁵, Chun Ka Wong⁶, Zhili Wei², Wing Yiu Cheng², Isaac C Y Tai⁷, Joshua W K Ho^{2,8}, Jason W H Wong⁸, Brian H H Lang¹, Kathy S M Leung^{2,9,10,11}, Zoie S Y Wong^{12,13,14,4}, Joseph T Wu^{15,16,17,18}, Carlos K H Wong^{19,20,21,22}

Affiliations

¹ Division of Endocrine Surgery, Department of Surgery, School of Clinical Medicine, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China.
² Laboratory of Data Discovery for Health (D²4H), Hong Kong Science Park, Hong Kong SAR, China.
³ Department of Family Medicine and Primary Care, School of Clinical Medicine, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China.
⁴ School of Public Health, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China.
⁵ Department of Clinical Oncology, School of Clinical Medicine, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China.
⁶ Department of Medicine, School of Clinical Medicine, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China.
⁷ Department of Orthopaedics and Traumatology, School of Clinical Medicine, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China.
⁸ School of Biomedical Science, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China.
⁹ The Hong Kong Jockey Club Global Health Institute, Hong Kong SAR, China.
¹⁰ WHO Collaborating Centre for Infectious Disease Epidemiology and Control, School of Public Health, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China.
¹¹ The University of Hong Kong-Shenzhen Hospital, Shenzhen, China.
¹² The Kirby Institute, University of New South Wales, Sydney, Australia.
¹³ Biomedical Informatics and Digital Health, School of Medical Sciences, The University of Sydney, Sydney, Australia.
¹⁴ Graduate School of Public Health, St. Luke's International University, Tokyo, Japan.
¹⁵ Laboratory of Data Discovery for Health (D²4H), Hong Kong Science Park, Hong Kong SAR, China. joewu@hku.hk.
¹⁶ The Hong Kong Jockey Club Global Health Institute, Hong Kong SAR, China. joewu@hku.hk.
¹⁷ WHO Collaborating Centre for Infectious Disease Epidemiology and Control, School of Public Health, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China. joewu@hku.hk.
¹⁸ The University of Hong Kong-Shenzhen Hospital, Shenzhen, China. joewu@hku.hk.
¹⁹ Laboratory of Data Discovery for Health (D²4H), Hong Kong Science Park, Hong Kong SAR, China. carlosho@hku.hk.
²⁰ Department of Family Medicine and Primary Care, School of Clinical Medicine, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China. carlosho@hku.hk.
²¹ The Hong Kong Jockey Club Global Health Institute, Hong Kong SAR, China. carlosho@hku.hk.
²² Department of Infectious Disease Epidemiology, London School of Hygiene and Tropical Medicine, London, UK. carlosho@hku.hk.

^# Contributed equally.

PMID: 40025285
PMCID: PMC11873034
DOI: 10.1038/s41746-025-01528-y

Matrix M H Fung et al. NPJ Digit Med. 2025 Mar 1;8(1):134. doi: 10.1038/s41746-025-01528-y.

Abstract

We developed a named entity (NE) framework for extracting information from semi-structured clinical notes retrieved from The Cancer Genome Atlas-Thyroid Cancer (TCGA-THCA) database, and examined strategies using large language models (LLMs) to classify patients with well-differentiated thyroid cancer by 8th edition American Joint Committee on Cancer (AJCC) stage and American Thyroid Association (ATA) risk category. The NE framework comprised annotation guideline development, ground truth labelling, prompting approaches, and evaluation code. Four LLMs (Mistral-7B-Instruct, Llama-3.1-8B-Instruct, Gemma-2-9B-Instruct, and Qwen2.5-7B-Instruct) were run offline for information extraction, and their outputs were compared with expert-curated ground truth. The framework was developed using 50 TCGA-THCA pathology notes; a further 289 TCGA-THCA notes and 35 pseudo-clinical cases were used for validation. An ensemble-like majority-vote strategy achieved satisfactory performance for both AJCC staging and ATA risk classification in the development and validation sets. Our framework and ensemble classifier optimised the efficiency and accuracy of classifying stage and risk category in patients with thyroid cancer.
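The majority-vote step can be illustrated with a short Python sketch. The function name `majority_vote` and the tie policy are hypothetical: each of the four LLMs contributes one label per report, missing outputs are ignored, and the most frequent label wins; a tie for the top count returns `None` so the case can be sent for manual review. This is an assumed policy, not the authors' documented tie-breaking rule.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_vote(predictions: Sequence[Optional[str]]) -> Optional[str]:
    """Return the label predicted by the most models (hypothetical helper).

    Assumptions: None marks a model that produced no usable label and is
    ignored; a tie at the top count returns None for manual review.
    """
    votes = Counter(p for p in predictions if p is not None)
    if not votes:
        return None
    ranked = votes.most_common()
    # A tie at the top count means no single label has the most votes.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


if __name__ == "__main__":
    # Hypothetical ATA risk predictions from the four LLMs for one report.
    per_model = {
        "Mistral-7B-Instruct": "intermediate",
        "Llama-3.1-8B-Instruct": "intermediate",
        "Gemma-2-9B-Instruct": "low",
        "Qwen2.5-7B-Instruct": "intermediate",
    }
    print(majority_vote(list(per_model.values())))  # intermediate
```

With four models, a 2-1-1 split still yields a winner under this plurality rule, whereas a 2-2 split does not.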

Conflict of interest statement

Competing interests: Z.W. contributes to npj Digital Medicine as an Associate Editor and as a Guest Editor for the Collection on Natural Language Processing in Clinical Medicine. The other authors declare no competing interests.

Figures

**Fig. 1. Flowchart of patient selection process.**
Flowchart depicting patient selection and the data sources used as the development and validation sets. Cancer stages and ATA risk categories of all TCGA-THCA patients and pseudo cases were verified by endocrine surgeons. One pseudo case of non-invasive follicular thyroid neoplasm with papillary-like nuclear features is not graded with AJCC staging or ATA risk.

**Fig. 2. Flow of data extraction using LLMs and classifying ATA risk and AJCC staging from the LLM output.**
Schematic diagram depicting the flow of data extraction using LLMs and the use of a self-developed Microsoft Excel template for data cleaning and classification.
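The classification step was performed with a self-developed Excel template. As a swapped-in illustration only, the AJCC 8th edition prognostic stage groups for differentiated thyroid cancer can be expressed as a small Python rule function. It assumes the LLM has already extracted age at diagnosis and the T, N and M categories as strings; the function name `ajcc8_dtc_stage` is hypothetical, and treating MX or a missing M category as M0 is an assumption that a real pipeline may handle differently.

```python
from typing import Optional


def ajcc8_dtc_stage(age_years: int, t: str, n: str, m: str) -> Optional[str]:
    """Map age and TNM categories to an AJCC 8th edition stage group for
    differentiated thyroid cancer (hypothetical helper).

    Returns None when the inputs cannot be staged by these rules.
    """
    t, n, m = t.strip().upper(), n.strip().upper(), m.strip().upper()
    # Assumption: only an explicit M1 counts as distant metastasis; MX/blank -> M0.
    distant = m.startswith("M1")

    if age_years < 55:
        # Under 55: any T, any N; the stage depends only on M.
        return "II" if distant else "I"

    # Aged 55 or older.
    if distant:
        return "IVB"
    if t.startswith("T4B"):
        return "IVA"
    if t.startswith("T4A"):
        return "III"
    if t.startswith("T3"):  # T3a or T3b, any N
        return "II"
    if t.startswith(("T1", "T2")):
        if n.startswith("N1"):
            return "II"
        if n.startswith("N0") or n == "NX":
            return "I"
    # TX, a bare T4, or an unrecognised N category: flag for manual review.
    return None


if __name__ == "__main__":
    print(ajcc8_dtc_stage(42, "T3a", "N1b", "M0"))  # I
    print(ajcc8_dtc_stage(61, "T1b", "N1a", "MX"))  # II
    print(ajcc8_dtc_stage(61, "T4a", "N0", "M0"))   # III
```

Returning `None` for combinations the rules cannot resolve (for example TX in a patient aged 55 or older) keeps ambiguous extractions visible instead of silently assigning a stage.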

**Fig. 3. Heatmap of performance of Large Language Models on classification of ATA risks and AJCC staging in 50 TCGA pathology reports for NE framework development.**
LLMs with various prompting strategies attained satisfactory performance in NE framework development. a Performance on ATA risk classification, with F1-scores of 88.0–100.0%. b Performance on AJCC staging, with F1-scores of 90.3–100.0%.

**Fig. 4. Heatmap of performance of ensemble classifiers on classification of ATA risks and AJCC staging in the development and validation sets.**
Ensemble classifiers attained satisfactory performance on both the development and validation sets. a Performance on ATA risk classification, with F1-scores of at least 88.5%. b Performance on AJCC staging, with F1-scores of at least 90.4%.

**Fig. 5. Heatmap of performance of Large Language Models on classification of ATA risks and AJCC staging in 289 TCGA pathology reports for validation.**
LLMs with various prompting strategies attained satisfactory performance on the 289 TCGA pathology reports for validation. a Performance on ATA risk classification, with F1-scores of 88.5–96.5%. b Performance on AJCC staging, with F1-scores of 94.2–99.7%.

**Fig. 6. Heatmap of performance of Large Language Models on classification of ATA risks and AJCC staging in 35 pseudo cases for validation.**
Performance varied across prompting approaches and individual LLMs in the 35 pseudo cases for validation. a Performance on ATA risk classification; Mistral-7B-Instruct-v0.3 outperformed the other LLMs with an F1-score of 94.3%. b Performance on AJCC staging; Llama-3.1-8B-Instruct outperformed the other LLMs with an F1-score of 97.5%.
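The F1-scores reported in these figures compare model labels against the expert-curated ground truth. A minimal Python sketch of per-class and macro-averaged F1 is shown here; the abstract does not state which averaging the authors used, so macro averaging is an assumption, and the helper names are hypothetical. Failed extractions should be mapped to an explicit string label (for example "unclassified") before scoring, since the functions expect strings.

```python
from typing import Dict, Sequence


def per_class_f1(truth: Sequence[str], pred: Sequence[str]) -> Dict[str, float]:
    """Per-label F1 = 2TP / (2TP + FP + FN), one-vs-rest for each label."""
    if len(truth) != len(pred):
        raise ValueError("truth and pred must have the same length")
    labels = sorted(set(truth) | set(pred))
    scores: Dict[str, float] = {}
    for label in labels:
        tp = sum(1 for t, p in zip(truth, pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(truth, pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(truth, pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # Guard against division by zero (cannot occur for labels drawn from the data).
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(truth: Sequence[str], pred: Sequence[str]) -> float:
    """Unweighted mean of per-label F1 (assumed averaging scheme)."""
    scores = per_class_f1(truth, pred)
    if not scores:
        raise ValueError("no labels to score")
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    # Hypothetical ATA risk labels for four reports.
    truth = ["low", "low", "intermediate", "high"]
    pred = ["low", "intermediate", "intermediate", "high"]
    print(per_class_f1(truth, pred))
    print(round(macro_f1(truth, pred), 4))  # 0.7778
```

Macro averaging weights every risk category or stage equally, so errors on rare classes (such as stage IVB) pull the score down more than under a support-weighted average.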
