Large language model trained on clinical oncology data predicts cancer progression
- PMID: 40604229
- PMCID: PMC12223279
- DOI: 10.1038/s41746-025-01780-2
Abstract
Subspecialty knowledge barriers have limited the adoption of large language models (LLMs) in oncology. We introduce Woollie, an open-source, oncology-specific LLM trained on real-world data from Memorial Sloan Kettering Cancer Center (MSK) across lung, breast, prostate, pancreatic, and colorectal cancers, with external validation using University of California, San Francisco (UCSF) data. Woollie surpasses ChatGPT on medical benchmarks and excels on eight non-medical benchmarks. Analyzing 39,319 radiology impression notes from 4002 patients, it achieved an overall area under the receiver operating characteristic curve (AUROC) of 0.97 for cancer progression prediction on MSK data, including an AUROC of 0.98 for pancreatic cancer. On UCSF data, it achieved an overall AUROC of 0.88, with its best performance in lung cancer (AUROC 0.95). As the first oncology-specific LLM validated across institutions, Woollie demonstrates high accuracy and consistency across cancer types, underscoring its potential to enhance cancer progression analysis.
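For readers unfamiliar with the AUROC figures quoted above, the following is a minimal sketch of how the metric can be computed from binary progression labels and model scores. It is not the authors' evaluation code: the function name `auroc` and the toy labels and scores are hypothetical and chosen only for illustration. The sketch uses the pairwise (Mann-Whitney) definition, in which AUROC is the fraction of positive/negative pairs where the positive case receives the higher score, with ties counted as half.

```python
def auroc(labels, scores):
    """Pairwise AUROC: probability that a random positive case is scored
    above a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie contributes half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = progression, 0 = no progression.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.2, 0.7, 0.4, 0.5, 0.1]
print(auroc(labels, scores))
```

An AUROC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two classes, which is the scale on which the 0.97 (MSK) and 0.88 (UCSF) results are reported.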
© 2025. The Author(s).
Conflict of interest statement
Competing interests: The authors declare no competing interests.
Similar articles
- Using a Large Language Model for Breast Imaging Reporting and Data System Classification and Malignancy Prediction to Enhance Breast Ultrasound Diagnosis: Retrospective Study. JMIR Med Inform. 2025 Jun 11;13:e70924. doi: 10.2196/70924. PMID: 40498674. Free PMC article.
- Enhancing Pulmonary Disease Prediction Using Large Language Models With Feature Summarization and Hybrid Retrieval-Augmented Generation: Multicenter Methodological Study Based on Radiology Report. J Med Internet Res. 2025 Jun 11;27:e72638. doi: 10.2196/72638. PMID: 40499132. Free PMC article.
- Predicting 30-Day Postoperative Mortality and American Society of Anesthesiologists Physical Status Using Retrieval-Augmented Large Language Models: Development and Validation Study. J Med Internet Res. 2025 Jun 3;27:e75052. doi: 10.2196/75052. PMID: 40460423. Free PMC article.
- Cost-effectiveness of using prognostic information to select women with breast cancer for adjuvant systemic therapy. Health Technol Assess. 2006 Sep;10(34):iii-iv, ix-xi, 1-204. doi: 10.3310/hta10340. PMID: 16959170.
- Targeted therapy for advanced anaplastic lymphoma kinase (ALK)-rearranged non-small cell lung cancer. Cochrane Database Syst Rev. 2022 Jan 7;1(1):CD013453. doi: 10.1002/14651858.CD013453.pub2. PMID: 34994987. Free PMC article.
Cited by
- Large language models for clinical decision support in gastroenterology and hepatology. Nat Rev Gastroenterol Hepatol. 2025 Aug 22. doi: 10.1038/s41575-025-01108-1. Online ahead of print. PMID: 40846793. Review.
- Incorporating large language models as clinical decision support in oncology: the Woollie model. NPJ Digit Med. 2025 Aug 18;8(1):529. doi: 10.1038/s41746-025-01941-3. PMID: 40825846. Free PMC article.