J Clin Med. 2025 Mar 25;14(7):2223.
doi: 10.3390/jcm14072223.

Mapping the Advanced-Stage Epithelial Ovarian Cancer Landscape Goes Beyond Words: Two Large Language Models, Eight Tasks, One Journey

Michela Quaranta et al. J Clin Med.

Abstract

Background/Objectives: Advances in natural language processing (NLP) have transformed various sectors. However, the application of NLP in healthcare, particularly for analysing clinical notes, remains underdeveloped. We investigated the use of deep neural networks, specifically transformer-based models, to predict intraoperative and post-operative outcomes of cytoreduction for advanced-stage epithelial ovarian cancer (aEOC) from unstructured surgical notes.

Methods: We evaluated the performance of RoBERTa, a general-purpose language model, and GatorTron, a domain-specific model, across eight binary classification tasks using the same dataset. The dataset consisted of 560 surgical records from patients with aEOC who underwent cytoreductive surgery at a tertiary UK reference centre. The outcomes to be predicted were converted into binary features so that each could be treated as a classification task. To give the models more context, the text of the "operative findings" and "operative notes" fields was concatenated.

Results: Our findings highlight the tangible benefits of employing domain-specific language models for clinical text analysis. GatorTron outperformed RoBERTa on most predictive tasks, underscoring the advantages of domain-specific pretraining for understanding medical terminology and context. Both models struggled to predict certain outcomes, particularly post-operative events such as major complications and length of hospital stay, despite adjustments to hyperparameters and training strategies. This limitation suggests that operative text alone may not sufficiently capture the complexities of post-operative recovery.

Conclusions: These findings have valuable implications for developing medical AI systems to improve the delivery of modern aEOC healthcare.
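The two preprocessing steps named in the Methods (concatenating the two text fields and converting an outcome into a binary label) might look roughly like the sketch below. This is only an illustration under stated assumptions, not the authors' code: the record fields, the helper names, the 7-day threshold, and the sample text are all hypothetical and do not come from the study.

```python
from dataclasses import dataclass


@dataclass
class SurgicalRecord:
    """One surgical record; every field name here is hypothetical."""
    operative_findings: str
    operative_notes: str
    length_of_stay_days: int


def concatenate_text(record: SurgicalRecord, separator: str = " ") -> str:
    """Join the two free-text fields into a single model input string."""
    parts = [record.operative_findings.strip(), record.operative_notes.strip()]
    # Skip empty fields so the result has no stray separators.
    return separator.join(part for part in parts if part)


def binarise(value: float, threshold: float) -> int:
    """Map a numeric outcome to 1 if it exceeds the threshold, else 0."""
    return 1 if value > threshold else 0


def prepare_example(record: SurgicalRecord, threshold: float) -> tuple[str, int]:
    """Return the (text, label) pair for one binary classification task."""
    text = concatenate_text(record)
    label = binarise(record.length_of_stay_days, threshold)
    return text, label


# Invented placeholder record, for illustration only.
record = SurgicalRecord(
    operative_findings="Omental cake. ",
    operative_notes="Total omentectomy performed.",
    length_of_stay_days=9,
)
text, label = prepare_example(record, threshold=7)  # threshold is an assumption
print(text)
print(label)
```

The resulting (text, label) pairs are the kind of input a sequence classifier such as RoBERTa or GatorTron would be fine-tuned on, with one such labelled set per task.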

Keywords: GatorTron; RoBERTa; epithelial ovarian cancer; natural language processing; operative notes; transfer learning.

Conflict of interest statement

The authors declare no conflicts of interest.

Figures

Figure 1. Class distribution of integer features.
Figure 2. (A) Word count distribution of text variables. The variation reflects the different purposes of the fields. (B) Word cloud visualisation of the concatenated texts extracted from operative notes and operative findings. More frequent terms appear larger.
Figure 3. Radar plots comparing performance between RoBERTa and GatorTron models for all examined clinical tasks using Matthews correlation coefficient (MCC), recall, precision, F1 score, accuracy, area under precision–recall curve (AURPC), and area under receiver operating characteristic curve (AUROC).
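For readers unfamiliar with the threshold-based metrics listed for Figure 3, the following minimal sketch computes accuracy, precision, recall, F1 score, and MCC from binary labels using their standard definitions. It is not taken from the paper: the function names and the toy labels are invented for illustration, and returning 0.0 when a denominator is zero is an assumed convention. AURPC and AUROC are omitted because they require predicted scores rather than hard labels.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for two equal-length sequences of 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def safe_div(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero (assumed convention)."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, F1 and MCC for binary labels."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": safe_div(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
        "mcc": safe_div(tp * tn - fp * fn, mcc_denominator),
    }


# Toy labels, invented for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

MCC uses all four cells of the confusion matrix, so it is less easily inflated than accuracy when one class dominates, which may be why it is reported alongside the other metrics here.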

