Weakly supervised text classification on free-text comments in patient-reported outcome measures

Anna-Grace Linton¹, Vania Gatseva Dimitrova², Amy Downing³, Richard Wagland⁴, Adam W Glaser³,⁵

Affiliations

¹ UKRI CDT in AI for Medical Diagnosis and Care, University of Leeds, Leeds, United Kingdom.
² School of Computing, University of Leeds, Leeds, United Kingdom.
³ School of Medicine, University of Leeds, Leeds, United Kingdom.
⁴ School of Health Sciences, University of Southampton, Southampton, United Kingdom.
⁵ Leeds Institute of Medical Research, University of Leeds, Leeds, United Kingdom.

PMID: 40370705
PMCID: PMC12075198
DOI: 10.3389/fdgth.2025.1345360

Front Digit Health. 2025 Apr 30;7:1345360. doi: 10.3389/fdgth.2025.1345360. eCollection 2025.

Abstract

Background: Free-text comments in patient-reported outcome measures (PROMs) data provide insights into health-related quality of life (HRQoL). However, these comments are typically analysed using manual methods, such as content analysis, which are labour-intensive and time-consuming. Existing machine learning methods for analysing them are largely unsupervised and require post-analysis interpretation. Weakly supervised text classification (WSTC) can be a valuable method for classifying domain-specific text data, especially when limited labelled data are available. In this paper, we applied five WSTC techniques to PROMs comment data to explore the extent to which they can identify HRQoL themes reported by patients with prostate and colorectal cancer.

Methods: The main HRQoL themes and associated keywords were identified from a scoping review. These themes and keywords were then used to classify comments from two national PROMs datasets: colorectal cancer (n = 5,634) and prostate cancer (n = 59,768). Classification was performed using five keyword-based WSTC methods (anchored CorEx, BERTopic, Guided LDA, WeSTClass, and X-Class). We assessed the performance of each method overall and by theme. Domain experts reviewed the interpretability of the methods using the keywords each method extracted during training.
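To make the idea of keyword-based weak supervision concrete, the following is a minimal sketch of labelling comments from seed keywords alone. It is a plain keyword-matching baseline, not an implementation of any of the five methods named above, which build on seed terms in more sophisticated ways. The seed lists, the `weak_label` function, and the example comments are all hypothetical and are not taken from the study.

```python
import re
from collections import Counter

# Hypothetical seed keywords per theme (illustrative only; NOT the study's lists).
SEED_KEYWORDS = {
    "Physical Function": {"pain", "fatigue", "bowel", "tired"},
    "Psychological and Emotional Function": {"anxious", "worried", "depressed", "fear"},
    "Social Function": {"family", "friends", "wife", "husband"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def weak_label(comment, seeds=SEED_KEYWORDS):
    """Return the theme whose seed keywords occur most often, or None if none occur.

    Ties are broken by the order of themes in `seeds`.
    """
    counts = Counter(tokenize(comment))
    scores = {theme: sum(counts[w] for w in words) for theme, words in seeds.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    for comment in [
        "I feel tired and have constant pain",
        "My wife and friends have been a great support",
        "The weather was nice",
    ]:
        print(comment, "->", weak_label(comment))
```

A matcher like this only fires on exact seed terms, which is one reason methods that extrapolate beyond the seed list are of interest when comments are short and varied.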

Results: Based on the 12 papers identified in the scoping review, we determined six main themes and corresponding keywords to label PROMs comments using WSTC methods. These themes were: Comorbidities, Daily Life, Health Pathways and Services, Physical Function, Psychological and Emotional Function, and Social Function. The performance of the methods varied across themes and between the datasets. Although CorEx, the best-performing model for both datasets, attained weighted F1 scores of 0.57 (colorectal cancer) and 0.61 (prostate cancer), individual methods achieved F1 scores of up to 0.92 (Social Function) on individual themes. By evaluating the keywords extracted from the trained models, we found that the methods that can utilise expert-driven seed terms and extrapolate from limited data performed best.
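For readers unfamiliar with the metric, a weighted F1 score is commonly computed as the mean of per-class F1 scores, with each class weighted by its share of the true labels. The sketch below shows that calculation in plain Python for a single-label setting. Whether the study treated the task as single-label or multi-label is not stated in the abstract, so this is an assumption, and the function name and toy labels are hypothetical.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 (single-label setting assumed)."""
    support = Counter(y_true)  # number of true instances per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += (n / total) * f1  # weight each class by its share of true labels
    return score


if __name__ == "__main__":
    # Toy labels for illustration only.
    y_true = ["Social", "Social", "Physical", "Physical"]
    y_pred = ["Social", "Physical", "Physical", "Physical"]
    print(round(weighted_f1(y_true, y_pred), 3))
```

Because the weights follow class frequency, a method can score well on one theme while its weighted average across themes remains moderate.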

Conclusions: Evaluating these WSTC methods provided insight into their applicability for analysing PROMs comments. Their classification performance illustrated both the potential and the limitations of keyword-based WSTC for labelling PROMs comments when labelled data are limited.

Keywords: PROMS; free-text; natural language processing; patient-generated data; patient-reported data; short text; text classification; weakly supervised.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

**Figure 1**
Framework for the automated analysis of free-text comments in PROMs. The themes to identify in the PROMs comments are selected using themes found in a scoping review and refined by domain experts. The performance of five keyword-based WSTC methods is evaluated on colorectal cancer (CC) and prostate cancer (PC) PROMs comment datasets.

**Figure 2**
Study selection flowchart. Studies reporting the themes identified in PROMs comments by patients with chronic conditions were searched. From the studies, the reported themes were extracted as reported and grouped based on prevalence and similarity.

**Figure 3**
Performance of methods on PC and CC. The F1 score, recall, precision, and accuracy of each method for each theme are presented.
