Weakly supervised text classification on free-text comments in patient-reported outcome measures
- PMID: 40370705
- PMCID: PMC12075198
- DOI: 10.3389/fdgth.2025.1345360
Abstract
Background: Free-text comments in patient-reported outcome measures (PROMs) data provide insights into health-related quality of life (HRQoL). However, these comments are typically analysed manually, for example by content analysis, which is labour-intensive and time-consuming. Machine learning methods applied to such comments are largely unsupervised, so their outputs must be interpreted after the analysis. Weakly supervised text classification (WSTC) can be a valuable method for classifying domain-specific text, especially when only limited labelled data are available. In this paper, we applied five WSTC techniques to PROMs comment data to explore the extent to which they can identify HRQoL themes reported by patients with prostate and colorectal cancer.
Methods: The main HRQoL themes and associated keywords were identified from a scoping review. These themes and keywords were then used to label comments from two national PROMs datasets: colorectal cancer (n = 5,634) and prostate cancer (n = 59,768). Classification was done using five keyword-based WSTC methods (anchored CorEx, BERTopic, Guided LDA, WeSTClass, and X-Class). We evaluated each method's classification performance both overall and for each theme. Domain experts assessed the interpretability of each method by reviewing the keywords it extracted during training.
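To make the idea of keyword-based labelling concrete, the following sketch shows the simplest possible seed-keyword baseline: each comment is assigned the theme whose seed keywords it mentions most often. It is not one of the five methods evaluated here, which generally go further (for example by exploiting word co-occurrence statistics or embeddings); it only illustrates how expert seed terms can act as weak labels. Three theme names are taken from the Results, but the keyword lists and the names `SEED_KEYWORDS`, `tokenize`, and `label_comment` are hypothetical and are not the keywords identified in the scoping review.

```python
import re
from collections import Counter

# HYPOTHETICAL seed keywords for illustration only; not the study's keyword lists.
SEED_KEYWORDS = {
    "Physical Function": ["pain", "tired", "fatigue", "walk", "incontinence"],
    "Psychological and Emotional Function": ["anxious", "worry", "depressed", "fear"],
    "Social Function": ["family", "friends", "wife", "husband", "social"],
}


def tokenize(text):
    """Lower-case the comment and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def label_comment(text, seeds=SEED_KEYWORDS):
    """Return the theme with the most seed-keyword hits, or None if nothing matches.

    Ties go to the theme listed first in `seeds` (Python's max keeps the first maximum).
    """
    counts = Counter(tokenize(text))
    scores = {theme: sum(counts[kw] for kw in kws) for theme, kws in seeds.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    print(label_comment("Constant fatigue and pain when I walk"))  # Physical Function
    print(label_comment("Nothing else to add."))  # None
```

A matcher like this can only recognise words already in its seed lists; the abstract reports that methods able to extrapolate beyond expert seed terms from limited data performed best.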
Results: Based on the 12 papers identified in the scoping review, we determined six main themes and corresponding keywords for labelling PROMs comments with WSTC methods. These themes were: Comorbidities, Daily Life, Health Pathways and Services, Physical Function, Psychological and Emotional Function, and Social Function. The performance of the methods varied across themes and between the datasets. The best-performing model on both datasets, CorEx, attained weighted F1 scores of 0.57 (colorectal cancer) and 0.61 (prostate cancer), whereas on individual themes the methods achieved F1 scores of up to 0.92 (Social Function). Evaluation of the keywords extracted from the trained models showed that the methods able to use expert-driven seed terms and extrapolate from limited data performed best.
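Weighted F1, the overall metric reported above, averages the per-theme F1 scores, weighting each theme by its number of true comments (its support). A minimal pure-Python sketch, assuming each comment carries exactly one theme label (an assumption made for illustration), is:

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores (single-label assumption).

    Classes that appear only in y_pred have zero support and hence zero weight,
    so iterating over the true classes is sufficient.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    support = Counter(y_true)  # number of true comments per theme
    total = len(y_true)
    pairs = list(zip(y_true, y_pred))
    score = 0.0
    for label, n_true in support.items():
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += (n_true / total) * f1  # weight each theme's F1 by its support
    return score


if __name__ == "__main__":
    print(weighted_f1(["A", "A", "B", "B"], ["A", "B", "B", "B"]))  # about 0.733
```

For example, with true labels A, A, B, B and predictions A, B, B, B, theme A has F1 = 2/3 and theme B has F1 = 0.8, so the weighted F1 is 0.5 × 2/3 + 0.5 × 0.8 ≈ 0.733. Because frequent themes dominate the weighted average, the overall score can differ substantially from the best single-theme F1. In practice, scikit-learn's `f1_score(y_true, y_pred, average="weighted")` should give the same value for single-label data.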
Conclusions: Overall, evaluating these WSTC methods provided insight into their applicability for analysing PROMs comments. Their classification performance illustrated both the potential and the limitations of keyword-based WSTC for labelling PROMs comments when labelled data are limited.
Keywords: PROMs; free-text; natural language processing; patient-generated data; patient-reported data; short text; text classification; weakly supervised.
© 2025 Linton, Dimitrova, Downing, Wagland and Glaser.
Conflict of interest statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.