J Med Internet Res. 2025 Aug 6:27:e74231.
doi: 10.2196/74231.

Classifying Patient Complaints Using Artificial Intelligence-Powered Large Language Models: Cross-Sectional Study

Sky Wei Chee Koh et al. J Med Internet Res.

Abstract

Background: Patient complaints provide valuable insights into the performance of health care systems, highlighting potential risks not apparent to staff. Patient complaints can drive systemic changes that enhance patient safety. However, manual categorization and analysis pose a huge logistical challenge, hindering the ability to harness the potential of these data.

Objective: This study aims to evaluate the accuracy of artificial intelligence (AI)-powered categorization of patient complaints in primary care based on the Healthcare Complaint Analysis Tool (HCAT) General Practice (GP) taxonomy and assess the importance of advanced large language models (LLMs) in complaint categorization.

Methods: This cross-sectional study analyzed 1816 anonymous patient complaints from 7 public primary care clinics in Singapore. Complaints were first coded by trained human coders using the HCAT (GP) taxonomy through a rigorous process involving independent assessment and consensus discussions. LLMs (GPT-3.5 turbo, GPT-4o mini, and Claude 3.5 Sonnet) were used to validate manual classification. Claude 3.5 Sonnet was further used to identify complaint themes. LLM classifications were compared with human coding using accuracy and F1-score. Cohen κ was used to evaluate AI-human agreement, and the McNemar test was used to compare the AI models' concordance.
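The four statistics named in the Methods can be illustrated with a minimal sketch. It is not the authors' analysis code: the abstract does not state the F1 averaging scheme or the McNemar variant, so macro-averaged F1 and the exact (binomial) two-sided McNemar test are assumptions here. The function names, the toy labels, and the toy data are hypothetical and serve only to show the calculations.

```python
from collections import Counter
from math import comb


def accuracy(y_true, y_pred):
    """Fraction of complaints where the model label equals the human label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (macro averaging is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def cohen_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters on the same items."""
    n = len(rater_a)
    observed = accuracy(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / n ** 2
    if expected == 1.0:  # both raters used a single identical label
        return 1.0
    return (observed - expected) / (1 - expected)


def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar p-value comparing two models' correctness."""
    # Discordant pairs: exactly one of the two models matches the human code.
    only_a = sum(a == t and b != t for t, a, b in zip(y_true, pred_a, pred_b))
    only_b = sum(a != t and b == t for t, a, b in zip(y_true, pred_a, pred_b))
    n = only_a + only_b
    if n == 0:
        return 1.0
    k = min(only_a, only_b)
    p = 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, p)


# Toy data only: hypothetical domain labels for 8 complaints.
human = ["management", "clinical", "relationship", "management",
         "clinical", "management", "relationship", "management"]
model_a = ["management", "clinical", "management", "management",
           "clinical", "management", "relationship", "clinical"]
model_b = ["management", "management", "management", "management",
           "clinical", "clinical", "relationship", "clinical"]

print("accuracy A:", accuracy(human, model_a))
print("macro F1 A:", round(macro_f1(human, model_a), 3))
print("kappa human vs A:", round(cohen_kappa(human, model_a), 3))
print("McNemar A vs B:", mcnemar_exact(human, model_a, model_b))
```

In this sketch, accuracy and F1 score each model against the human codes, Cohen κ corrects the observed agreement for agreement expected by chance, and the McNemar test considers only the complaints on which exactly one of the two models matches the human code.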

Results: The majority of complaints fell under the HCAT (GP) domain of management (1079/1816, 59.4%), specifically relating to institutional processes (830/1816, 45.7%). Most complaints were of medium severity (994/1816, 54.7%), occurred within the practice (627/1816, 34.5%), and resulted in minimal harm (75.4%). LLMs achieved moderate to good accuracy (58.4%-95.5%) in HCAT (GP) field classifications, with GPT-4o mini generally outperforming GPT-3.5 turbo, except in severity classification. All 3 LLMs demonstrated moderate concordance rates (average 61.9%-68.8%) in complaints classification with varying levels of agreement (κ=0.114-0.623). GPT-4o mini and Claude 3.5 significantly outperformed GPT-3.5 turbo in several fields (P<.05), such as domain and stage of care classification. Thematic analysis using Claude 3.5 identified long wait times (393/1816, 21.6%), staff attitudes (287/1816, 15.8%), and appointment booking issues (191/1816, 10.5%) as the top concerns, which accounted for nearly half of all complaints.

Conclusions: Our study highlighted the potential of LLMs in classifying patient complaints in primary care using the HCAT (GP) taxonomy. While GPT-4o mini and Claude 3.5 Sonnet demonstrated promising results, further fine-tuning and model training are required to improve accuracy. Integrating AI into complaint analysis can facilitate proactive identification of systemic issues, ultimately enhancing quality improvement and patient safety. By leveraging LLMs, health care organizations can prioritize complaints and escalate high-risk issues more effectively. Theoretically, this could lead to improved patient care and experience; further research is needed to confirm this potential benefit.

Keywords: artificial intelligence; family medicine; health services; large language models; patient complaints; primary care.

Conflict of interest statement

Conflicts of Interest: None declared.
