Evaluating gender bias in large language models in long-term care
- PMID: 40784946
- PMCID: PMC12337462
- DOI: 10.1186/s12911-025-03118-0
Abstract
Background: Large language models (LLMs) are being used to reduce the administrative burden in long-term care by automatically generating and summarising case notes. However, LLMs can reproduce bias in their training data. This study evaluates gender bias in summaries of long-term care records generated with two state-of-the-art, open-source LLMs released in 2024: Meta's Llama 3 and Google Gemma.
Methods: Gender-swapped versions were created of long-term care records for 617 older people from a London local authority. Summaries of male and female versions were generated with Llama 3 and Gemma, as well as benchmark models from Meta and Google released in 2019: T5 and BART. Counterfactual bias was quantified through sentiment analysis alongside an evaluation of word frequency and thematic patterns.
Results: The benchmark models exhibited some variation in output on the basis of gender. Llama 3 showed no gender-based differences across any metrics. Gemma displayed the most significant gender-based differences. Male summaries focused more on physical and mental health issues. Language used for men was more direct, with women's needs downplayed more often than men's.
Conclusion: Care services are allocated on the basis of need. If women's health issues are underemphasised, this may lead to gender-based disparities in service receipt. LLMs may offer substantial benefits in easing administrative burden. However, the findings highlight the variation in state-of-the-art LLMs, and the need for evaluation of bias. The methods in this paper provide a practical framework for quantitative evaluation of gender bias in LLMs. The code is available on GitHub.
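The counterfactual gender-swapping and word-frequency steps described in the Methods can be sketched in outline as follows. This is a minimal illustration, not the authors' released code: the swap map, function names, and example text are hypothetical, and the paper's actual pipeline generates summaries with Llama 3, Gemma, T5, and BART and adds sentiment analysis on top of these steps.

```python
import re
from collections import Counter

# Hypothetical swap map (illustrative only; a real counterfactual pipeline
# would need a much fuller list plus handling of names and titles).
SWAP_PAIRS = {
    "he": "she", "she": "he",
    "him": "her", "his": "her",
    "her": "him",  # ambiguous: "her" can map to "him" or "his"; kept simple here
    "mr": "mrs", "mrs": "mr",
    "man": "woman", "woman": "man",
}

def gender_swap(text: str) -> str:
    """Return a counterfactual version of `text` with gendered terms swapped."""
    pattern = re.compile(r"\b(" + "|".join(SWAP_PAIRS) + r")\b", re.IGNORECASE)
    def repl(m):
        swapped = SWAP_PAIRS[m.group(0).lower()]
        # Preserve sentence-initial capitalisation.
        return swapped.capitalize() if m.group(0)[0].isupper() else swapped
    return pattern.sub(repl, text)

def word_freq_diff(summary_a: str, summary_b: str) -> Counter:
    """Words over-represented in summary_a relative to summary_b."""
    tokenise = lambda s: re.findall(r"[a-z']+", s.lower())
    return Counter(tokenise(summary_a)) - Counter(tokenise(summary_b))

# Toy example (invented text, not a real care record):
record = "He is frail and his mobility is poor."
print(gender_swap(record))  # She is frail and her mobility is poor.
```

In the study, the male and female versions of each record would both be summarised by the same model, and differences in word frequencies and sentiment between the paired summaries quantify counterfactual bias.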
Keywords: Bias; Gender; LLMs; Long-term care.
© 2025. The Author(s).
Conflict of interest statement
Declarations.

Ethics approval and consent to participate: This study uses secondary data from administrative records, which were pseudonymised prior to egress to remove identifiable personal information (e.g., names, addresses, NHS numbers, and other unique identifiers). According to the UK General Data Protection Regulation (GDPR), processing of these data was conducted under the legal basis of legitimate interests, which does not require individual opt-in consent. This study was conducted in accordance with the principles of the Declaration of Helsinki. It involved the use of secondary data only, with no direct contact with participants. The data were pseudonymised prior to access and processed in line with established ethical standards for research using routinely collected health and social care records. Individual informed consent was not required, as the project involved no automated decision-making and used pseudonymised data throughout. Ethics approval for the project was granted by the LSE Personal Social Services Research Unit’s ethics committee on 30th May 2019, in compliance with the LSE’s Research Ethics Policy. A Data Processing Impact Assessment was completed, and the details of the project were made publicly available via a Privacy Notice on the local authority’s website, with local opt-out options provided. Approval was also granted by the NHS Confidentiality Advisory Group (CAG) in June 2020 (reference number 20/CAG/0043), with annual renewal.

Consent for publication: Not applicable. This study does not include any individual-level identifying images, names, addresses, locations, or other information that could compromise participant anonymity. All data used in the study were pseudonymised prior to access, and no direct contact with participants occurred.

Competing interests: The author declares no competing interests.
