Mixed methods assessment of the influence of demographics on medical advice of ChatGPT
- PMID: 38679900
- PMCID: PMC11339520
- DOI: 10.1093/jamia/ocae086
Abstract
Objectives: To evaluate demographic biases in diagnostic accuracy and health advice between generative artificial intelligence (AI; ChatGPT GPT-4) and traditional symptom checkers such as WebMD.
Materials and methods: Combined symptom and demographic vignettes were developed for the 27 most common symptom complaints. Standardized prompts, written from a patient perspective, with varying demographic permutations of age, sex, and race/ethnicity were entered into ChatGPT (GPT-4) between July and August 2023. In total, 3 runs of 540 ChatGPT prompts were compared to the corresponding WebMD Symptom Checker output using a mixed-methods approach. In addition to diagnostic correctness, the associated text generated by ChatGPT was analyzed for readability (using Flesch-Kincaid Grade Level) and qualitative aspects such as disclaimers and demographic tailoring.
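The permutation design and readability scoring lend themselves to a short illustration. Below is a minimal Python sketch, assuming 2 ages × 2 sexes × 5 race/ethnicity groups per complaint (27 × 20 = 540 prompts per run) and using the textstat package for the Flesch-Kincaid Grade Level; the category values and prompt template are illustrative assumptions, not the study's exact wording.

```python
# Minimal sketch of the vignette permutation and readability scoring.
# The demographic values and prompt template are illustrative assumptions;
# the study reports 540 prompts per run (27 complaints x 20 permutations).
from itertools import product

import textstat  # pip install textstat

complaints = ["chest pain", "headache", "abdominal pain"]  # 27 in the study
ages = [25, 75]
sexes = ["male", "female"]
races = ["White", "Black", "Hispanic", "Asian", "Native American"]  # assumed groups

prompts = [
    f"I am a {age}-year-old {race} {sex} with {complaint}. "
    "What might be causing it, and what should I do?"
    for complaint, age, sex, race in product(complaints, ages, sexes, races)
]
# With all 27 complaints: 27 * 2 * 2 * 5 = 540 prompts per run.

def readability_grade(response: str) -> float:
    """Flesch-Kincaid Grade Level of a model response (higher = harder)."""
    return textstat.flesch_kincaid_grade(response)
```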
Results: ChatGPT matched WebMD in 91% of diagnoses, with a 24% top-diagnosis match rate. Diagnostic accuracy did not differ significantly across demographic groups, including age, race/ethnicity, and sex. ChatGPT presented urgent care recommendations and demographic tailoring significantly more often for 75-year-olds than for 25-year-olds (P < .01), but these did not differ significantly among race/ethnicity or sex groups. The ChatGPT-generated text was written at a college reading level, with no significant demographic variability.
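In spirit, the demographic comparisons reported above reduce to contingency-table tests. A hedged sketch using SciPy's chi-square test, with placeholder counts rather than the study's data:

```python
# Sketch of a demographic bias check: chi-square test on diagnostic-match
# counts for two age groups. Counts are placeholders, not the study's data.
from scipy.stats import chi2_contingency

#        matched, not matched
table = [[245, 25],   # 25-year-old vignettes (placeholder)
         [248, 22]]   # 75-year-old vignettes (placeholder)

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.2f}, p={p:.3f}")  # p >= .05 suggests no significant difference
```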
Discussion: The use of non-health-tailored generative AI, like ChatGPT, for simple symptom-checking functions provides diagnostic accuracy comparable to commercially available symptom checkers and does not demonstrate significant demographic bias in this setting. The text accompanying differential diagnoses, however, suggests demographic tailoring that could potentially introduce bias.
Conclusion: These results highlight the need for continued rigorous evaluation of AI-driven medical platforms, focusing on demographic biases to ensure equitable care.
Keywords: ChatGPT; artificial intelligence; bias; digital health; large language model; symptom checker.
© The Author(s) 2024. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email: journals.permissions@oup.com.
Conflict of interest statement
The authors have no competing interests to declare.
Similar articles
- Comparison of Diagnostic and Triage Accuracy of Ada Health and WebMD Symptom Checkers, ChatGPT, and Physicians for Patients in an Emergency Department: Clinical Data Analysis Study. JMIR Mhealth Uhealth. 2023 Oct 3;11:e49995. doi: 10.2196/49995. PMID: 37788063. Free PMC article.
- Generative artificial intelligence versus clinicians: Who diagnoses multiple sclerosis faster and with greater accuracy? Mult Scler Relat Disord. 2024 Oct;90:105791. doi: 10.1016/j.msard.2024.105791. Epub 2024 Aug 6. PMID: 39146892.
- Evaluating the Efficacy of ChatGPT as a Patient Education Tool in Prostate Cancer: Multimetric Assessment. J Med Internet Res. 2024 Aug 14;26:e55939. doi: 10.2196/55939. PMID: 39141904. Free PMC article.
- Optimizing ChatGPT's Interpretation and Reporting of Delirium Assessment Outcomes: Exploratory Study. JMIR Form Res. 2024 Oct 1;8:e51383. doi: 10.2196/51383. PMID: 39353189. Free PMC article.
- Application of artificial intelligence chatbots, including ChatGPT, in education, scholarly work, programming, and content generation and its prospects: a narrative review. J Educ Eval Health Prof. 2023;20:38. doi: 10.3352/jeehp.2023.20.38. Epub 2023 Dec 27. PMID: 38148495. Free PMC article. Review.
Cited by
- Public Versus Academic Discourse on ChatGPT in Health Care: Mixed Methods Study. JMIR Infodemiology. 2025 Jun 23;5:e64509. doi: 10.2196/64509. PMID: 40550010. Free PMC article.
- Large language models in biomedicine and health: current research landscape and future directions. J Am Med Inform Assoc. 2024 Sep 1;31(9):1801-1811. doi: 10.1093/jamia/ocae202. PMID: 39169867. Free PMC article.
- The Goldilocks Zone: Finding the right balance of user and institutional risk for suicide-related generative AI queries. PLOS Digit Health. 2025 Jan 8;4(1):e0000711. doi: 10.1371/journal.pdig.0000711. eCollection 2025 Jan. PMID: 39774367. Free PMC article.
- Fairness in AI-Driven Oncology: Investigating Racial and Gender Biases in Large Language Models. Cureus. 2024 Sep 16;16(9):e69541. doi: 10.7759/cureus.69541. eCollection 2024 Sep. PMID: 39416584. Free PMC article.
- Large Language Models in Medical Diagnostics: Scoping Review With Bibliometric Analysis. J Med Internet Res. 2025 Jun 9;27:e72062. doi: 10.2196/72062. PMID: 40489764. Free PMC article.