Yale J Biol Med. 2024 Mar 29;97(1):17-27. doi: 10.59249/ZTOZ1966. eCollection 2024 Mar.

Assessing the Efficacy of Large Language Models in Health Literacy: A Comprehensive Cross-Sectional Study

Kanhai S Amin et al. Yale J Biol Med.

Abstract

Enhanced health literacy in children has been empirically linked to better long-term health outcomes; however, few interventions have been shown to improve health literacy. In this context, we investigated whether large language models (LLMs) can serve as a medium to improve health literacy in children. We tested pediatric conditions using 26 different prompts in ChatGPT-3.5, ChatGPT-4, Microsoft Bing, and Google Bard (now known as Google Gemini). The primary outcome measure was the reading grade level (RGL) of output, as assessed by the Gunning Fog, Flesch-Kincaid Grade Level, Automated Readability Index, and Coleman-Liau indices. Word counts were also assessed. Across all models, outputs for basic prompts such as "Explain" and "What is (are)" were at or above the tenth-grade RGL. When prompts specified that conditions be explained at the first- through twelfth-grade level, the LLMs varied in their ability to tailor responses to the requested grade. ChatGPT-3.5 provided responses ranging from the seventh-grade to the college-freshman RGL, while ChatGPT-4 produced responses from the tenth-grade to the college-senior RGL. Microsoft Bing provided responses from the ninth- to eleventh-grade RGL, while Google Bard provided responses from the seventh- to tenth-grade RGL. LLMs struggle to craft outputs below a sixth-grade RGL. However, their ability to modify outputs above this threshold provides a potential mechanism for adolescents to explore, understand, and engage with information about their health conditions, from simple to complex terms. Future studies are needed to verify the accuracy and efficacy of these tools.
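To make the outcome measure concrete, the Flesch-Kincaid Grade Level used above is a fixed formula over word, sentence, and syllable counts: 0.39 × (words/sentences) + 11.8 × (syllables/words) − 15.59. The sketch below is an illustrative implementation, not the study's actual scoring pipeline; the regex-based syllable counter is a rough heuristic (published tools use dictionaries or more careful rules), so scores will differ slightly from reference implementations.

```python
import re

def count_syllables(word: str) -> int:
    # Rough heuristic: count runs of consecutive vowels (treating "y" as
    # a vowel). Every word is credited with at least one syllable.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_kincaid_grade(text: str) -> float:
    # Flesch-Kincaid Grade Level:
    #   0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / max(1, len(sentences)))
            + 11.8 * (syllables / len(words))
            - 15.59)
```

Short, monosyllabic sentences score near (or below) grade zero, while long sentences with polysyllabic words push the score toward college-level grades, which is why "Explain" prompts answered in clinical vocabulary land at the tenth-grade RGL or above.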

Keywords: Artificial Intelligence; ChatGPT; Google Bard; Google Gemini; Health Literacy; Large Language Models; Microsoft Bing; Pediatrics; Reading Grade Level.


Figures

Figure 1
Reading grade levels for basic prompts. Legend: Basic prompts P0 “What is (are) {medical condition}” and P1 “Explain {medical condition}” were tested through the LLMs. The aRGL of outputs is shown. *, **, ***, and **** correspond to p<0.05, p<0.01, p<0.001, and p<0.0001, respectively. Comparisons between LLMs for identical prompts are not shown, but all differences are statistically significant (p<0.0001), except between ChatGPT-3.5 and ChatGPT-4 for “what is” / “what are.”
Figure 2
Reading grade level of output after running “Explain {} to a ____ grader” through each LLM. Legend: Each LLM was asked, “Explain {medical condition} to a __ grader.” First through twelfth grades were tested by filling in the blank. A. The aRGL of outputs is depicted for each LLM; from top to bottom: GPT-3.5, GPT-4, Bing, and Bard. B. Grade-level outputs for each LLM from panel A are set side by side for comparison between LLMs.
Figure 3
Reading grade level of output after running “Explain {} at a ____-grade reading level” through each LLM. Legend: Each LLM was asked, “Explain {medical condition} at a __-grade reading level.” First through twelfth grades were tested by filling in the blank. A. The aRGL of outputs is depicted for each LLM; from top to bottom: GPT-3.5, GPT-4, Bing, and Bard. B. Grade-level outputs for each LLM from panel A are set side by side for comparison between LLMs.
