Yale J Biol Med. 2024 Mar 29;97(1):17-27. doi: 10.59249/ZTOZ1966. eCollection 2024 Mar.

Assessing the Efficacy of Large Language Models in Health Literacy: A Comprehensive Cross-Sectional Study

Kanhai S Amin et al. Yale J Biol Med.

Abstract

Enhanced health literacy in children has been empirically linked to better long-term health outcomes; however, few interventions have been shown to improve health literacy. In this context, we investigated whether large language models (LLMs) can serve as a medium to improve health literacy in children. We tested pediatric conditions using 26 different prompts in ChatGPT-3.5, ChatGPT-4, Microsoft Bing, and Google Bard (now known as Google Gemini). The primary outcome measure was the reading grade level (RGL) of output, as assessed by the Gunning Fog, Flesch-Kincaid Grade Level, Automated Readability Index, and Coleman-Liau indices. Word counts were also assessed. Across all models, outputs for basic prompts such as "Explain" and "What is (are)" were at or above the tenth-grade RGL. When prompts specified that conditions be explained at the first- through twelfth-grade level, the LLMs varied in their ability to tailor responses to the requested grade. ChatGPT-3.5 provided responses ranging from the seventh-grade to the college-freshman RGL, while ChatGPT-4 produced responses from the tenth-grade to the college-senior RGL. Microsoft Bing provided responses from the ninth- to eleventh-grade RGL, while Google Bard provided responses from the seventh- to tenth-grade RGL. LLMs struggle to craft outputs below a sixth-grade RGL. However, their ability to modify outputs above this threshold provides a potential mechanism for adolescents to explore, understand, and engage with information about their health conditions, from simple to complex terms. Future studies are needed to verify the accuracy and efficacy of these tools.
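To make the outcome measure concrete, the Flesch-Kincaid Grade Level used above is a fixed formula over word, sentence, and syllable counts: 0.39 × (words/sentences) + 11.8 × (syllables/words) − 15.59. The sketch below is an illustrative implementation, not the study's actual scoring pipeline; the regex-based syllable counter is a rough heuristic (published tools use dictionaries or more careful rules), so scores will differ slightly from reference implementations.

```python
import re

def count_syllables(word: str) -> int:
    # Rough heuristic: count runs of consecutive vowels (treating "y" as
    # a vowel). Every word is credited with at least one syllable.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_kincaid_grade(text: str) -> float:
    # Flesch-Kincaid Grade Level:
    #   0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / max(1, len(sentences)))
            + 11.8 * (syllables / len(words))
            - 15.59)
```

Short, monosyllabic sentences score near (or below) grade zero, while long sentences with polysyllabic words push the score toward college-level grades, which is why "Explain" prompts answered in clinical vocabulary land at the tenth-grade RGL or above.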

Keywords: Artificial Intelligence; ChatGPT; Google Bard; Google Gemini; Health Literacy; Large Language Models; Microsoft Bing; Pediatrics; Reading Grade Level.


Figures

Figure 1
Reading grade levels for basic prompts. Legend: Basic prompts P0 “What is (are) {medical condition}” and P1 “Explain {medical condition}” were tested through the LLMs. The aRGL of outputs is shown. *, **, ***, and **** correspond to p<0.05, p<0.01, p<0.001, and p<0.0001, respectively. Comparisons between LLMs for identical prompts are not shown, but all differences are statistically significant (p<0.0001), except between ChatGPT-3.5 and ChatGPT-4 for “what is” / “what are.”
Figure 2
Reading grade level of output after running “Explain {} to a ____ grader” through each LLM. Legend: Each LLM was asked, “Explain {medical condition} to a __ grader.” First through twelfth grades were tested by filling in the blank. A. The aRGL of outputs is depicted for each LLM; from top to bottom: GPT-3.5, GPT-4, Bing, and Bard. B. Grade-level outputs for each LLM from panel A are set side by side for comparison between LLMs.
Figure 3
Reading grade level of output after running “Explain {} at a ____-grade reading level” through each LLM. Legend: Each LLM was asked, “Explain {medical condition} at a __-grade reading level.” First through twelfth grades were tested by filling in the blank. A. The aRGL of outputs is depicted for each LLM; from top to bottom: GPT-3.5, GPT-4, Bing, and Bard. B. Grade-level outputs for each LLM from panel A are set side by side for comparison between LLMs.
