Advancing Patient Education in Idiopathic Intracranial Hypertension: The Promise of Large Language Models
- PMID: 39399571
- PMCID: PMC11464234
- DOI: 10.1212/CPJ.0000000000200366
Abstract
Background and objectives: We evaluated the performance of 3 large language models (LLMs) in generating patient education materials (PEMs) and enhancing the readability of prewritten PEMs on idiopathic intracranial hypertension (IIH).
Methods: This cross-sectional comparative study compared 3 LLMs (ChatGPT-3.5, ChatGPT-4, and Google Bard) on their ability to generate PEMs on IIH using 3 prompts:
- Prompt A (control prompt): "Can you write a patient-targeted health information handout on idiopathic intracranial hypertension that is easily understandable by the average American?"
- Prompt B (modifier statement + control prompt): "Given patient education materials are recommended to be written at a 6th-grade reading level, using the SMOG readability formula, can you write a patient-targeted health information handout on idiopathic intracranial hypertension that is easily understandable by the average American?"
- Prompt C (modifier statement + rewrite request): "Given patient education materials are recommended to be written at a 6th-grade reading level, using the SMOG readability formula, can you rewrite the following text to a 6th-grade reading level: [insert text]."
We compared generated and rewritten PEMs, along with the first 20 eligible PEMs on IIH identified through a Google search, on readability (Simple Measure of Gobbledygook [SMOG] and Flesch-Kincaid Grade Level [FKGL]), quality (DISCERN and Patient Education Materials Assessment Tool [PEMAT]), and accuracy (Likert misinformation scale).
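The readability metrics above are grade-level scores computed from simple text counts. As a minimal sketch of how they are calculated, the standard published SMOG and Flesch-Kincaid Grade Level formulas can be written as follows (word, sentence, and polysyllable counts are supplied by the caller here; in practice they would be parsed from the PEM text, which the snippet does not attempt):

```python
import math


def smog(polysyllables: int, sentences: int) -> float:
    """SMOG grade: based on the number of words with 3+ syllables,
    normalized to a 30-sentence sample."""
    return 1.0430 * math.sqrt(polysyllables * (30 / sentences)) + 3.1291


def fkgl(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level: combines average sentence length
    (words/sentences) and average word length (syllables/words)."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
```

For example, a 30-sentence sample containing 25 polysyllabic words scores a SMOG grade of about 8.3, i.e., roughly an 8th-grade reading level; shorter sentences and fewer polysyllabic words pull both scores down toward the 6th-grade target used in the study.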
Results: Generated PEMs were of high quality, understandability, and accuracy (median DISCERN score ≥4, PEMAT understandability ≥70%, Likert misinformation scale = 1). Only ChatGPT-4 generated PEMs at the specified 6th-grade reading level (SMOG: 5.5 ± 0.6, FKGL: 5.6 ± 0.7). Similarly, only ChatGPT-4 rewrote the original published PEMs to below a 6th-grade reading level with Prompt C without a decrease in quality, understandability, or accuracy (SMOG: 5.6 ± 0.6, FKGL: 5.7 ± 0.8, p < 0.001, DISCERN ≥4, Likert misinformation = 1).
Discussion: LLMs, particularly ChatGPT-4, can produce high-quality, readable PEMs on IIH. They can also serve as supplementary tools to improve the readability of prewritten PEMs while maintaining quality and accuracy.
© 2024 American Academy of Neurology.
Conflict of interest statement
The authors report no relevant disclosures. Full disclosure form information provided by the authors is available with the full text of this article at Neurology.org/cp.