Reporting guideline for Chatbot Health Advice studies: the CHART statement
- PMID: 40745595
- PMCID: PMC12315282
- DOI: 10.1186/s12916-025-04274-w
Abstract
Background: The Chatbot Assessment Reporting Tool (CHART) is a reporting guideline that provides recommendations for studies evaluating the performance of generative artificial intelligence (AI)-driven chatbots in summarizing clinical evidence and providing health advice, referred to as Chatbot Health Advice (CHA) studies.
Methods: CHART was developed in several phases, beginning with a comprehensive systematic review to identify variation in the conduct, reporting, and methodology of CHA studies. Findings from the review informed a draft checklist that was revised through an international, multidisciplinary modified asynchronous Delphi consensus process involving 531 stakeholders, three synchronous consensus panel meetings of 48 stakeholders, and subsequent pilot testing of the checklist.
Results: CHART includes 12 items and 39 subitems to promote transparent and comprehensive reporting of CHA studies. These include Title (subitem 1a), Abstract/Summary (subitem 1b), Background (subitems 2ab), Model Identifiers (subitems 3ab), Model Details (subitems 4abc), Prompt Engineering (subitems 5ab), Query Strategy (subitems 6abcd), Performance Evaluation (subitems 7ab), Sample Size (subitem 8), Data Analysis (subitem 9a), Results (subitems 10abc), Discussion (subitems 11abc), Disclosures (subitem 12a), Funding (subitem 12b), Ethics (subitem 12c), Protocol (subitem 12d), and Data Availability (subitem 12e).
Conclusion: The CHART checklist and corresponding methodological diagram were designed to support key stakeholders, including clinicians, researchers, editors, peer reviewers, and readers, in reporting, understanding, and interpreting the findings of CHA studies.
Keywords: Generative AI; LLMs; Reporting standards.
© 2025. The Author(s).
Conflict of interest statement
Declarations.
Ethics approval and consent to participate: An ethics application was submitted to the Hamilton Integrated Research Ethics Board, which waived the need for approval (HiREB #17025).
Consent for publication: Not applicable.
Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/disclosure-of-interest/ and declare: GSC is a National Institute for Health and Care Research (NIHR) Senior Investigator; the views expressed in this article are those of the author(s) and not necessarily those of the NIHR or the Department of Health and Social Care. AJT has received funding from HealthSense to investigate evidence-based medicine applications of large language models. PM is the co-founder of BrainX LLC. AS has received research funding from the Australian government and is co-founder of BantingMed Pty Ltd. DS is the Acting Deputy Editor for The Lancet Digital Health. MM has received research funding from The Hospital Research Foundation Group. TF sits on the executive committee of MDEpiNet. HF is a Senior Executive Editor for The Lancet. CL is the Editor in Chief of Annals of Internal Medicine. AF is Executive Managing Editor and Vice President, Editorial Operations, JAMA and The JAMA Network. TF and EL are journal editors for the BMJ. RA is the Editor in Chief of the International Journal of Surgery. GS is an Executive Editor of Artificial Intelligence in Medicine. SL is a paid consultant for Astellas. DP has received research funding from the Italian Ministry of University and Research. MO is a paid consultant for Theator. TA, POV, and GG are board members of the MAGIC Evidence Ecosystem Foundation (www.magicproject.org), a not-for-profit organization that conducts research on evidence appraisal, guideline methodology, and implementation, and provides authoring and publication software (MAGICapp) for evidence summaries, guidelines, and decision aids.