Reporting guidelines for chatbot health advice studies: explanation and elaboration for the Chatbot Assessment Reporting Tool (CHART)

CHART Collaborative. BMJ.

Abstract

The Chatbot Assessment Reporting Tool (CHART) reporting guideline promotes transparent and comprehensive reporting of studies evaluating the performance of generative artificial intelligence (AI)-driven chatbots for the purposes of summarising clinical evidence and providing health advice, referred to here as chatbot health advice (CHA) studies. CHART is the product of an international, multi-phase, consensus based initiative involving various stakeholders and comprises a 12-item checklist with 39 subitems. The checklist includes items on open science, title and abstract, introduction, model identification, model details, prompt engineering, query strategy, performance definition and evaluation, statistical analysis, results, and discussion, with an accompanying flow diagram. Each item includes distinct subitems. This explanation and elaboration article discusses each subitem and provides a detailed rationale for its inclusion in the CHART checklist.

Conflict of interest statement

Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/disclosure-of-interest/ and declare: support from the First Cut competition and the postgraduate medical education committee at McMaster University for the submitted work. GSC is a National Institute for Health and Care Research (NIHR) Senior Investigator. The views expressed in this article are those of the author(s) and not necessarily those of the NIHR, or the Department of Health and Social Care; AJT has received funding from HealthSense to investigate evidence based medicine applications of large language models. PM is the co-founder of BrainX; AS has received research funding from the Australian government and is co-founder of BantingMed Pty; DS is the acting deputy editor for The Lancet Digital Health; MM has received research funding from the Hospital Research Founding Group; TF sits on the executive committee of MDEpiNet; HF is a senior executive editor for The Lancet; CL is editor-in-chief of Annals of Internal Medicine; AF is executive managing editor and vice president, editorial operations, at JAMA and the JAMA Network; TF and EL are journal editors for The BMJ; RA is the editor-in-chief of the International Journal of Surgery; GS is an executive editor of Artificial Intelligence in Medicine; SL is a paid consultant for Astellas; DP has received research funding from the Italian Ministry of University and Research; MO is a paid consultant for Theator; TA, POV, and GG are board members of the MAGIC Evidence Ecosystem Foundation (www.magicproject.org), a not-for-profit organisation that conducts research and evidence appraisal and guideline methodology and implementation, and provides authoring and publication software (MAGICapp) for evidence summaries, guidelines, and decision aids.

Figures

Fig 1
CHART methodological diagram. AI=artificial intelligence; API=application programming interfaces; CHART=Chatbot Assessment Reporting Tool
