Review
Nat Med. 2025 Jan;31(1):60-69.
doi: 10.1038/s41591-024-03425-5. Epub 2025 Jan 8.

The TRIPOD-LLM reporting guideline for studies using large language models

Jack Gallifant et al. Nat Med. 2025 Jan.

Abstract

Large language models (LLMs) are rapidly being adopted in healthcare, necessitating standardized reporting guidelines. We present transparent reporting of a multivariable model for individual prognosis or diagnosis (TRIPOD)-LLM, an extension of the TRIPOD + artificial intelligence statement, addressing the unique challenges of LLMs in biomedical applications. TRIPOD-LLM provides a comprehensive checklist of 19 main items and 50 subitems, covering key aspects from title to discussion. The guidelines introduce a modular format accommodating various LLM research designs and tasks, with 14 main items and 32 subitems applicable across all categories. Developed through an expedited Delphi process and expert consensus, TRIPOD-LLM emphasizes transparency, human oversight and task-specific performance reporting. We also introduce an interactive website (https://tripod-llm.vercel.app/) facilitating easy guideline completion and PDF generation for submission. As a living document, TRIPOD-LLM will evolve with the field, aiming to enhance the quality, reproducibility and clinical applicability of LLM research in healthcare through comprehensive reporting.

Conflict of interest statement

Competing interests: D.S.B. is an associate editor at Radiation Oncology and HemOnc.org, receives research funding from the American Association for Cancer Research, and provides advisory and consulting services for MercurialAI. D.D.F. is an associate editor at the Journal of the American Medical Informatics Association, is a member of the editorial board of Scientific Data, and receives funding from the intramural research program at the US National Library of Medicine, NIH. J.W.G. is a member of the editorial board of Radiology: Artificial Intelligence, BJR Artificial Intelligence and NEJM AI. All other authors declare no competing interests.

Figures

Figure 1. TRIPOD-LLM workflow.
The TRIPOD-LLM checklist workflow starts with 59 reporting items, and the number of required items is reduced based on the selection of research tasks (e.g., classification, summarization) and research design (e.g., LLM evaluation). After both are selected, a filtered list is generated for reporting.
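The filtering step in Figure 1 can be sketched as a simple set-membership filter: an item is kept only if it applies to both the selected task and the selected design. This is a minimal illustration under stated assumptions, not the code behind the TRIPOD-LLM website. The item IDs ("A" to "D"), the tagging scheme (an empty tag set meaning the item applies to every task or design) and the placeholder design name `other-design` are all hypothetical; only the task names and "LLM evaluation" come from the caption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChecklistItem:
    """A reporting item tagged with the tasks and designs it applies to.

    Assumption: an empty tag set means the item applies to every task
    (or every design), i.e. it is a cross-category item.
    """
    item_id: str
    tasks: frozenset[str] = frozenset()
    designs: frozenset[str] = frozenset()


def applies(item: ChecklistItem, task: str, design: str) -> bool:
    """True if the item is relevant to the selected task AND design."""
    task_ok = not item.tasks or task in item.tasks
    design_ok = not item.designs or design in item.designs
    return task_ok and design_ok


def filter_checklist(
    items: list[ChecklistItem], task: str, design: str
) -> list[ChecklistItem]:
    """Return the filtered reporting list, preserving checklist order."""
    return [item for item in items if applies(item, task, design)]


# Hypothetical items for illustration only; not actual TRIPOD-LLM items.
ITEMS = [
    ChecklistItem("A"),  # applies to all tasks and designs
    ChecklistItem("B", tasks=frozenset({"summarization"})),
    ChecklistItem("C", designs=frozenset({"LLM evaluation"})),
    ChecklistItem(
        "D",
        tasks=frozenset({"classification"}),
        designs=frozenset({"other-design"}),  # hypothetical design name
    ),
]

if __name__ == "__main__":
    selected = filter_checklist(ITEMS, "classification", "LLM evaluation")
    print([item.item_id for item in selected])  # → ['A', 'C']
```

Storing applicability as tags on each item, rather than keeping a separate checklist per task and design, is one way to model the modular format described in the abstract: items applicable across all categories simply carry no tags.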
