The TRIPOD-LLM reporting guideline for studies using large language models
- PMID: 39779929
- PMCID: PMC12104976
- DOI: 10.1038/s41591-024-03425-5
Abstract
Large language models (LLMs) are rapidly being adopted in healthcare, necessitating standardized reporting guidelines. We present transparent reporting of a multivariable model for individual prognosis or diagnosis (TRIPOD)-LLM, an extension of the TRIPOD + artificial intelligence statement, addressing the unique challenges of LLMs in biomedical applications. TRIPOD-LLM provides a comprehensive checklist of 19 main items and 50 subitems, covering key aspects from title to discussion. The guidelines introduce a modular format accommodating various LLM research designs and tasks, with 14 main items and 32 subitems applicable across all categories. Developed through an expedited Delphi process and expert consensus, TRIPOD-LLM emphasizes transparency, human oversight and task-specific performance reporting. We also introduce an interactive website (https://tripod-llm.vercel.app/) facilitating easy guideline completion and PDF generation for submission. As a living document, TRIPOD-LLM will evolve with the field, aiming to enhance the quality, reproducibility and clinical applicability of LLM research in healthcare through comprehensive reporting.
© 2025. The Author(s), under exclusive licence to Springer Nature America, Inc.
Conflict of interest statement
Competing interests: D.S.B. is an associate editor at Radiation Oncology and HemOnc.org, receives research funding from the American Association for Cancer Research, and provides advisory and consulting services for MercurialAI. D.D.F. is an associate editor at the Journal of the American Medical Informatics Association, is a member of the editorial board of Scientific Data, and receives funding from the intramural research program at the US National Library of Medicine, NIH. J.W.G. is a member of the editorial board of Radiology: Artificial Intelligence, BJR Artificial Intelligence and NEJM AI. All other authors declare no competing interests.
Update of
- The TRIPOD-LLM Statement: A Targeted Guideline For Reporting Large Language Models Use. medRxiv [Preprint]. 2024 Jul 25:2024.07.24.24310930. doi: 10.1101/2024.07.24.24310930. Update in: Nat Med. 2025 Jan;31(1):60-69. doi: 10.1038/s41591-024-03425-5. PMID: 39211885.