Review

Nat Med. 2025 Jan;31(1):60-69.

doi: 10.1038/s41591-024-03425-5. Epub 2025 Jan 8.

The TRIPOD-LLM reporting guideline for studies using large language models

Jack Gallifant^{1,2,3}, Majid Afshar^#⁴, Saleem Ameen^#^{1,5,6}, Yindalon Aphinyanaphongs^#⁷, Shan Chen^#^{3,8}, Giovanni Cacciamani^#^{9,10}, Dina Demner-Fushman^#¹¹, Dmitriy Dligach^#¹², Roxana Daneshjou^#^{13,14}, Chrystinne Fernandes^#¹, Lasse Hyldig Hansen^#^{1,15}, Adam Landman^#¹⁶, Lisa Lehmann^#¹⁶, Liam G McCoy^#¹⁷, Timothy Miller^#¹⁸, Amy Moreno^#¹⁹, Nikolaj Munch^#^{1,15}, David Restrepo^#^{1,20}, Guergana Savova^#¹⁸, Renato Umeton^#²¹, Judy Wawira Gichoya^#²², Gary S Collins^{23,24}, Karel G M Moons^{25,26}, Leo A Celi^{1,27,28}, Danielle S Bitterman^{29,30}

Affiliations

¹ Laboratory for Computational Physiology, Massachusetts Institute of Technology, Cambridge, MA, USA.
² Department of Critical Care, Guy's and St Thomas' NHS Foundation Trust, London, UK.
³ Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA, USA.
⁴ Department of Medicine, University of Wisconsin-Madison, Madison, WI, USA.
⁵ Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
⁶ Tasmanian School of Medicine, College of Health and Medicine, University of Tasmania, Hobart, Tasmania, Australia.
⁷ Department of Population Health, NYU Grossman School of Medicine and Langone Health, New York, NY, USA.
⁸ Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA, USA.
⁹ USC Institute of Urology and Catherine and Joseph Aresty Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA.
¹⁰ Artificial Intelligence Center, USC Institute of Urology, University of Southern California, Los Angeles, CA, USA.
¹¹ National Library of Medicine, NIH, HHS, Bethesda, MD, USA.
¹² Department of Computer Science, Loyola University, Chicago, IL, USA.
¹³ Department of Dermatology, Stanford School of Medicine, Redwood City, CA, USA.
¹⁴ Department of Biomedical Data Science, Stanford School of Medicine, Redwood City, CA, USA.
¹⁵ Cognitive Science, Aarhus University, Jens Chr. Skou 2, Aarhus, Denmark.
¹⁶ Mass General Brigham, Boston, MA, USA.
¹⁷ Faculty of Medicine and Dentistry, University of Alberta, Edmonton, Alberta, Canada.
¹⁸ Computational Health Informatics Program, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA.
¹⁹ Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA.
²⁰ Departamento de Telematica, Universidad del Cauca, Popayan, Colombia.
²¹ Dana-Farber Cancer Institute, Boston, MA, USA.
²² Department of Radiology, Emory University School of Medicine, Atlanta, GA, USA.
²³ Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology, and Musculoskeletal Sciences, University of Oxford, Oxford, UK.
²⁴ UK EQUATOR Centre, Nuffield Department of Orthopaedics, Rheumatology, and Musculoskeletal Sciences, University of Oxford, Oxford, UK.
²⁵ Julius Center for Health Sciences and Primary Care, UMC Utrecht, Utrecht University, Utrecht, the Netherlands.
²⁶ Health Innovation Netherlands (HINL), Utrecht, the Netherlands.
²⁷ Division of Pulmonary, Critical Care and Sleep Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA.
²⁸ Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
²⁹ Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA, USA. dbitterman@bwh.harvard.edu.
³⁰ Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA, USA. dbitterman@bwh.harvard.edu.

^# Contributed equally.

PMID: 39779929
PMCID: PMC12104976
DOI: 10.1038/s41591-024-03425-5

Jack Gallifant et al. Nat Med. 2025 Jan.

Abstract

Large language models (LLMs) are rapidly being adopted in healthcare, necessitating standardized reporting guidelines. We present transparent reporting of a multivariable model for individual prognosis or diagnosis (TRIPOD)-LLM, an extension of the TRIPOD+artificial intelligence statement, addressing the unique challenges of LLMs in biomedical applications. TRIPOD-LLM provides a comprehensive checklist of 19 main items and 50 subitems, covering key aspects from title to discussion. The guidelines introduce a modular format accommodating various LLM research designs and tasks, with 14 main items and 32 subitems applicable across all categories. Developed through an expedited Delphi process and expert consensus, TRIPOD-LLM emphasizes transparency, human oversight and task-specific performance reporting. We also introduce an interactive website (https://tripod-llm.vercel.app/) facilitating easy guideline completion and PDF generation for submission. As a living document, TRIPOD-LLM will evolve with the field, aiming to enhance the quality, reproducibility and clinical applicability of LLM research in healthcare through comprehensive reporting.

Conflict of interest statement

Competing interests: D.S.B. is an associate editor at Radiation Oncology and HemOnc.org, receives research funding from the American Association for Cancer Research, and provides advisory and consulting services for MercurialAI. D.D.F. is an associate editor at the Journal of the American Medical Informatics Association, is a member of the editorial board of Scientific Data, and receives funding from the intramural research program at the US National Library of Medicine, NIH. J.W.G. is a member of the editorial board of Radiology: Artificial Intelligence, BJR Artificial Intelligence and NEJM AI. All other authors declare no competing interests.

Figures

**Figure 1. TRIPOD-LLM workflow.**
The TRIPOD-LLM checklist workflow starts with 59 reporting items, and the number of required items is reduced based on the selection of research tasks (e.g., classification, summarization) and research design (e.g., LLM evaluation). After selecting both, a filtered list is generated for reporting.
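
The filtering step described in the caption can be pictured with a short Python sketch. This is only an illustration under stated assumptions: the item identifiers, the design and task tags, and the `filter_items` helper are hypothetical placeholders, not the actual TRIPOD-LLM checklist content or the website's implementation. It assumes each item either applies to every design and task or is tagged with the ones it applies to.

```python
from dataclasses import dataclass

# Hypothetical marker meaning "applies to every research design or task".
ALL = frozenset({"*"})


@dataclass(frozen=True)
class Item:
    item_id: str               # placeholder identifier, not a real checklist item
    designs: frozenset = ALL   # research designs the item applies to
    tasks: frozenset = ALL     # research tasks the item applies to


def applies(allowed, selected):
    """True if the item is universal or shares at least one tag with the selection."""
    return "*" in allowed or bool(allowed & selected)


def filter_items(items, designs, tasks):
    """Keep only items relevant to the selected design(s) and task(s)."""
    designs, tasks = frozenset(designs), frozenset(tasks)
    return [
        item for item in items
        if applies(item.designs, designs) and applies(item.tasks, tasks)
    ]


# Placeholder checklist; the tags are invented for illustration only.
CHECKLIST = [
    Item("placeholder-1"),
    Item("placeholder-2", designs=frozenset({"llm evaluation"})),
    Item("placeholder-3", tasks=frozenset({"summarization"})),
    Item("placeholder-4", designs=frozenset({"other design"}),
         tasks=frozenset({"classification"})),
]

selected = filter_items(CHECKLIST, designs={"llm evaluation"}, tasks={"classification"})
print([item.item_id for item in selected])  # ['placeholder-1', 'placeholder-2']
```

Universal items always survive the filter, which mirrors the abstract's statement that a core subset of items applies across all categories.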

Update of

The TRIPOD-LLM Statement: A Targeted Guideline For Reporting Large Language Models Use.
Gallifant J, Afshar M, Ameen S, Aphinyanaphongs Y, Chen S, Cacciamani G, Demner-Fushman D, Dligach D, Daneshjou R, Fernandes C, Hansen LH, Landman A, Lehmann L, McCoy LG, Miller T, Moreno A, Munch N, Restrepo D, Savova G, Umeton R, Gichoya JW, Collins GS, Moons KGM, Celi LA, Bitterman DS. medRxiv [Preprint]. 2024 Jul 25:2024.07.24.24310930. doi: 10.1101/2024.07.24.24310930. Update in: Nat Med. 2025 Jan;31(1):60-69. doi: 10.1038/s41591-024-03425-5. PMID: 39211885.
