Development and evaluation of prompts for a large language model to screen titles and abstracts in a living systematic review
- PMID: 40701625
- PMCID: PMC12306261
- DOI: 10.1136/bmjment-2025-301762
Abstract
Background: Living systematic reviews (LSRs) maintain an updated summary of evidence by incorporating newly published research. While they improve review currency, repeated screening and selection of new references make them laborious and difficult to maintain. Large language models (LLMs) show promise in assisting with screening and data extraction, but more work is needed to achieve the high accuracy required for evidence that informs clinical and policy decisions.
Objective: The study evaluated the effectiveness of an LLM (GPT-4o) in title and abstract screening compared with human reviewers.
Methods: Human decisions from an LSR on prodopaminergic interventions for anhedonia served as the reference standard. The baseline search results were divided into a development and a test set. Prompts guiding the LLM's eligibility assessments were refined using the development set and evaluated on the test set and two subsequent LSR updates. Consistency of the LLM outputs was also assessed.
Results: Prompt development required 1045 records. When applied to the remaining 11 939 baseline records and two updates, the refined prompts achieved 100% sensitivity for studies ultimately included in the review after full-text screening, though sensitivity for records included by humans at the title and abstract stage varied (58-100%) across updates. Simulated workload reductions of 65-85% were observed. Prompt decisions showed high consistency, with minimal false exclusions, satisfying established screening performance benchmarks for systematic reviews.
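As an illustration of how screening figures of this kind can be computed, the following minimal Python sketch derives sensitivity, specificity and a simulated workload reduction from paired human and LLM decisions. It is not the authors' code. The function name `screening_metrics`, the toy data, and the definition of workload reduction (the share of records the LLM excludes, which humans would then not need to screen) are assumptions made for this example; the study may define workload reduction differently.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ScreeningMetrics:
    sensitivity: float          # share of human-included records the LLM also included
    specificity: float          # share of human-excluded records the LLM also excluded
    workload_reduction: float   # assumed definition: share of all records the LLM excluded


def screening_metrics(human_include: Sequence[bool],
                      llm_include: Sequence[bool]) -> ScreeningMetrics:
    """Compare LLM screening decisions against a human reference standard."""
    if len(human_include) != len(llm_include):
        raise ValueError("Both decision lists must cover the same records")
    if not human_include:
        raise ValueError("At least one record is required")

    pairs = [(bool(h), bool(l)) for h, l in zip(human_include, llm_include)]
    tp = sum(1 for h, l in pairs if h and l)          # correctly included
    fn = sum(1 for h, l in pairs if h and not l)      # false exclusions
    tn = sum(1 for h, l in pairs if not h and not l)  # correctly excluded
    fp = sum(1 for h, l in pairs if not h and l)      # false inclusions

    # Guard against empty classes; NaN signals "undefined" rather than 0.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    workload_reduction = (tn + fn) / len(pairs)
    return ScreeningMetrics(sensitivity, specificity, workload_reduction)


# Toy data (hypothetical, not from the study): 10 records, 2 human inclusions.
human = [True, True, False, False, False, False, False, False, False, False]
llm = [True, True, True, False, False, False, False, False, False, False]
result = screening_metrics(human, llm)
print(result)
```

In this toy example the LLM keeps both human-included records and wrongly keeps one irrelevant record, so no eligible record is lost while most of the set is removed from the human screening queue.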
Conclusions: Refined GPT-4o prompts demonstrated high sensitivity and moderate specificity while reducing human workload. This approach shows potential for integrating LLMs into systematic review workflows to enhance efficiency.
Keywords: Data Interpretation, Statistical; Machine Learning; PSYCHIATRY.
© Author(s) (or their employer(s)) 2025. Re-use permitted under CC BY. Published by BMJ Group.
Conflict of interest statement
Competing interests: AH, JT, JK, CF, PC, CM, AR, YK, KY, YY, ST, ĐB, EK, JP and GS: None. EGO received research and consultancy fees from Angelini Pharma. MH is a part-time employee of Get.On Institut GmbH/HelloBetter, a company that implements digital therapeutics into routine care. SL has, in the last three years, received honoraria for advising/consulting and/or for lectures and/or for educational material from Angelini, Apsen, Boehringer Ingelheim, Eisai, Ekademia, GedeonRichter, Janssen, Karuna, Kynexis, Lundbeck, Medichem, Medscape, Mitsubishi, Neurotorium, Otsuka, NovoNordisk, Recordati, Rovi and Teva. TT is a part-time employee of Fitting Cloud, outside of the submitted work. RS is an employee of CureApp. RS reports grants from Osake-no-Kagaku Foundation, the Mental Health Okamoto Memorial Foundation, Kobayashi Magobe Memorial Medical Foundation, and personal fees from Otsuka Pharmaceutical, Nippon Shinyaku, Takeda Pharmaceutical and Sumitomo Pharma outside this work. In addition, RS has a patent JP2022049590A, US20220084673A1 pending, a patent JP2022178215A pending, a patent JP2022070086 pending and a patent JP2023074128A pending. YT reports grants from Japan Society for the Promotion of Science, Kyoto University and Pfizer Foundation, outside of the submitted work. In addition, YT is a board member of Cochrane Japan and works as a physician at Oku Medical Clinic. MS is employed in the department of neurodevelopmental disorders, Nagoya City University Graduate School of Medicine, which is an endowment department supported by the City of Nagoya and has received a personal fee from SONY outside of the submitted work. TAF reports personal fees from Boehringer-Ingelheim, Daiichi Sankyo, DT Axis, Micron, Shionogi, SONY and UpToDate, and a grant from DT Axis and Shionogi, outside of the submitted work. In addition, TAF has a patent 7448125 and a pending patent 2022-082495 and has licensed intellectual properties for Kokoro-app to DT Axis.
AC has received research, educational and consultancy fees from the Italian Network for Paediatric Trials, CARIPLO Foundation, Lundbeck and Angelini Pharma, outside of the submitted work.