BMJ Ment Health. 2025 Jul 22;28(1):e301762.
doi: 10.1136/bmjment-2025-301762.

Development and evaluation of prompts for a large language model to screen titles and abstracts in a living systematic review


Ava Homiar et al. BMJ Ment Health.

Abstract

Background: Living systematic reviews (LSRs) maintain an updated summary of evidence by incorporating newly published research. While they improve review currency, the repeated screening and selection of new references make them laborious and difficult to maintain. Large language models (LLMs) show promise in assisting with screening and data extraction, but more work is needed to achieve the high accuracy required for evidence that informs clinical and policy decisions.

Objective: The study evaluated the effectiveness of an LLM (GPT-4o) in title and abstract screening compared with human reviewers.

Methods: Human decisions from an LSR on prodopaminergic interventions for anhedonia served as the reference standard. The baseline search results were divided into a development and a test set. Prompts guiding the LLM's eligibility assessments were refined using the development set and evaluated on the test set and two subsequent LSR updates. Consistency of the LLM outputs was also assessed.
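As a rough illustration of this kind of pipeline (not the authors' actual prompts or code), a title/abstract screening step can be sketched as building an eligibility prompt from each record and mapping the model's reply to an include/exclude decision. The prompt wording, record fields, and parsing rule below are all hypothetical:

```python
# Hypothetical sketch of one title/abstract screening step. The study's
# real prompts were iteratively refined on a development set; this only
# shows the general shape of prompt construction and decision parsing.

PROMPT_TEMPLATE = """You are screening records for a living systematic review
on prodopaminergic interventions for anhedonia.

Title: {title}
Abstract: {abstract}

Answer with exactly one word: INCLUDE or EXCLUDE."""

def build_prompt(record: dict) -> str:
    """Fill the screening prompt with a record's title and abstract."""
    return PROMPT_TEMPLATE.format(title=record["title"],
                                  abstract=record["abstract"])

def parse_decision(model_output: str) -> bool:
    """Map the model's reply to include (True) / exclude (False).

    Anything that is not clearly EXCLUDE is kept, erring on the side of
    sensitivity so that eligible studies are not lost before full-text review.
    """
    return not model_output.strip().upper().startswith("EXCLUDE")

# Example record (invented for illustration):
record = {"title": "An RCT of pramipexole for anhedonia",
          "abstract": "We randomised 120 participants..."}
prompt = build_prompt(record)
decision = parse_decision("INCLUDE")
```

In a real LSR workflow this prompt would be sent to the model's API for each new record retrieved by the update searches, with only the records the model keeps passed on to human reviewers.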

Results: Prompt development required 1045 records. When applied to the remaining baseline 11 939 records and two updates, the refined prompts achieved 100% sensitivity for studies ultimately included in the review after full-text screening, though sensitivity for records included by humans at the title and abstract stage varied (58-100%) across updates. Simulated workload reductions of 65-85% were observed. Prompt decisions showed high consistency, with minimal false exclusions, satisfying established screening performance benchmarks for systematic reviews.
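For reference, the sensitivity and simulated workload-reduction figures reported above follow from ordinary confusion-matrix counts, where workload reduction is the fraction of records the LLM excludes and humans therefore never screen. A small worked example with invented counts (not the study's data):

```python
def screening_metrics(tp: int, fp: int, tn: int, fn: int):
    """Sensitivity, specificity, and simulated workload reduction for
    title/abstract screening against human decisions as the reference."""
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)          # included studies correctly kept
    specificity = tn / (tn + fp)          # irrelevant records correctly dropped
    workload_reduction = (tn + fn) / total  # records humans never need to read
    return sensitivity, specificity, workload_reduction

# Invented counts for illustration only:
sens, spec, wr = screening_metrics(tp=50, fp=3450, tn=6500, fn=0)
```

With these made-up counts, zero false negatives give 100% sensitivity while 65% of records are auto-excluded, mirroring the shape (though not the actual numbers) of the results reported above.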

Conclusions: Refined GPT-4o prompts demonstrated high sensitivity and moderate specificity while reducing human workload. This approach shows potential for integrating LLMs into systematic review workflows to enhance efficiency.

Keywords: Data Interpretation, Statistical; Machine Learning; PSYCHIATRY.


Conflict of interest statement

Competing interests: AH, JT, JK, CF, PC, CM, AR, YK, KY, YY, ST, ĐB, EK, JP and GS: none. EGO received research and consultancy fees from Angelini Pharma. MH is a part-time employee of GET.ON Institut GmbH/HelloBetter, a company that implements digital therapeutics into routine care. SL has, in the last three years, received honoraria for advising/consulting and/or for lectures and/or for educational material from Angelini, Apsen, Boehringer Ingelheim, Eisai, Ekademia, Gedeon Richter, Janssen, Karuna, Kynexis, Lundbeck, Medichem, Medscape, Mitsubishi, Neurotorium, Otsuka, Novo Nordisk, Recordati, Rovi and Teva. TT is a part-time employee of Fitting Cloud, outside of the submitted work. RS is an employee of CureApp and reports grants from the Osake-no-Kagaku Foundation, the Mental Health Okamoto Memorial Foundation and the Kobayashi Magobe Memorial Medical Foundation, and personal fees from Otsuka Pharmaceutical, Nippon Shinyaku, Takeda Pharmaceutical and Sumitomo Pharma, outside this work; in addition, RS has patents JP2022049590A and US20220084673A1 pending, and patents JP2022178215A, JP2022070086 and JP2023074128A pending. YT reports grants from the Japan Society for the Promotion of Science, Kyoto University and the Pfizer Foundation, outside of the submitted work; in addition, YT is a board member of Cochrane Japan and works as a physician at Oku Medical Clinic. MS is employed in the Department of Neurodevelopmental Disorders, Nagoya City University Graduate School of Medicine, an endowment department supported by the City of Nagoya, and has received a personal fee from SONY outside of the submitted work. TAF reports personal fees from Boehringer-Ingelheim, Daiichi Sankyo, DT Axis, Micron, Shionogi, SONY and UpToDate, and grants from DT Axis and Shionogi, outside of the submitted work; in addition, TAF has patent 7448125 and pending patent 2022-082495, and has licensed intellectual properties for Kokoro-app to DT Axis.
AC has received research, educational and consultancy fees from the Italian Network for Paediatric Trials, CARIPLO Foundation, Lundbeck and Angelini Pharma, outside of the submitted work.

Figures

Figure 1. Creation of development and test sets of records in the baseline review.

