Cochrane Evid Synth Methods. 2025 Oct 25;3(6):e70059.
doi: 10.1002/cesm.70059. eCollection 2025 Nov.

Human-in-the-Loop Artificial Intelligence System for Systematic Literature Review: Methods and Validations for the AutoLit Review Software

Kevin M Kallmes et al. Cochrane Evid Synth Methods.

Abstract

Introduction: While artificial intelligence (AI) tools have been utilized for individual stages within the systematic literature review (SLR) process, no tool has previously been shown to support each critical SLR step. In addition, the need for expert oversight has been recognized to ensure the quality of SLR findings. Here, we describe a complete methodology for utilizing our AI SLR tool with human-in-the-loop curation workflows, as well as AI validations, time savings, and approaches to ensure compliance with best review practices.

Methods: SLRs require completing Search, Screening, and Extraction from relevant studies, with meta-analysis and critical appraisal as relevant. We present a full methodological framework for completing SLRs utilizing our AutoLit software (Nested Knowledge). This system integrates AI models into the central steps in SLR: Search strategy generation, Dual Screening of Titles/Abstracts and Full Texts, and Extraction of qualitative and quantitative evidence. The system also offers manual Critical Appraisal and Insight drafting and fully automated Network Meta-analysis. Validations comparing AI performance to experts are reported, along with time savings and 'rapid review' alternatives to the SLR workflow where relevant.

Results: Search strategy generation with the Smart Search AI can turn a Research Question into full Boolean strings with 76.8% and 79.6% Recall in two validation sets. Supervised machine learning tools can achieve 82-97% Recall in reviewer-level Screening. Population, Interventions/Comparators, and Outcomes (PICOs) extraction achieved an F1 score of 0.74; accuracy for study type, location, and size was 74%, 78%, and 91%, respectively. Time savings of 50% in Abstract Screening and 70-80% in qualitative extraction were reported. Extraction of user-specified qualitative and quantitative tags and data elements remains exploratory and requires human curation for SLRs.
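
For readers less familiar with the metrics reported above, the following minimal sketch shows how Recall and F1 are conventionally computed from counts of true positives, false positives, and false negatives. It is not the authors' validation code; the function names and the example counts are hypothetical and chosen only for illustration.

```python
def recall(tp: int, fn: int) -> float:
    """Share of truly relevant records that were retrieved or included."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def precision(tp: int, fp: int) -> float:
    """Share of retrieved or included records that were truly relevant."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts (not taken from the paper): 80 relevant records found,
# 30 irrelevant records flagged, 20 relevant records missed.
print(f"Recall: {recall(80, 20):.2f}")       # Recall: 0.80
print(f"F1:     {f1_score(80, 30, 20):.2f}")  # F1:     0.76
```

In screening, Recall is usually the metric of most concern, since a missed relevant study cannot be recovered downstream, whereas false positives can still be excluded by a human reviewer.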

Conclusion: AI systems can support high-quality, human-in-the-loop execution of key SLR stages. Transparency, replicability, and expert oversight are central to the use of AI SLR tools.

Keywords: artificial intelligence; evidence synthesis; human‐in‐the‐loop; meta‐analysis; systematic literature review.

Figures

Figure 1
Search Exploration predicts and maps Populations, Interventions, Comparators, and Outcomes from underlying abstracts resulting from Boolean PubMed queries, including reporting frequency and enabling adoption of new search terms.
Figure 2
The Inclusion Prediction Model learns from expert screening on a project‐by‐project basis, with transparent fivefold cross‐validation statistics for researchers to review regularly. a (above): Example of the cross‐validation statistics from an example project in AutoLit. b (below): Robot Screener decisions displayed in the Adjudication workflow, with disagreements between human and Robot decisions available to be adopted or overridden by the independent adjudicator.
Figure 3
Core Smart Tags extracts preset elements (PICOs, Study Location, Type, Size) and builds the project's data extraction template/tag hierarchy. These hierarchies can be edited and expanded by the user, such as adding tags for specific sub‐categories, administrative annotation, and concepts outside of PICOs, Study Location, Type, and Size.
Figure 4
Adaptive Smart Tags extracts evidence from abstracts or Full Texts (pictured here) based on custom, project‐specific Questions. Contents extracted are highlighted for full traceability, and can be text, numeric, options/drop‐down, or tabular data (pictured here).
Figure 5
Forest Plot from automated NMA of data extracted in the Meta‐analytical Module in AutoLit. Forest plots, Funnel plots, I‐squared calculations, SUCRA Rankings, and Odds Ratios with 95% Confidence Intervals, as well as the Network Diagram for hierarchical meta‐analysis, are automatically generated in the Quantitative Synthesis module.
