Value Health. 2025 Nov;28(11):1630-1636.
doi: 10.1016/j.jval.2025.09.008. Epub 2025 Sep 23.

Validating Loon Lens 1.0 for Autonomous Abstract Screening and Confidence-Guided Human-in-the-Loop Workflows in Systematic Reviews

Ghayath Janoudi et al. Value Health. 2025 Nov.

Abstract

Objectives: Title and abstract screening is a labor-intensive step in systematic literature reviews (SLRs). We examine the performance of Loon Lens 1.0, an agentic artificial intelligence platform for autonomous title and abstract screening, and test whether its confidence scores can be used to target minimal human oversight.

Methods: A total of 8 SLRs by Canada's Drug Agency were rescreened by dual human reviewers with adjudication (3796 citations, 287 includes, 7.6%) and separately by Loon Lens, based on predefined eligibility criteria. Accuracy, sensitivity, precision, and specificity were measured and bootstrapped to generate 95% confidence intervals. Logistic regression models using (1) confidence alone and (2) confidence plus the Include/Exclude decision were fit to predict errors and to inform simulated human-in-the-loop strategies.
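The metric and bootstrap step described in the Methods can be illustrated with a minimal sketch. The abstract does not state the resampling unit or the interval method, so citation-level resampling with a percentile interval is an assumption here, and the function names and toy labels are hypothetical rather than taken from the study.

```python
import random

def screening_metrics(truth, pred):
    """Accuracy, sensitivity, specificity, precision for 1 = include, 0 = exclude."""
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
    }

def bootstrap_ci(truth, pred, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval, resampling citations with replacement (assumed design)."""
    rng = random.Random(seed)
    n = len(truth)
    samples = {}
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        m = screening_metrics([truth[i] for i in idx], [pred[i] for i in idx])
        for name, value in m.items():
            samples.setdefault(name, []).append(value)
    intervals = {}
    for name, values in samples.items():
        values = sorted(v for v in values if v == v)  # drop NaN resamples
        if not values:
            intervals[name] = (float("nan"), float("nan"))
            continue
        lo = values[int((alpha / 2) * (len(values) - 1))]
        hi = values[int((1 - alpha / 2) * (len(values) - 1))]
        intervals[name] = (lo, hi)
    return intervals

# Toy labels (made up for illustration): 20 true includes, 80 true excludes.
toy_truth = [1] * 20 + [0] * 80
toy_pred = [1] * 18 + [0] * 2 + [1] * 8 + [0] * 72

point = screening_metrics(toy_truth, toy_pred)
intervals = bootstrap_ci(toy_truth, toy_pred, n_boot=500, seed=1)
```

With real data, `toy_truth` would hold the adjudicated human decisions and `toy_pred` the platform's decisions; resampling within each review instead of across the pooled set would be an equally plausible design.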

Results: Loon Lens achieved 95.5% accuracy (95% CI 94.8-96.1), 98.9% sensitivity (97.6-100), 95.2% specificity (94.5-95.9), and 63.0% precision (58.4-67.3). Errors clustered in Low- and Medium-confidence Includes. The extended logistic regression model (confidence + decision; C-index 0.98) estimated a 75% error probability for Low-confidence Includes versus <0.1% for Very-High-confidence Excludes. Simulated human-in-the-loop review of Low- and Medium-confidence Includes only (145 citations, 3.8%) lifted precision to 81.4% and overall accuracy to 98.2% while preserving sensitivity (99.0%). Adding High-confidence Includes (221 citations, 5.8%) pushed precision to 89.9% and accuracy to 99.0%.
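The confidence-guided simulation reported in the Results can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: it assumes a human reviewer always reproduces the adjudicated reference label, the `Record` type and tier names are hypothetical, and the 100 toy records are invented and do not reflect the study data.

```python
from collections import namedtuple

# Hypothetical record layout: reference label, AI decision, AI confidence tier.
Record = namedtuple("Record", "truth ai_decision confidence")

def simulate_hitl(records, review_tiers):
    """Send Include calls in the given confidence tiers to a human reviewer.

    Assumption: the reviewer's decision equals the reference label.
    Returns the final decisions and the number of citations reviewed.
    """
    final, reviewed = [], 0
    for r in records:
        if r.ai_decision == "include" and r.confidence in review_tiers:
            final.append(r.truth)
            reviewed += 1
        else:
            final.append(r.ai_decision)
    return final, reviewed

def summarize(records, decisions):
    """Precision, sensitivity, and accuracy of final decisions against the reference."""
    pairs = [(r.truth, d) for r, d in zip(records, decisions)]
    tp = sum(1 for t, d in pairs if t == "include" and d == "include")
    fp = sum(1 for t, d in pairs if t == "exclude" and d == "include")
    fn = sum(1 for t, d in pairs if t == "include" and d == "exclude")
    tn = sum(1 for t, d in pairs if t == "exclude" and d == "exclude")
    return {
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "accuracy": (tp + tn) / len(pairs),
    }

# Invented toy data: false positives sit mostly in the lower-confidence Includes.
toy = (
    [Record("include", "include", "very_high")] * 10
    + [Record("include", "exclude", "high")] * 1
    + [Record("exclude", "include", "low")] * 4
    + [Record("exclude", "include", "medium")] * 2
    + [Record("exclude", "include", "high")] * 2
    + [Record("exclude", "exclude", "very_high")] * 81
)

baseline = summarize(toy, [r.ai_decision for r in toy])
final_lm, n_lm = simulate_hitl(toy, {"low", "medium"})
after_lm = summarize(toy, final_lm)
final_lmh, n_lmh = simulate_hitl(toy, {"low", "medium", "high"})
after_lmh = summarize(toy, final_lmh)
```

Because only Include calls are routed to the reviewer in this sketch, false negatives are never revisited, so sensitivity stays fixed while precision and accuracy rise as more tiers are reviewed.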

Conclusions: Across 8 SLRs (3796 citations), Loon Lens 1.0 reproduced adjudicated human screening with 98.9% sensitivity and 95.2% specificity. In simulation, restricting human-in-the-loop review to ≤5.8% of citations by prioritizing Include calls below Very-High confidence reduced false positives and increased precision to 89.9% while maintaining sensitivity and raising overall accuracy to 99.0%. These findings indicate that confidence-guided oversight can concentrate reviewer effort on a small subset of records.

Keywords: artificial intelligence; health technology assessment; large language model; literature screening; systematic review.

Conflict of interest statement

Author Disclosures: Author disclosure forms can be accessed below in the Supplemental Material section.
