JCO Clin Cancer Inform. 2025 Jun:9:e2500071.
doi: 10.1200/CCI-25-00071. Epub 2025 Jun 9.

Enhancing Patient-Trial Matching With Large Language Models: A Scoping Review of Emerging Applications and Approaches

Hongyu Chen et al. JCO Clin Cancer Inform. 2025 Jun.

Abstract

Purpose: Patient recruitment remains a major bottleneck in clinical trial execution, with inefficient patient-trial matching often causing delays and failures. Recent advancements in large language models (LLMs) offer a promising avenue for automating and improving this process. This scoping review aims to provide a comprehensive synthesis of the emerging applications of LLMs in patient-trial matching.

Methods: A comprehensive search was conducted in PubMed, Web of Science, and OpenAlex for literature published between December 1, 2022, and December 31, 2024. Studies were included if they explicitly integrated LLMs into patient-trial matching systems. Data extraction focused on system architectures, patient data processing, eligibility criteria processing, matching techniques, evaluation metrics, and performance.

Results: Of the 2,357 studies initially identified, 24 met the inclusion criteria. The majority (21/24) were published in 2024, highlighting the rapid adoption of LLMs in this domain. Most systems used patient-centric matching (17/24), with OpenAI's generative pretrained transformer models being the most commonly used LLMs. Core components of these systems included eligibility criteria processing, patient data processing, and matching, with some incorporating retrieval algorithms to enhance computational efficiency. LLM-integrated approaches demonstrated improved accuracy and scalability in patient-trial matching, although challenges such as performance variability, interpretability, and reliance on synthetic data sets remain significant.

Conclusion: LLM-based patient-trial matching systems present a transformative opportunity to enhance the efficiency and accuracy of clinical trial recruitment. Despite current limitations related to model generalizability, explainability, and data constraints, future advancements in hybrid modeling strategies, domain-specific fine-tuning, and real-world data set integration could further optimize LLM-based trial matching. Addressing these challenges will be crucial to realizing the full potential of LLMs in streamlining patient recruitment and accelerating clinical trial execution.

Conflict of interest statement

The following represents disclosure information provided by authors of this manuscript. All relationships are considered compensated unless otherwise noted. Relationships are self-held unless noted. I = Immediate Family Member, Inst = My Institution. Relationships may not relate to the subject matter of this manuscript. For more information about ASCO's conflict of interest policy, please refer to www.asco.org/rwc or ascopubs.org/cci/author-center.

Open Payments is a public database containing information reported by companies about payments made to US-licensed physicians.

James McGill

Employment: Lilly

Leadership: Lilly

Stock and Other Ownership Interests: Lilly

Consulting or Advisory Role: Paratus Sciences Corporation, PromiseBio, Iterative Health

Research Funding: Lilly

Travel, Accommodations, Expenses: Versiti

Emily C. Webber

Leadership: Indiana University Health

Travel, Accommodations, Expenses: American Board of Preventative Medicine

Hua Xu

Stock and Other Ownership Interests: More Health, Melax Tech Inc

Consulting or Advisory Role: IMO Inc

Patents, Royalties, Other Intellectual Property: Software license income from University of Texas Health Science Center at Houston

No other potential conflicts of interest were reported.

Figures

FIG 1. The PRISMA diagram depicts the number of records identified, included and excluded, and the reasons for exclusion. BERT, Bidirectional Encoder Representations from Transformers; PRISMA, Preferred Reporting Items for Systematic Reviews and Meta-Analyses for Scoping Reviews.
FIG 2. Popular components of patient-trial matching systems. The icons were designed by Freepik.
FIG 3. The common processing flow for different data types. EHR, electronic health record.
FIG 4. General architectures of matching components. (A) Matching patients and trials through a single question; (B) matching patients and trials using a series of decomposed questions; (C) generating embeddings as representations for patients or trials to predict matching outcomes; (D) directly matching patients and trials based on extracted keywords. The icons were designed by Freepik. BERT, Bidirectional Encoder Representations from Transformers; KD, Kawasaki disease; LLMs, large language models.
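
As an illustration of the decomposed-question architecture in FIG 4B, the following is a minimal sketch, not a method taken from any reviewed study. It asks one question per eligibility criterion and normalizes the answers. The prompt wording, the function names, and the `ask` callable are all assumptions; `stub_ask` is a hypothetical stand-in for a real LLM call so that the example runs offline.

```python
from typing import Callable, Dict, List


def build_question(patient_note: str, criterion: str) -> str:
    """Format one question about a single eligibility criterion (assumed wording)."""
    return (
        "Patient note:\n" + patient_note + "\n\n"
        "Criterion: " + criterion + "\n"
        "Does the patient meet this criterion? Answer yes, no, or unknown."
    )


def answer_criteria(
    patient_note: str,
    criteria: List[str],
    ask: Callable[[str], str],
) -> Dict[str, str]:
    """Ask one question per criterion and normalize each answer."""
    answers = {}
    for criterion in criteria:
        raw = ask(build_question(patient_note, criterion)).strip().lower()
        # Anything other than a clear yes/no is treated as unknown.
        answers[criterion] = raw if raw in ("yes", "no") else "unknown"
    return answers


def stub_ask(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; it does not read the patient note."""
    criterion_line = [
        line for line in prompt.splitlines() if line.startswith("Criterion: ")
    ][0]
    return "Yes" if "18 years" in criterion_line else "not sure"


note = "62-year-old woman with stage II breast cancer."
criteria = ["Age at least 18 years", "ECOG performance status 0-1"]
print(answer_criteria(note, criteria, stub_ask))
```

Passing the model as a callable keeps the sketch independent of any particular LLM provider; the per-criterion answers can then be combined into a patient-trial decision in a separate step.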
FIG 5. Common techniques and flows in the matching module. The matching results can be either a class (ie, eligible or not) or a ranking. Matching can occur in a single step or multiple steps. Single-step approaches include executing a query (query), using a neural network to classify based on embeddings (embedding), classifying with an LLM (classification), or scoring with LLMs or retrieval algorithms (scoring). Multistep methods involve a combination of processes such as classification, scoring, and retrieval (retrieval); ensembling multiple classification results for a single sample (ensembling); aggregating unit-level predictions to derive patient-trial–level outcomes (aggregation); and performing exact keyword matching between eligibility criteria and patient profiles (exact matching). LLMs, large language models.
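
The ensembling and aggregation steps named in FIG 5 can be sketched as follows. This is an illustrative example under stated assumptions, not the rule used by any reviewed system: the labels (`met`, `not_met`, `unknown`), the tie-breaking behavior, and the aggregation rule (every inclusion criterion met and no exclusion criterion met) are all hypothetical choices.

```python
from collections import Counter
from typing import List, Tuple

# Each unit-level prediction is (criterion_type, label), where criterion_type
# is "inclusion" or "exclusion" and label is "met", "not_met", or "unknown".
Prediction = Tuple[str, str]


def majority_vote(labels: List[str]) -> str:
    """Ensemble repeated predictions for one sample; ties and empty input give 'unknown'."""
    if not labels:
        return "unknown"
    counts = Counter(labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return "unknown"
    return counts[0][0]


def aggregate(predictions: List[Prediction]) -> str:
    """Derive a patient-trial label from criterion-level predictions (assumed rule)."""
    unknown = False
    for criterion_type, label in predictions:
        failed_inclusion = criterion_type == "inclusion" and label == "not_met"
        hit_exclusion = criterion_type == "exclusion" and label == "met"
        if failed_inclusion or hit_exclusion:
            return "ineligible"
        unknown = unknown or label == "unknown"
    # Unresolved criteria are routed to a human rather than guessed.
    return "needs_review" if unknown else "eligible"


# Three repeated classifications of one inclusion criterion, then aggregation.
votes = ["met", "met", "not_met"]
predictions = [("inclusion", majority_vote(votes)), ("exclusion", "not_met")]
print(aggregate(predictions))
```

Returning a separate `needs_review` label for unresolved criteria is one possible design; a ranking-oriented system might instead convert the criterion-level labels into a score.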
