Matching patients to clinical trials with large language models

Qiao Jin et al. Nat Commun. 2024 Nov 18;15(1):9074. doi: 10.1038/s41467-024-53081-z.

Abstract

Patient recruitment is challenging for clinical trials. We introduce TrialGPT, an end-to-end framework for zero-shot patient-to-trial matching with large language models. TrialGPT comprises three modules: it first performs large-scale filtering to retrieve candidate trials (TrialGPT-Retrieval); then predicts criterion-level patient eligibility (TrialGPT-Matching); and finally generates trial-level scores (TrialGPT-Ranking). We evaluate TrialGPT on three cohorts of 183 synthetic patients with over 75,000 trial annotations. TrialGPT-Retrieval recalls over 90% of relevant trials while returning less than 6% of the initial collection. Manual evaluations on 1015 patient-criterion pairs show that TrialGPT-Matching achieves an accuracy of 87.3% with faithful explanations, close to expert performance. The TrialGPT-Ranking scores are highly correlated with human judgments and outperform the best competing models by 43.8% in ranking and excluding trials. Furthermore, our user study reveals that TrialGPT can reduce screening time by 42.6% in patient recruitment. Overall, these results demonstrate promising opportunities for patient-to-trial matching with TrialGPT.
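For readers who prefer code, the following Python sketch illustrates how the three modules compose, based only on the abstract's description. All function names and the placeholder logic are hypothetical stand-ins, not the authors' implementation.

```python
def retrieve_candidates(patient_note, trials):
    # TrialGPT-Retrieval: filter the full collection to a short candidate list.
    # The real module uses LLM keyword generation plus hybrid lexical/semantic
    # retrieval (see Fig. 2); this placeholder just truncates.
    return trials[:100]

def match_criteria(patient_note, trial):
    # TrialGPT-Matching: predict one eligibility label per criterion (Fig. 3).
    # Placeholder labels; the real module prompts an LLM for each criterion.
    return {criterion: "included" for criterion in trial["inclusion_criteria"]}

def score_trial(criterion_labels):
    # TrialGPT-Ranking: aggregate criterion-level labels into a trial score.
    labels = list(criterion_labels.values())
    return labels.count("included") / max(len(labels), 1)

def rank_trials(patient_note, trials):
    candidates = retrieve_candidates(patient_note, trials)
    scored = [(score_trial(match_criteria(patient_note, t)), t) for t in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [trial for _, trial in scored]

# Toy usage with two mock trials:
trials = [
    {"id": "NCT0000001", "inclusion_criteria": ["age >= 18", "type 2 diabetes"]},
    {"id": "NCT0000002", "inclusion_criteria": ["pregnant"]},
]
print([t["id"] for t in rank_trials("58-year-old man with type 2 diabetes", trials)])
```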

Conflict of interest statement

Competing interests The authors declare no competing interests.

Figures

Fig. 1
Fig. 1. The overall architecture of TrialGPT.
a TrialGPT-Retrieval filters out most irrelevant trials from the initial collection and returns a list of candidate clinical trials; b For a given patient, TrialGPT-Matching explains the relevance of each criterion in a trial, locates the evidence sentences, and predicts the eligibility classification; c TrialGPT-Ranking aggregates the criterion-level predictions from TrialGPT-Matching and uses the resulting scores to perform fine-grained ranking and produce the final recommended trials.
Fig. 2
Fig. 2. First-stage retrieval results.
a Overview of TrialGPT-Retrieval. An LLM first generates a list of keywords for a given patient note. These keywords are used to retrieve keyword-level rankings of relevant clinical trials, which are then fused into a final ranking. b Recall of relevant clinical trials at different retrieval depths for various query types and retrievers. The hybrid retriever combines the results of the BM25 (lexical matching) and MedCPT (semantic matching) retrievers. Source data are provided as a Source Data file.
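As one concrete (and hedged) reading of this fusion step, the sketch below combines several ranked lists, e.g. one per keyword per retriever, with reciprocal rank fusion; the paper's exact fusion rule may differ.

```python
from collections import defaultdict

def fuse_rankings(ranked_lists, k=60):
    # Reciprocal rank fusion: each list contributes 1 / (k + rank) per trial.
    # This is a standard fusion heuristic, offered here as an assumption,
    # not as the authors' documented method.
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, trial_id in enumerate(ranking, start=1):
            scores[trial_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Example: one keyword's results from BM25 and from MedCPT, fused together.
bm25_hits = ["NCT001", "NCT002", "NCT003"]
medcpt_hits = ["NCT002", "NCT001", "NCT004"]
print(fuse_rankings([bm25_hits, medcpt_hits]))
```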
Fig. 3
Fig. 3. Manual evaluations of criterion-level predictions by GPT-4-based TrialGPT-Matching.
a The percentage of correct, partially correct, and incorrect relevance explanations generated by TrialGPT-Matching; b Evaluation results of the relevant sentences located by TrialGPT-Matching, based on 405 criterion-level annotations where at least one sentence is labeled as relevant in the ground truth. 95% confidence intervals estimated by bootstrapping are shown as error bars; c The confusion matrices of the eligibility for inclusion criteria predicted by human experts and TrialGPT-Matching; d The confusion matrices of the eligibility for exclusion criteria predicted by human experts and TrialGPT-Matching. Not Incl.: Not included. Not Excl.: Not excluded. No Info.: Not enough information. Not Appl.: Not applicable. Source data are provided as a Source Data file.
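To make the matching output concrete, here is a hypothetical prompt-construction sketch in the spirit of TrialGPT-Matching: the model is asked for an explanation, the indices of evidence sentences, and one eligibility label. The prompt wording and JSON schema are assumptions, not the authors' actual prompts.

```python
import json

# Inclusion-side labels shown in Fig. 3; exclusion criteria use the analogous
# "excluded" / "not excluded" / "not enough information" / "not applicable" set.
ELIGIBILITY_LABELS = [
    "included",
    "not included",
    "not enough information",
    "not applicable",
]

def build_matching_prompt(patient_sentences, criterion):
    # Number the note's sentences so the model can cite evidence by index.
    numbered = "\n".join(f"{i}. {s}" for i, s in enumerate(patient_sentences))
    return (
        "Patient note (numbered sentences):\n"
        f"{numbered}\n\n"
        f"Inclusion criterion: {criterion}\n\n"
        "Respond in JSON with keys 'explanation' (string), "
        "'evidence_sentences' (list of sentence indices), and "
        f"'label' (one of {ELIGIBILITY_LABELS})."
    )

# Parsing a (mock) model reply:
reply = '{"explanation": "...", "evidence_sentences": [2], "label": "included"}'
parsed = json.loads(reply)
assert parsed["label"] in ELIGIBILITY_LABELS
```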
Fig. 4
Fig. 4. Correlation between differently aggregated TrialGPT scores and the ground-truth patient-trial eligibility labels.
a The percentage of inclusion criteria predicted as “included” by TrialGPT; b The percentage of inclusion criteria predicted as “not included”; c The percentage of inclusion criteria predicted as “no relevant information”; d The LLM-aggregated relevance score; e The percentage of exclusion criteria predicted as “excluded”; f The percentage of exclusion criteria predicted as “not excluded”; g The percentage of exclusion criteria predicted as “no relevant information”; h The LLM-aggregated eligibility score. “*” denotes p < 0.05, “**” denotes p < 0.01, “***” denotes p < 0.001, and “n.s.” denotes not significant (p > 0.05) by two-sided independent t-test. There are 60,240 unlabeled, 15,459 irrelevant, 6981 excluded, 647 potential, and 8173 eligible patient-trial pairs. Center line, median; box limits, upper and lower quartiles; whiskers, 1.5x interquartile range; points, outliers. Source data are provided as a Source Data file, including all the exact p values.
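Below is a minimal sketch of the percentage-based aggregates in panels a-c and e-g, assuming criterion-level labels arrive as plain strings; the LLM-aggregated relevance and eligibility scores in panels d and h come from a separate model call and are not reproduced here.

```python
def aggregate_features(inclusion_labels, exclusion_labels):
    # Fraction of criteria carrying a given label (0.0 if the list is empty).
    def fraction(labels, target):
        return labels.count(target) / len(labels) if labels else 0.0

    return {
        "pct_included": fraction(inclusion_labels, "included"),
        "pct_not_included": fraction(inclusion_labels, "not included"),
        "pct_inclusion_no_info": fraction(inclusion_labels, "no relevant information"),
        "pct_excluded": fraction(exclusion_labels, "excluded"),
        "pct_not_excluded": fraction(exclusion_labels, "not excluded"),
        "pct_exclusion_no_info": fraction(exclusion_labels, "no relevant information"),
    }

print(aggregate_features(
    ["included", "included", "no relevant information"],
    ["not excluded", "excluded"],
))
```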
Fig. 5
Fig. 5. Results of the patient-trial matching user study.
a Experimental design and actual screening times for each patient-trial pair by two expert annotators; b Comparison of screening time aggregated by clinical trial; c Comparison of screening time aggregated by patient case; d Comparison of screening time aggregated over short cases, long cases, annotators, and all pairs. Numbers in parentheses denote the sample sizes in the corresponding comparison group. Within each annotator (Annotator X or Y), significance tests are two-sided independent t-tests; other significance tests are two-sided paired t-tests. Trials A, B, C, D, E, and F denote NCT04432597, NCT05012098, NCT04287868, NCT04847466, NCT04719988, and NCT04894370, respectively. Source data are provided as a Source Data file.

