Matching patients to clinical trials with large language models

Qiao Jin et al. Nat Commun. 2024 Nov 18;15(1):9074. doi: 10.1038/s41467-024-53081-z.

Abstract

Patient recruitment is challenging for clinical trials. We introduce TrialGPT, an end-to-end framework for zero-shot patient-to-trial matching with large language models. TrialGPT comprises three modules: it first performs large-scale filtering to retrieve candidate trials (TrialGPT-Retrieval); then predicts criterion-level patient eligibility (TrialGPT-Matching); and finally generates trial-level scores (TrialGPT-Ranking). We evaluate TrialGPT on three cohorts of 183 synthetic patients with over 75,000 trial annotations. TrialGPT-Retrieval recalls over 90% of relevant trials while returning less than 6% of the initial collection. Manual evaluations on 1015 patient-criterion pairs show that TrialGPT-Matching achieves an accuracy of 87.3% with faithful explanations, close to expert performance. The TrialGPT-Ranking scores are highly correlated with human judgments and outperform the best competing models by 43.8% in ranking and excluding trials. Furthermore, our user study reveals that TrialGPT can reduce screening time by 42.6% in patient recruitment. Overall, these results demonstrate promising opportunities for patient-to-trial matching with TrialGPT.
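For readers who prefer code, the following Python sketch illustrates how the three modules compose, based only on the abstract's description. All function names and the placeholder logic are hypothetical stand-ins, not the authors' implementation.

```python
def retrieve_candidates(patient_note, trials):
    # TrialGPT-Retrieval: filter the full collection to a short candidate list.
    # The real module uses LLM keyword generation plus hybrid lexical/semantic
    # retrieval (see Fig. 2); this placeholder just truncates.
    return trials[:100]

def match_criteria(patient_note, trial):
    # TrialGPT-Matching: predict one eligibility label per criterion (Fig. 3).
    # Placeholder labels; the real module prompts an LLM for each criterion.
    return {criterion: "included" for criterion in trial["inclusion_criteria"]}

def score_trial(criterion_labels):
    # TrialGPT-Ranking: aggregate criterion-level labels into a trial score.
    labels = list(criterion_labels.values())
    return labels.count("included") / max(len(labels), 1)

def rank_trials(patient_note, trials):
    candidates = retrieve_candidates(patient_note, trials)
    scored = [(score_trial(match_criteria(patient_note, t)), t) for t in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [trial for _, trial in scored]

# Toy usage with two mock trials:
trials = [
    {"id": "NCT0000001", "inclusion_criteria": ["age >= 18", "type 2 diabetes"]},
    {"id": "NCT0000002", "inclusion_criteria": ["pregnant"]},
]
print([t["id"] for t in rank_trials("58-year-old man with type 2 diabetes", trials)])
```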

Conflict of interest statement

Competing interests The authors declare no competing interests.

Figures

Fig. 1
Fig. 1. The overall architecture of TrialGPT.
a TrialGPT-Retrieval filters out most irrelevant trials from the initial collection and returns a list of candidate clinical trials; b For a given patient, TrialGPT-Matching explains the relevance of each criterion in a trial, locates the evidence sentences, and predicts the eligibility classification; c TrialGPT-Ranking aggregates the criterion-level predictions from TrialGPT-Matching and uses the resulting scores to perform fine-grained ranking and produce the final recommended trials.
Fig. 2
Fig. 2. First-stage retrieval results.
a Overview of TrialGPT-Retrieval. An LLM first generates a list of keywords for a given patient note. These keywords are used to retrieve keyword-level rankings of relevant clinical trials, which are then fused into a final ranking. b Recall of relevant clinical trials at different retrieval depths for various query types and retrievers. The hybrid retriever combines the results of the BM25 (lexical matching) and MedCPT (semantic matching) retrievers. Source data are provided as a Source Data file.
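As one concrete (and hedged) reading of this fusion step, the sketch below combines several ranked lists, e.g. one per keyword per retriever, with reciprocal rank fusion; the paper's exact fusion rule may differ.

```python
from collections import defaultdict

def fuse_rankings(ranked_lists, k=60):
    # Reciprocal rank fusion: each list contributes 1 / (k + rank) per trial.
    # This is a standard fusion heuristic, offered here as an assumption,
    # not as the authors' documented method.
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, trial_id in enumerate(ranking, start=1):
            scores[trial_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Example: one keyword's results from BM25 and from MedCPT, fused together.
bm25_hits = ["NCT001", "NCT002", "NCT003"]
medcpt_hits = ["NCT002", "NCT001", "NCT004"]
print(fuse_rankings([bm25_hits, medcpt_hits]))
```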
Fig. 3
Fig. 3. Manual evaluations of criterion-level predictions by GPT-4-based TrialGPT-Matching.
a The percentage of correct, partially correct, and incorrect relevance explanations generated by TrialGPT-Matching; b Evaluation results of the relevant sentences located by TrialGPT-Matching, based on 405 criterion-level annotations where at least one sentence is labeled as relevant in the ground truth. 95% confidence intervals estimated by bootstrapping are shown as error bars; c The confusion matrices of the eligibility for inclusion criteria predicted by human experts and TrialGPT-Matching; d The confusion matrices of the eligibility for exclusion criteria predicted by human experts and TrialGPT-Matching. Not Incl.: Not included. Not Excl.: Not excluded. No Info.: Not enough information. Not Appl.: Not applicable. Source data are provided as a Source Data file.
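To make the matching output concrete, here is a hypothetical prompt-construction sketch in the spirit of TrialGPT-Matching: the model is asked for an explanation, the indices of evidence sentences, and one eligibility label. The prompt wording and JSON schema are assumptions, not the authors' actual prompts.

```python
import json

# Inclusion-side labels shown in Fig. 3; exclusion criteria use the analogous
# "excluded" / "not excluded" / "not enough information" / "not applicable" set.
ELIGIBILITY_LABELS = [
    "included",
    "not included",
    "not enough information",
    "not applicable",
]

def build_matching_prompt(patient_sentences, criterion):
    # Number the note's sentences so the model can cite evidence by index.
    numbered = "\n".join(f"{i}. {s}" for i, s in enumerate(patient_sentences))
    return (
        "Patient note (numbered sentences):\n"
        f"{numbered}\n\n"
        f"Inclusion criterion: {criterion}\n\n"
        "Respond in JSON with keys 'explanation' (string), "
        "'evidence_sentences' (list of sentence indices), and "
        f"'label' (one of {ELIGIBILITY_LABELS})."
    )

# Parsing a (mock) model reply:
reply = '{"explanation": "...", "evidence_sentences": [2], "label": "included"}'
parsed = json.loads(reply)
assert parsed["label"] in ELIGIBILITY_LABELS
```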
Fig. 4
Fig. 4. Correlation between differently aggregated TrialGPT scores and the ground-truth patient-trial eligibility labels.
a The percentage of inclusion criteria predicted as “included” by TrialGPT; b The percentage of inclusion criteria predicted as “not included”; c The percentage of inclusion criteria predicted as “no relevant information”; d The LLM-aggregated relevance score; e The percentage of exclusion criteria predicted as “excluded”; f The percentage of exclusion criteria predicted as “not excluded”; g The percentage of exclusion criteria predicted as “no relevant information”; h The LLM-aggregated eligibility score. “*” denotes p < 0.05, “**” denotes p < 0.01, “***” denotes p < 0.001, and “n.s.” denotes not significant (p > 0.05) by two-sided independent t-test. There are 60,240 unlabeled, 15,459 irrelevant, 6981 excluded, 647 potential, and 8173 eligible patient-trial pairs. Center line, median; box limits, upper and lower quartiles; whiskers, 1.5x interquartile range; points, outliers. Source data are provided as a Source Data file, including all the exact p values.
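Below is a minimal sketch of the percentage-based aggregates in panels a-c and e-g, assuming criterion-level labels arrive as plain strings; the LLM-aggregated relevance and eligibility scores in panels d and h come from a separate model call and are not reproduced here.

```python
def aggregate_features(inclusion_labels, exclusion_labels):
    # Fraction of criteria carrying a given label (0.0 if the list is empty).
    def fraction(labels, target):
        return labels.count(target) / len(labels) if labels else 0.0

    return {
        "pct_included": fraction(inclusion_labels, "included"),
        "pct_not_included": fraction(inclusion_labels, "not included"),
        "pct_inclusion_no_info": fraction(inclusion_labels, "no relevant information"),
        "pct_excluded": fraction(exclusion_labels, "excluded"),
        "pct_not_excluded": fraction(exclusion_labels, "not excluded"),
        "pct_exclusion_no_info": fraction(exclusion_labels, "no relevant information"),
    }

print(aggregate_features(
    ["included", "included", "no relevant information"],
    ["not excluded", "excluded"],
))
```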
Fig. 5
Fig. 5. Results of the patient-trial matching user study.
a Experimental design and actual screening times for each patient-trial pair by two expert annotators; b Comparison of screening time aggregated by clinical trial; c Comparison of screening time aggregated by patient case; d Comparison of screening time aggregated over short cases, long cases, annotators, and all pairs. Numbers in parentheses denote the sample sizes in the corresponding comparison group. Within each annotator (Annotator X or Y), significance tests are two-sided independent t-tests; other significance tests are two-sided paired t-tests. Trials A, B, C, D, E, and F denote NCT04432597, NCT05012098, NCT04287868, NCT04847466, NCT04719988, and NCT04894370, respectively. Source data are provided as a Source Data file.

