JCO Clin Cancer Inform. 2023 Jul;7:e2300009.
doi: 10.1200/CCI.23.00009.

Automated Matching of Patients to Clinical Trials: A Patient-Centric Natural Language Processing Approach for Pediatric Leukemia

Samuel Kaskovich et al. JCO Clin Cancer Inform. 2023 Jul.

Abstract

Purpose: Matching patients to clinical trials is cumbersome and costly. Attempts have been made to automate the matching process; however, most have used a trial-centric approach, which focuses on a single trial. In this study, we developed a patient-centric matching tool that matches patient-specific demographic and clinical information with free-text clinical trial inclusion and exclusion criteria extracted using natural language processing to return a list of relevant clinical trials ordered by the patient's likelihood of eligibility.
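The patient-centric idea of returning trials ordered by likelihood of eligibility can be illustrated with a minimal sketch. The abstract does not define how the match score is computed, so the scoring rule used here (the fraction of checkable rules a patient satisfies) is an assumption, and the names `match_score`, `rank_trials`, `TRIAL_A`, and `TRIAL_B` are hypothetical.

```python
def match_score(patient, rules):
    """Fraction of checkable rules the patient satisfies.

    Assumed definition for illustration only; the abstract does not
    specify how the patient-trial match score is computed.
    """
    # Only rules whose field is present in the patient record can be checked.
    checkable = [(field, test) for field, test in rules if field in patient]
    if not checkable:
        return 0.0
    met = sum(1 for field, test in checkable if test(patient[field]))
    return met / len(checkable)


def rank_trials(patient, trials):
    """Return (trial_id, score) pairs, highest score first."""
    scored = [(trial_id, match_score(patient, rules))
              for trial_id, rules in trials.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical patient and trials, not taken from the study.
patient = {"age": 10, "ecog": 1}
trials = {
    "TRIAL_A": [("age", lambda v: v <= 21), ("ecog", lambda v: v <= 2)],
    "TRIAL_B": [("age", lambda v: v >= 18)],
}
ranking = rank_trials(patient, trials)
```

One patient goes in and a ranked list of many trials comes out, which is the reverse of a trial-centric tool that screens many patients against a single protocol.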

Materials and methods: Records from pediatric leukemia clinical trials were downloaded from ClinicalTrials.gov. Regular expressions were used to discretize and extract individual trial criteria. A multilabel support vector machine (SVM) was trained to classify sentence embeddings of criteria into relevant clinical categories. Labeled criteria were parsed using regular expressions to extract numbers, comparators, and relationships. In the validation phase, a patient-trial match score was generated for each trial and returned in the form of a ranked list for each patient.
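As a rough illustration of the step that parses labeled criteria for numbers and comparators, the sketch below uses a single regular expression. The pattern, the helper names `parse_rule` and `satisfies`, and the example sentences are assumptions for illustration; they are not the authors' expressions, which the abstract does not show.

```python
import operator
import re

# Map comparator symbols to functions (assumed symbol set).
COMPARATORS = {
    "<=": operator.le, "≤": operator.le,
    ">=": operator.ge, "≥": operator.ge,
    "<": operator.lt, ">": operator.gt,
}

# Two-character comparators are listed first so "<=" is not read as "<".
RULE_RE = re.compile(r"(<=|>=|≤|≥|<|>)\s*(\d+(?:\.\d+)?)")


def parse_rule(criterion):
    """Return (comparator, threshold) from a criterion, or None if absent."""
    match = RULE_RE.search(criterion)
    if match is None:
        return None
    return match.group(1), float(match.group(2))


def satisfies(value, rule):
    """Check a patient value against a parsed (comparator, threshold) rule."""
    symbol, threshold = rule
    return COMPARATORS[symbol](value, threshold)


# Hypothetical criteria, written for this example.
age_rule = parse_rule("Age <= 21 years at enrollment")
ecog_rule = parse_rule("ECOG performance status ≤ 2")
no_rule = parse_rule("No prior chemotherapy")
```

A criterion with no numeric comparator returns `None`, which mirrors the fact that only part of the eligibility text can be turned into machine-checkable rules.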

Results: In total, 5,251 discretized criteria were extracted from 216 protocols. The most frequent criterion was previous chemotherapy/biologics (17%). The multilabel SVM demonstrated a pooled accuracy of 75%. The text processing pipeline was able to automatically extract 68% of eligibility criteria rules, as compared with 80% in a manual version of the tool. Automated matching was accomplished in approximately 4 seconds, as compared with several hours using manual derivation.

Conclusion: To our knowledge, this project represents the first open-source attempt to generate a patient-centric clinical trial matching tool. The tool demonstrated acceptable performance when compared with a manual version, and it has potential to save time and money when matching patients to trials.

Conflict of interest statement

The following represents disclosure information provided by authors of this manuscript. All relationships are considered compensated unless otherwise noted. Relationships are self-held unless noted. I = Immediate Family Member, Inst = My Institution. Relationships may not relate to the subject matter of this manuscript. For more information about ASCO's conflict of interest policy, please refer to www.asco.org/rwc or ascopubs.org/cci/author-center.

Open Payments is a public database containing information reported by companies about payments made to US-licensed physicians.

Brian Furner

Stock and Other Ownership Interests: United Therapeutics

Samuel L. Volchenboum

Stock and Other Ownership Interests: Litmus Health

Consulting or Advisory Role: Accordant, Westat

Travel, Accommodations, Expenses: Sanford Health

No other potential conflicts of interest were reported.

Figures

FIG 1.
Comparison of trial-centric and patient-centric matching approaches. Trial-centric matching attempts to identify one or more patients who are eligible to participate in a clinical trial. By contrast, patient-centric matching attempts to identify one or more clinical trials for which a patient may be eligible. The directional one-to-many asymmetry in patient-centric matching approaches may afford greater patient choice.
FIG 2.
Eligibility criteria excerpt from the extensible markup language (XML) file for ClinicalTrials.gov identifier NCT02883049.
FIG 3.
Process diagram with descriptive results of text processing. SVM, support vector machine.
FIG 4.
Comparison of feature extraction between automated and manual tools. ECOG, Eastern Cooperative Oncology Group.
