Clustering clinical trials with similar eligibility criteria features
- PMID: 24496068
- PMCID: PMC4119097
- DOI: 10.1016/j.jbi.2014.01.009
Abstract
Objectives: To automatically identify and cluster clinical trials with similar eligibility features.
Methods: Using the public repository ClinicalTrials.gov as the data source, we extracted semantic features from the eligibility criteria text of all clinical trials and constructed a trial-feature matrix. We calculated the pairwise similarities for all clinical trials based on their eligibility features. Taking each trial in turn as the center, we identified the trials whose similarity to that central trial was greater than or equal to a predefined threshold and constructed center-based clusters. We then identified unique trial sets with distinct membership compositions by disregarding the structural information of the center-based clusters.
Results: From the 145,745 clinical trials on ClinicalTrials.gov, we extracted 5,508,491 semantic features. Of these, 459,936 were unique and 160,951 were shared by at least one pair of trials. By crowdsourcing the cluster evaluation through Amazon Mechanical Turk (MTurk), we identified 0.9 as the optimal similarity threshold. Using this threshold, we generated 8,806 center-based clusters. Evaluation of a sample of the clusters by MTurk workers resulted in a mean score of 4.331±0.796 on a scale of 1-5 (5 indicating "strongly agree that the trials in the cluster are similar").
Conclusions: We contribute an automated approach to clustering clinical trials with similar eligibility features. This approach is potentially useful for investigating knowledge reuse patterns in clinical trial eligibility criteria designs and for improving clinical trial recruitment. We also contribute an effective crowdsourcing method for evaluating informatics interventions.
Keywords: Clinical trial; Cluster analysis; Medical informatics.
Copyright © 2014 Elsevier Inc. All rights reserved.
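The center-based clustering step described in the Methods can be illustrated with a minimal Python sketch. The abstract does not name the similarity measure, so Jaccard similarity over feature sets is used here purely as an assumption. The function names, trial IDs, and feature strings are all hypothetical, and this is not the authors' implementation.

```python
def jaccard(a, b):
    """Jaccard similarity of two feature sets (an assumed measure, not confirmed by the abstract)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def center_based_clusters(features, threshold):
    """Take each trial in turn as the center and collect the other trials
    whose similarity to it is greater than or equal to the threshold."""
    clusters = {}
    for center, center_feats in features.items():
        members = {
            other
            for other, other_feats in features.items()
            if other != center and jaccard(center_feats, other_feats) >= threshold
        }
        if members:  # keep only centers that attract at least one other trial
            clusters[center] = members
    return clusters


def unique_trial_sets(clusters):
    """Disregard which trial was the center and keep distinct membership sets."""
    return {frozenset(members | {center}) for center, members in clusters.items()}


# Hypothetical toy data: trial IDs and feature names are invented for illustration.
features = {
    "T1": {"age>=18", "type 2 diabetes", "hba1c"},
    "T2": {"age>=18", "type 2 diabetes", "hba1c"},
    "T3": {"age>=18", "type 2 diabetes"},
    "T4": {"pregnancy", "hypertension"},
}

clusters = center_based_clusters(features, threshold=0.9)
trial_sets = unique_trial_sets(clusters)
```

In this toy example, T1 and T2 each form a center-based cluster containing the other, and the two clusters collapse into a single unique trial set once the center/member distinction is dropped. Lowering the threshold would admit T3 as well.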
