J Biomed Inform. 2014 Dec;52:112-20.
doi: 10.1016/j.jbi.2014.01.009. Epub 2014 Feb 1.

Clustering clinical trials with similar eligibility criteria features

Tianyong Hao et al. J Biomed Inform. 2014 Dec.

Abstract

Objectives: To automatically identify and cluster clinical trials with similar eligibility features.

Methods: Using the public repository ClinicalTrials.gov as the data source, we extracted semantic features from the eligibility criteria text of all clinical trials and constructed a trial-feature matrix. We calculated the pairwise similarities between all clinical trials based on their eligibility features. Taking each trial in turn as the center, we identified the trials whose similarity to that central trial was greater than or equal to a predefined threshold and used them to construct center-based clusters. We then identified unique trial sets with distinct membership compositions from the center-based clusters by disregarding their structural information.
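The clustering steps in the Methods can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name the similarity measure, so Jaccard similarity over each trial's set of features is assumed here, and the function names (`jaccard`, `center_based_clusters`, `unique_clusters`) and toy feature strings are hypothetical. Skipping centers that have no neighbor above the threshold is also an assumption of this sketch.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two feature sets (assumed measure; not stated in the abstract)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def center_based_clusters(trials: dict[str, set[str]], threshold: float) -> dict[str, set[str]]:
    """Take each trial in turn as the center and group the trials whose
    similarity to it is >= threshold. Centers with no such neighbor are
    skipped (an assumption of this sketch)."""
    ids = sorted(trials)
    # Pairwise similarities; the measure is symmetric, so each pair is computed once.
    sim: dict[tuple[str, str], float] = {}
    for a, b in combinations(ids, 2):
        sim[(a, b)] = sim[(b, a)] = jaccard(trials[a], trials[b])
    clusters: dict[str, set[str]] = {}
    for center in ids:
        neighbors = {t for t in ids if t != center and sim[(center, t)] >= threshold}
        if neighbors:
            clusters[center] = {center} | neighbors
    return clusters


def unique_clusters(clusters: dict[str, set[str]]) -> set[frozenset[str]]:
    """Disregard which trial was the center and keep each distinct membership set once."""
    return {frozenset(members) for members in clusters.values()}


# Hypothetical toy data: each trial maps to its set of eligibility features.
trials = {
    "T1": {"age>=18", "diabetes", "hba1c>7"},
    "T2": {"age>=18", "diabetes", "hba1c>7"},
    "T3": {"age>=18", "diabetes", "hba1c>7", "bmi<40"},
    "T4": {"age>=18", "pregnancy"},
}
clusters = center_based_clusters(trials, threshold=0.7)
unique = unique_clusters(clusters)
print(len(clusters), len(unique))  # 3 1
```

Here T1, T2 and T3 each yield a center-based cluster with the same three members, which collapse into a single unique cluster. With `threshold=0.9`, the value the Results report as optimal, only T1 and T2 (whose feature sets are identical) would cluster. Storing all pairwise similarities in a dictionary is quadratic in the number of trials, so at the scale reported (145,745 trials) a practical version would likely need sparse matrix operations or blocking.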

Results: From the 145,745 clinical trials on ClinicalTrials.gov, we extracted 5,508,491 semantic features. Of these, 459,936 were unique and 160,951 were shared by at least one pair of trials. By crowdsourcing the cluster evaluation through Amazon Mechanical Turk (MTurk), we identified the optimal similarity threshold of 0.9. Using this threshold, we generated 8,806 center-based clusters. MTurk evaluation of a sample of the clusters yielded a mean score of 4.331±0.796 on a scale of 1-5 (5 indicating "strongly agree that the trials in the cluster are similar").
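The distinction between all extracted features, unique features, and features shared by at least one pair of trials can be made concrete with a small counting routine. This sketch assumes, as one plausible reading of the abstract, that the 5,508,491 figure counts feature occurrences summed over trials and that "unique" means distinct features; the function name `feature_statistics` and the toy data are hypothetical. On the reported numbers, 160,951 of 459,936 distinct features, or about 35%, are shared.

```python
from collections import Counter


def feature_statistics(trial_features: dict[str, list[str]]) -> dict[str, float]:
    """Count feature occurrences, distinct features, and distinct features
    that appear in at least two different trials (hypothetical helper)."""
    total = sum(len(feats) for feats in trial_features.values())
    # Trial frequency: in how many distinct trials each feature appears.
    trial_freq: Counter[str] = Counter()
    for feats in trial_features.values():
        trial_freq.update(set(feats))
    unique = len(trial_freq)
    shared = sum(1 for n in trial_freq.values() if n >= 2)
    return {
        "total": total,
        "unique": unique,
        "shared": shared,
        "shared_pct": 100.0 * shared / unique if unique else 0.0,
    }


# Hypothetical toy data: per-trial lists of extracted features.
example = {
    "T1": ["age>=18", "diabetes", "hba1c>7"],
    "T2": ["age>=18", "diabetes"],
    "T3": ["pregnancy"],
}
stats = feature_statistics(example)
print(stats)
```

Recomputing `shared_pct` on progressively larger subsets of trials would give a curve of the kind Fig. 5 describes, although the exact sampling procedure behind that figure is not given in the abstract.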

Conclusions: We contribute an automated approach to clustering clinical trials with similar eligibility features. This approach is potentially useful for investigating knowledge reuse patterns in clinical trial eligibility criteria design and for improving clinical trial recruitment. We also contribute an effective crowdsourcing method for evaluating informatics interventions.

Keywords: Clinical trial; Cluster analysis; Medical informatics.


Figures

Fig. 1. The framework for automatically identifying clinical trial clusters based on eligibility criteria similarity
Fig. 2. Center-based clusters and unique clusters constructed from four example trials
Fig. 3. The user interface of the HIT (Human Intelligence Task) designed for MTurk workers for cluster evaluation
Fig. 4. The eligibility criteria of trials in an example cluster, shown to workers for comparison
Fig. 5. Percentage of unique features shared by at least two trials as a function of the total number of trials
Fig. 6. The dynamically generated network diagram of all “Breast Cancer”-related trials
The dynamically generated network diagram of all “Breast Cancer” related trials
