Sci Rep. 2023 Jan 12;13(1):613.
doi: 10.1038/s41598-023-27856-1.

Machine learning enabled subgroup analysis with real-world data to inform clinical trial eligibility criteria design

Jie Xu et al. Sci Rep.

Abstract

Overly restrictive eligibility criteria for clinical trials may limit the generalizability of the trial results to their target real-world patient populations. We developed a novel machine learning approach using large collections of real-world data (RWD) to better inform clinical trial eligibility criteria design. We extracted patients' clinical events from electronic health records (EHRs), which include demographics, diagnoses, and drugs, and assumed that certain compositions of these clinical events within an individual's EHRs can determine the subphenotypes, that is, homogeneous clusters of patients in which the patients within each subgroup share similar clinical characteristics. We introduced an outcome-guided probabilistic model to identify those subphenotypes, such that the patients within the same subgroup not only share similar clinical characteristics but also are at similar risk levels of encountering severe adverse events (SAEs). We evaluated our algorithm on two previously conducted clinical trials with EHRs from the OneFlorida+ Clinical Research Consortium. Our model can clearly identify, as subphenotypes, the patient subgroups that are more likely or less likely to suffer from SAEs, in a transparent and interpretable way. Our approach identified a set of clinical topics and derived novel patient representations based on them. Each clinical topic represents a certain clinical event composition pattern learned from the patient EHRs. In both trials, the patient subgroup with #SAE=0 and the patient subgroup with #SAE>0 were well separated by k-means clustering using the inferred topics. The inferred topics characterized as likely to align with the patient subgroup with #SAE>0 revealed meaningful combinations of clinical features and can provide data-driven recommendations for refining the exclusion criteria of clinical trials. The proposed supervised topic modeling approach can infer the clinical topics from the subphenotypes with or without SAEs. Potential rules for describing the patient subgroups with SAEs can then be derived to inform the design of clinical trial eligibility criteria.
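The clustering step mentioned in the abstract can be illustrated with a small sketch. Everything below is an assumption made for illustration: the topic weights and SAE flags are made-up toy values, and `kmeans` is a hypothetical plain Lloyd's-algorithm helper with a deterministic farthest-point start, not the authors' implementation. It only shows how two subgroups might be recovered from topic-based patient representations and compared against SAE status.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k=2, iters=50):
    """Plain Lloyd's k-means; returns (labels, centroids)."""
    # Deterministic farthest-point initialisation (an illustrative choice).
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))

    labels = None
    for _ in range(iters):
        # Assignment step: each patient goes to the nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # converged
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy patient representations: one row per patient, one column per topic.
toy_topic_weights = [
    [0.90, 0.10, 0.00],
    [0.80, 0.20, 0.10],
    [0.85, 0.10, 0.05],
    [0.10, 0.20, 0.90],
    [0.00, 0.10, 0.80],
    [0.05, 0.15, 0.85],
]
has_sae = [0, 0, 0, 1, 1, 1]  # toy flags: 1 means #SAE>0

labels, centroids = kmeans(toy_topic_weights, k=2)
agree = sum(l == s for l, s in zip(labels, has_sae)) / len(labels)
separation = max(agree, 1 - agree)  # cluster ids are arbitrary, allow a swap
print(labels, separation)
```

On real data the agreement between clusters and SAE status would be well below perfect for many cohorts; the toy values here are deliberately well separated.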

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Model overview. Demographics, diagnoses, and medications were extracted from RWD to represent patients. Supervised Poisson factor analysis (PFA) was applied to identify patient subgroups with coherent clinical latent topics and outcomes measured by SAEs. Subgroups with SAEs can be derived to inform the design of clinical trial exclusion criteria.
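As a rough illustration of the factorization idea behind Poisson factor analysis, the sketch below fits an unsupervised Poisson matrix factorization to a toy patient-by-feature count matrix using the standard multiplicative updates for the generalized Kullback-Leibler objective, which is equivalent to maximizing a Poisson likelihood. This is a simplified stand-in and an assumption on our part: it has no outcome (SAE) guidance and no Bayesian priors, so it is not the supervised model in the figure. The function names and the toy counts are hypothetical.

```python
import math
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def poisson_nll(X, W, H, eps=1e-12):
    """Negative Poisson log-likelihood up to a constant: sum(rate - x*log(rate))."""
    R = matmul(W, H)
    return sum(
        r - x * math.log(r + eps)
        for x_row, r_row in zip(X, R)
        for x, r in zip(x_row, r_row)
    )


def poisson_factorize(X, n_topics, iters=200, seed=0, eps=1e-12):
    """Approximate X (patients x features) by W (patients x topics) @ H (topics x features)."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    W = [[rng.random() + 0.1 for _ in range(n_topics)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(n_topics)]
    for _ in range(iters):
        # Update topic-feature loadings H.
        R = matmul(W, H)
        ratio = [[x / (r + eps) for x, r in zip(xr, rr)] for xr, rr in zip(X, R)]
        Wt = transpose(W)
        num = matmul(Wt, ratio)
        w_sums = [sum(row) for row in Wt]  # per-topic sum over patients
        H = [
            [h * v / (w_sums[k] + eps) for h, v in zip(H[k], num[k])]
            for k in range(n_topics)
        ]
        # Update patient-topic weights W.
        R = matmul(W, H)
        ratio = [[x / (r + eps) for x, r in zip(xr, rr)] for xr, rr in zip(X, R)]
        num = matmul(ratio, transpose(H))
        h_sums = [sum(row) for row in H]  # per-topic sum over features
        W = [
            [w * v / (h_sums[k] + eps) for k, (w, v) in enumerate(zip(W[i], num[i]))]
            for i in range(n)
        ]
    return W, H


# Toy counts: rows are patients, columns are clinical features (made up).
counts = [
    [3, 2, 0, 0],
    [4, 3, 0, 0],
    [0, 0, 2, 5],
    [0, 1, 3, 4],
]
W, H = poisson_factorize(counts, n_topics=2)
print(round(poisson_nll(counts, W, H), 3))
```

Each row of `W` plays the role of a patient's topic weights and each row of `H` the role of a clinical topic; a supervised variant would additionally tie the topic weights to the SAE outcome.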
Figure 2
Donepezil clinical trial. (a) Definition of key dates. (b) Selection of target population. Each sample is colored based on whether the patient had SAEs or not. (c) Traits distribution with UMAP.
Figure 3
Bevacizumab clinical trial. (a) Definition of key dates. (b) Selection of target population. Each sample is colored based on whether the patient had SAEs or not. (c) Traits distribution with UMAP.
Figure 4
Clustering results of the AD target population. (a) Visualization of clustering results. (b) Mean topic weight (MTW) of all topics on two groups, where the x-axis is the topic index and the y-axis is the MTW of each topic on two subgroups. (c) Top features from certain disease topics. The right sidebar of each topic shows the percentage of patients with the corresponding feature in that topic.
Figure 5
Clustering results of the CRC target population. (a) Visualization of clustering results. (b) Mean topic weight (MTW) of all topics on two groups, where the x-axis is the topic index and the y-axis is the MTW of each topic on two subgroups. (c) Top features from certain disease topics. The right sidebar of each topic shows the percentage of patients with the corresponding feature in that topic.
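The mean topic weight (MTW) plotted in panel (b) of Figures 4 and 5 can be sketched as a per-subgroup average of patient topic weights. The captions do not spell out the exact formula, so the averaging below is an assumption, and `mean_topic_weight` and the toy numbers are hypothetical.

```python
def mean_topic_weight(weights, groups):
    """Return {group: [mean weight of topic 0, mean weight of topic 1, ...]}."""
    by_group = {}
    for row, g in zip(weights, groups):
        by_group.setdefault(g, []).append(row)
    return {
        g: [sum(col) / len(rows) for col in zip(*rows)]
        for g, rows in by_group.items()
    }


# Toy topic weights (rows: patients, columns: topics) and subgroup ids.
toy_weights = [[1, 0], [3, 0], [0, 2], [0, 4]]
toy_groups = [0, 0, 1, 1]  # 0: #SAE=0 subgroup, 1: #SAE>0 subgroup

mtw = mean_topic_weight(toy_weights, toy_groups)
# Topics whose MTW is higher in the #SAE>0 subgroup than in the other one.
sae_topics = [k for k in range(len(mtw[1])) if mtw[1][k] > mtw[0][k]]
print(mtw, sae_topics)
```

Topics with a clearly higher MTW in the #SAE>0 subgroup are the ones whose top features would be inspected as candidate exclusion criteria.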

