JCO Clin Cancer Inform. 2019 Jun:3:1-10.
doi: 10.1200/CCI.19.00037.

Creating a Synthetic Clinical Trial: Comparative Effectiveness Analyses Using an Electronic Medical Record

Marjorie G Zauderer et al. JCO Clin Cancer Inform. 2019 Jun.

Abstract

Purpose: Electronic medical records (EMRs) are a vast resource of potentially mineable data that can be used to complement and extend clinical trials. Extracting and analyzing EMR data are impeded by technical complexities associated with large, multiformat databases. We sought to develop and validate a framework that would overcome the difficulties associated with EMR data and create a simple, portable, and expandable system to better use this resource.

Materials and methods: An Internet-accessible program was developed in Python that applied user-defined criteria to identify and extract patient data from Memorial Sloan Kettering databases. A Worker Application composed of individual modules was developed to identify each patient's functional status, smoking status, and treatment classification. The validity of this approach was tested by identifying, extracting, and analyzing data from a patient cohort that paralleled a practice-changing, prospective, randomized phase III clinical trial performed at a different institution. We called this a synthetic clinical trial.

Results: Our synthetic clinical trial identified and extracted data on a cohort of 281 patients with lung cancer who matched inclusion criteria and received their first treatment between October 2003 and July 2010. The data extraction modules were precise and accurate, with F-measures greater than 0.98. Results were similar in directionality and magnitude to the chosen comparator clinical trial.
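The F-measure reported for the data extraction modules combines precision and recall into a single score. The abstract does not state which variant was used, so the sketch below assumes the balanced F1 score (the harmonic mean of precision and recall). The function name and the counts in the usage note are hypothetical and are not taken from the study.

```python
def f_measure(true_positive: int, false_positive: int, false_negative: int) -> float:
    """Return the balanced F-measure (F1), assuming that is the variant intended."""
    if true_positive == 0:
        # No correct extractions: precision and recall are both zero.
        return 0.0
    precision = true_positive / (true_positive + false_positive)
    recall = true_positive / (true_positive + false_negative)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)
```

With hypothetical counts of 98 correct extractions, 1 false positive, and 1 false negative, `f_measure(98, 1, 1)` evaluates to 98/99, which is above the 0.98 threshold mentioned above.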

Conclusion: Our framework offers an accurate and user-friendly interface for identifying and extracting EMR data that can be used to create synthetic clinical trials. Additional studies are needed to validate this approach in other patient cohorts, replicate our findings, and leverage this methodology to improve patient care and accelerate drug development.

Conflict of interest statement

Marjorie G. Zauderer

Consulting or Advisory Role: Epizyme, Aldeyra Therapeutics

Research Funding: MedImmune (Inst), Epizyme (Inst), Polaris (Inst), Sellas Life Sciences (Inst), Bristol-Myers Squibb (Inst), Millennium (Inst), Curis (Inst)

Other Relationship: Mesothelioma Applied Research Foundation, Memorial Sloan Kettering Cancer Center, Roche AG (Inst)

Isaac Wagner

Consulting or Advisory Role: Nan Fung Life Sciences

Aryeh Caroline

Honoraria: Health Advances

Mark G. Kris

Consulting or Advisory Role: AstraZeneca, Regeneron, Pfizer

Travel, Accommodations, Expenses: AstraZeneca, Genentech

Other Relationship: Memorial Sloan Kettering Cancer Center

No other potential conflicts of interest were reported.

Figures

FIG 1.
The framework architecture used to identify the cohort, extract data, and create a synthetic clinical trial. The Cohort Builder program applied user-selected inclusion and exclusion criteria to structured data in cancer registries to identify a base cohort via a Structured Query Language (SQL) query. Data extraction modules in the Worker Application program then identified and collected treatment classification, functional status, and smoking status for the base cohort. The Result Integration module then combined the data from the data extraction modules into a comma-delimited file for analysis.
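The Result Integration step described in this caption can be illustrated with a minimal sketch that merges per-patient outputs from several modules into comma-delimited text. The function name, the column names, and the shape of the module outputs (dictionaries keyed by patient identifier) are assumptions made for illustration; the caption does not describe the actual data structures.

```python
import csv
import io


def integrate_results(base_cohort, module_outputs):
    """Combine module outputs into comma-delimited text, one row per patient.

    base_cohort: ordered list of patient identifiers (hypothetical format).
    module_outputs: dict mapping a column name to {patient_id: value}.
    """
    columns = ["patient_id"] + list(module_outputs)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for patient_id in base_cohort:
        row = {"patient_id": patient_id}
        for name, values in module_outputs.items():
            # Leave the cell empty when a module found nothing for this patient.
            row[name] = values.get(patient_id, "")
        writer.writerow(row)
    return buffer.getvalue()
```

Keeping each module's output separate until this final step means a new extraction module only adds a column and does not require changes to the others.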
FIG 2.
The Cohort Builder program consisted of a user-friendly, Web-based interface. Users could select a group of patients on the basis of age at diagnosis, sex, disease site, tumor histology, disease stage, and treatment start and end dates. Each patient identified by the Cohort Builder was listed in a row in a database subsequently filled in with additional information from specific modules.
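One way to turn user selections like these into an SQL query is sketched below using Python's built-in sqlite3 module. The table name `registry`, its column names, and the criteria keys are all hypothetical; the actual Memorial Sloan Kettering schema and query are not described in the source.

```python
import sqlite3

# Hypothetical mapping from a user-selected criterion to an SQL condition.
FILTERS = {
    "min_age": "age_at_diagnosis >= ?",
    "max_age": "age_at_diagnosis <= ?",
    "sex": "sex = ?",
    "disease_site": "disease_site = ?",
    "histology": "histology = ?",
    "stage": "stage = ?",
    "start_from": "treatment_start >= ?",  # ISO dates compare correctly as text
    "start_to": "treatment_start <= ?",
}


def build_cohort_query(criteria):
    """Return (sql, params) for the criteria the user actually selected."""
    clauses, params = [], []
    for key, clause in FILTERS.items():
        if key in criteria:
            clauses.append(clause)
            params.append(criteria[key])
    where = " AND ".join(clauses) if clauses else "1 = 1"
    sql = f"SELECT patient_id FROM registry WHERE {where} ORDER BY patient_id"
    return sql, params


def select_cohort(conn, criteria):
    """Run the query and return the matching patient identifiers."""
    sql, params = build_cohort_query(criteria)
    return [row[0] for row in conn.execute(sql, params)]
```

Values are passed as bound parameters and never interpolated into the SQL text, so only the fixed condition strings in `FILTERS` reach the query itself.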
FIG 3.
The Treatment Classification module categorized patients into groups on the basis of treatment and regimen adherence. Each step in the group identification process is shown. Each vertical bar represents a drug administration. Thick bars represent administration of drugs essential to the particular treatment regimen of interest. Some doses of treatment occurred exactly within the anticipated time frame (as indicated in red). However, some doses of treatment can occur outside the perfect time frame but still be within the allowed deviations for a particular analysis. These permitted deviations are indicated in green. Ultimately, the various drug administrations were grouped into cycles of treatment (indicated by blue underline).
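The grouping of drug administrations into cycles, with a distinction between doses given exactly on schedule and doses within an allowed deviation, might look like the sketch below. The cycle length, the tolerance, the same-cycle window, and the labels are hypothetical parameters chosen for illustration; the rules the authors actually applied are not given in this caption.

```python
from datetime import date


def group_into_cycles(administrations, cycle_days=21, tolerance_days=7, window_days=10):
    """Group (date, drug) administrations into treatment cycles.

    An administration within `window_days` of the current cycle start joins that
    cycle; otherwise it opens a new cycle, labeled by how far its start falls
    from the expected `cycle_days` interval. All thresholds are assumptions.
    """
    cycles = []
    for day, drug in sorted(administrations):
        if cycles and (day - cycles[-1]["start"]).days < window_days:
            cycles[-1]["drugs"].append(drug)
            continue
        if not cycles:
            timing = "first"
        else:
            gap = (day - cycles[-1]["start"]).days
            if gap == cycle_days:
                timing = "on_schedule"
            elif abs(gap - cycle_days) <= tolerance_days:
                timing = "allowed_deviation"
            else:
                timing = "out_of_window"
        cycles.append({"start": day, "drugs": [drug], "timing": timing})
    return cycles
```

A patient could then be assigned to a regimen group only if every cycle is labeled `first`, `on_schedule`, or `allowed_deviation`, with the tolerance adjusted for each analysis.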
FIG 4.
Overall survival plots for the synthetic clinical trial and randomized clinical trial (RCT). (A) The Kaplan-Meier plot of overall survival by treatment arm for the synthetic clinical trial calculated from patient diagnosis to death or censor date (30 months). (B) The Kaplan-Meier plot of overall survival by treatment arm in the RCT. Reprinted with permission. CP, cisplatin/pemetrexed; CG, cisplatin/gemcitabine.
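For readers unfamiliar with the Kaplan-Meier method used in these plots, the following is a minimal sketch of the standard product-limit estimator in plain Python. It is a generic illustration, not the authors' analysis code; the function name and input format are assumptions. Durations would be months from diagnosis, with patients still alive at the 30-month cutoff entered as censored.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct death time.

    durations: follow-up time for each patient.
    events: True if the patient died at that time, False if censored.
    """
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        deaths = sum(1 for d, e in zip(durations, events) if d == t and e)
        leaving = sum(1 for d in durations if d == t)
        if deaths:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after time t.
        at_risk -= leaving
    return curve
```

Calling the function once per treatment arm gives the step functions that would be plotted against each other.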
FIG A1.
Flow of patient inclusion and exclusion for the synthetic clinical trial. User-defined selections identified a base cohort determined by disease type, disease stage, and treatment. Data extraction modules then closely examined each patient’s treatment regimen to determine whether the regimen matched specified criteria, in this case platinum/pemetrexed or platinum/gemcitabine. MSK, Memorial Sloan Kettering Cancer Center; ECOG, Eastern Cooperative Oncology Group.
