JAMA. 2023 Apr 25;329(16):1376-1385. doi: 10.1001/jama.2023.4221.

Emulation of Randomized Clinical Trials With Nonrandomized Database Analyses: Results of 32 Clinical Trials

Shirley V Wang et al.

Erratum in

  • Data Errors. [No authors listed]. JAMA. 2024 Apr 9;331(14):1236. doi: 10.1001/jama.2024.5022. PMID: 38592399. No abstract available.

Abstract

Importance: Nonrandomized studies using insurance claims databases can be analyzed to produce real-world evidence on the effectiveness of medical products. Given the lack of baseline randomization and the potential for measurement issues, concerns exist about whether such studies produce unbiased treatment effect estimates.

Objective: To emulate the design of 30 completed and 2 ongoing randomized clinical trials (RCTs) of medications with database studies using observational analogues of the RCT design parameters (population, intervention, comparator, outcome, time [PICOT]) and to quantify agreement in RCT-database study pairs.

Design, setting, and participants: New-user cohort studies with propensity score matching using 3 US claims databases (Optum Clinformatics, MarketScan, and Medicare). Inclusion-exclusion criteria for each database study were prespecified to emulate the corresponding RCT. RCTs were explicitly selected based on feasibility, including power, key confounders, and end points more likely to be emulated with real-world data. All 32 protocols were registered on ClinicalTrials.gov before conducting analyses. Emulations were conducted from 2017 through 2022.
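To make the design concrete, the following is a minimal sketch of a new-user cohort analysis with 1:1 nearest-neighbor propensity score matching, the general approach described above. The simulated data, covariate names, and caliper are illustrative assumptions only; the published emulations followed prespecified, registered protocols in the three claims databases.

# Minimal sketch of a new-user, 1:1 propensity score-matched cohort step.
# Data, covariates, and the caliper are illustrative assumptions, not the
# study's actual specification.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5_000
cohort = pd.DataFrame({
    "age": rng.normal(65, 10, n),
    "diabetes": rng.integers(0, 2, n),
    "prior_mi": rng.integers(0, 2, n),
})
# Treatment assignment depends on covariates (simulated confounding).
logit = -2 + 0.03 * cohort["age"] + 0.5 * cohort["diabetes"]
cohort["treated"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# 1) Estimate propensity scores from baseline covariates.
X = cohort[["age", "diabetes", "prior_mi"]]
ps_model = LogisticRegression(max_iter=1000).fit(X, cohort["treated"])
cohort["ps"] = ps_model.predict_proba(X)[:, 1]

# 2) Greedy 1:1 nearest-neighbor matching on the propensity score
#    within a caliper (0.05 here, an arbitrary illustrative choice).
treated = cohort[cohort["treated"] == 1].sort_values("ps")
control = cohort[cohort["treated"] == 0].sort_values("ps")
caliper, pairs, used = 0.05, [], set()
for t_idx, t_ps in treated["ps"].items():
    candidates = control.loc[~control.index.isin(used), "ps"]
    if candidates.empty:
        break
    c_idx = (candidates - t_ps).abs().idxmin()
    if abs(candidates[c_idx] - t_ps) <= caliper:
        pairs.append((t_idx, c_idx))
        used.add(c_idx)

print(f"matched pairs: {len(pairs)}")

In the actual emulations, the covariate sets, matching specifications, and outcome models were prespecified in each registered protocol; this sketch only illustrates the matching step in isolation.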

Exposures: Therapies for multiple clinical conditions were included.

Main outcomes and measures: Database study emulations focused on the primary outcome of the corresponding RCT. Findings of database studies were compared with RCTs using predefined metrics, including Pearson correlation coefficients and binary metrics based on statistical significance agreement, estimate agreement, and standardized difference.
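The agreement metrics named above can be illustrated directly from paired effect estimates and confidence intervals. The sketch below uses made-up hazard ratios and one plausible reading of the three binary metrics (significance agreement, estimate agreement taken as the database estimate falling within the RCT 95% CI, and standardized-difference agreement at |z| < 1.96); the exact definitions are those prespecified in the study protocol, not these.

# Illustrative computation of agreement metrics between RCT and database
# estimates. Hazard ratios and the thresholds below are assumptions for
# demonstration, not the study's exact rules.
import numpy as np
from scipy import stats

# Hypothetical (RCT, database) hazard ratios with 95% CIs for a few pairs.
pairs = [
    # (hr_rct, lcl_rct, ucl_rct, hr_db, lcl_db, ucl_db)
    (0.80, 0.70, 0.92, 0.84, 0.75, 0.94),
    (1.05, 0.90, 1.22, 0.98, 0.88, 1.09),
    (0.65, 0.50, 0.85, 0.78, 0.66, 0.92),
]

def se_from_ci(lcl, ucl):
    # Standard error of the log hazard ratio recovered from a 95% CI.
    return (np.log(ucl) - np.log(lcl)) / (2 * 1.96)

rct_beta, db_beta, sig_agree, est_agree, std_diff_agree = [], [], [], [], []
for hr_r, l_r, u_r, hr_d, l_d, u_d in pairs:
    b_r, b_d = np.log(hr_r), np.log(hr_d)
    se_r, se_d = se_from_ci(l_r, u_r), se_from_ci(l_d, u_d)
    rct_beta.append(b_r)
    db_beta.append(b_d)
    # Statistical significance agreement: both CIs exclude 1, or neither does.
    sig_agree.append((l_r > 1 or u_r < 1) == (l_d > 1 or u_d < 1))
    # Estimate agreement: database estimate falls inside the RCT 95% CI.
    est_agree.append(l_r <= hr_d <= u_r)
    # Standardized difference: |difference in log HR| / pooled SE below 1.96.
    z = abs(b_r - b_d) / np.sqrt(se_r**2 + se_d**2)
    std_diff_agree.append(z < 1.96)

r, _ = stats.pearsonr(rct_beta, db_beta)
print(f"Pearson r = {r:.2f}")
print(f"significance agreement:    {np.mean(sig_agree):.0%}")
print(f"estimate agreement:        {np.mean(est_agree):.0%}")
print(f"std. difference agreement: {np.mean(std_diff_agree):.0%}")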

Results: In these highly selected RCTs, the overall observed agreement between the RCT and database emulation results was a Pearson correlation of 0.82 (95% CI, 0.64-0.91), with statistical significance agreement in 75%, estimate agreement in 66%, and standardized difference agreement in 75% of pairs. In a post hoc analysis limited to 16 RCTs with closer emulation of trial design and measurements, concordance was higher (Pearson r, 0.93; 95% CI, 0.79-0.97; statistical significance agreement in 94%, estimate agreement in 88%, and standardized difference agreement in 88%). Concordance was weaker among the 16 RCTs for which certain design elements defining the research question (PICOT) could not be closely emulated with insurance claims data (Pearson r, 0.53; 95% CI, 0.00-0.83; statistical significance agreement in 56%, estimate agreement in 50%, and standardized difference agreement in 69%).

Conclusions and relevance: Real-world evidence studies can reach conclusions similar to those of RCTs when the trial design and measurements can be closely emulated, but this may be difficult to achieve. Concordance in results varied depending on the agreement metric. Emulation differences, chance, and residual confounding can contribute to divergence in results and are difficult to disentangle.


Conflict of interest statement

Conflict of Interest Disclosures: Dr Schneeweiss reported serving as a consultant for Aetion Inc and receiving grants from UCB Pharma and Boehringer Ingelheim. Dr Desai reported receiving grants from Bayer, Novartis, and Vertex. Dr Feldman reported receiving personal fees from Alosa Health, Aetion, and Blue Cross Blue Shield of Massachusetts and serving as an expert witness in litigation against inhaler manufacturers. Dr Glynn reported receiving grants from Amarin, Kowa, Novartis, and Pfizer. Dr Patorno reported receiving grants from the National Institute on Aging, the Patient Centered Outcomes Research Institute, and an investigator grant from Boehringer-Ingelheim to Brigham and Women’s Hospital. Dr Suissa reported serving on the advisory boards of AstraZeneca, Atara, Bristol Myers Squibb, Novartis, and Panalgo and receiving speakers fees from Boehringer-Ingelheim and Novartis and consulting fees from Merck, Pfizer, and Seqirus. Dr Gautam reported owning equity in Aetion. No other disclosures were reported.

Figures

Figure 1. Bland-Altman Plot of Agreement in Randomized Clinical Trial–Database Pairs
The difference between the randomized clinical trial (RCT) and database study model coefficients for the effect estimates (eg, log hazard ratio) is plotted against the averaged value for each pair. The 3 blue dashed lines reflect the mean and 95% CIs for the difference in effect estimates across pairs. Each number represents the RCT-database pair listed in Table 1, Table 2, and Figure 2. Black indicates close emulation of the RCT design in exploratory analyses defined in Figure 2; orange, RCT-database pairs with more design emulation differences that were not considered close emulations. Some numbers are colored gray for readability. ClinicalTrials.gov NCT registration numbers for RCTs and database studies are provided in eTable 2 of Supplement 1.
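As a rough illustration of how such a Bland-Altman plot is constructed, the sketch below plots each pair's difference in log hazard ratios against its average, with the mean difference and its 95% limits drawn as dashed reference lines. The input values are placeholders, not the published estimates.

# Sketch of a Bland-Altman-style agreement plot for RCT-database pairs.
# The log hazard ratios below are placeholders, not the published data.
import numpy as np
import matplotlib.pyplot as plt

rct_log_hr = np.array([-0.22, 0.05, -0.43, -0.10, 0.15])
db_log_hr = np.array([-0.17, -0.02, -0.25, -0.12, 0.10])

diff = rct_log_hr - db_log_hr          # difference for each pair
mean = (rct_log_hr + db_log_hr) / 2    # average for each pair
bias = diff.mean()
limits = bias + np.array([-1.96, 1.96]) * diff.std(ddof=1)

fig, ax = plt.subplots()
ax.scatter(mean, diff)
ax.axhline(bias, linestyle="--", color="tab:blue")
for lim in limits:
    ax.axhline(lim, linestyle="--", color="tab:blue")
ax.set_xlabel("Average of RCT and database log hazard ratio")
ax.set_ylabel("RCT minus database log hazard ratio")
ax.set_title("Bland-Altman plot of RCT-database agreement (illustrative)")
plt.show()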
Figure 2. Emulation Challenges
a. Good indicates that the trial had an active comparator; moderate, that the placebo was emulated by a drug expected to be unrelated to the outcome and the cohort characteristics were well balanced, or that the active comparator had to be modified for feasibility reasons; and poor, that the placebo was emulated by a drug expected to be unrelated to the outcome and residual confounding was expected from characteristics poorly measured in claims (eg, socioeconomic status).
b. Good indicates that the outcome was emulated with high specificity; moderate, lower outcome specificity or high missingness.
c. Placebo indicates that the run-in was for only the placebo group; both groups, that the run-in was for both exposure and comparator in sequence or that both groups were run in on a drug that was neither exposure nor comparator; baseline drugs, run-in for baseline maintenance therapy; 1 class, 1 class of therapy used as either exposure or comparator; and mixed, a mix of therapies according to the protocol algorithm. For these trials, the baseline, 1 class, or mixed types of run-ins were expected to selectively include responders to the run-in therapy.
d. A crude measure assessed based on the appearance of a proportional hazards violation in published trial figures.
e. Binary composite indicator based on the emulation markers listed in this figure (Supplement 1).
f. Difference in medians.
g. Trials included a postrandomization washout window and therefore did not mix the effects of randomization with discontinuation of baseline therapy.
h. Trial had coprimary comparisons; the first listed was the primary comparison in the real-world evidence emulation protocol.
i. Trial was ongoing at the start of the emulation and analysis.
Closer emulation of RCT design in the database studies is indicated by blue; close emulation means that none of the following characteristics were present and that the comparator and outcome emulation were at least moderate, with 1 or both classified as good: (1) start of follow-up in the hospital (hospital prescription data were not available in claims but may be available in linked data); (2) a run-in type that selectively included responders to 1 treatment group; (3) mixing the effect of randomization with discontinuation of baseline maintenance therapy; and (4) a delayed treatment effect. Orange indicates RCT-database study pairs with more substantial design emulation differences.
