Pharmacoepidemiol Drug Saf. 2018 Jul;27(7):781-788.
doi: 10.1002/pds.4440. Epub 2018 Apr 17.

Assumptions made when preparing drug exposure data for analysis have an impact on results: An unreported step in pharmacoepidemiology studies

Stephen R Pye et al. Pharmacoepidemiol Drug Saf. 2018 Jul.

Abstract

Purpose: Real-world data for observational research commonly require formatting and cleaning prior to analysis. Data preparation steps are rarely reported adequately and are likely to vary between research groups. Variation in methodology could potentially affect study outcomes. This study aimed to develop a framework to define and document drug data preparation and to examine the impact of different assumptions on results.

Methods: An algorithm for processing prescription data was developed and tested using data from the Clinical Practice Research Datalink (CPRD). The impact of varying assumptions was examined by estimating the association between 2 exemplar medications (oral hypoglycaemic drugs and glucocorticoids) and cardiovascular events after preparing multiple datasets derived from the same source prescription data. Each dataset was analysed using Cox proportional hazards modelling.
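The prescription end date is one of the assumptions the algorithm varies. As a minimal sketch, assume each prescription is a Python dict holding a start date plus the Figure 1 fields qty, ndd, numdays and dose_duration. The end date can then be derived under three alternative choices of variable. The option labels ("qty_ndd", "numdays", "dose_duration"), the treatment of zero values as missing and the rounding to whole days are illustrative assumptions, not the authors' implementation.

```python
from datetime import date, timedelta
from typing import Optional


def stop_date(rx: dict, end_variable: str) -> Optional[date]:
    """Return a prescription stop date under one end-date assumption.

    end_variable is a hypothetical option label: "qty_ndd" (quantity divided
    by numeric daily dose), "numdays" or "dose_duration". Options that give a
    missing value return None, i.e. the value stays coded as missing.
    """
    start = rx["start"]
    if end_variable == "qty_ndd":
        qty, ndd = rx.get("qty"), rx.get("ndd")
        # Assumption: None or 0 in either field is treated as missing
        # (this also avoids dividing by a zero daily dose).
        if not qty or not ndd:
            return None
        days = qty / ndd
    elif end_variable in ("numdays", "dose_duration"):
        days = rx.get(end_variable)
        if not days:  # assumption: None or 0 days counts as missing
            return None
    else:
        raise ValueError(f"unknown end_variable: {end_variable}")
    # Assumption: duration is rounded to whole days and added to the start.
    return start + timedelta(days=round(days))
```

For a record with start 1 January 2015, qty 56, ndd 2 and numdays 30, the "qty_ndd" choice gives 29 January 2015 while "numdays" gives 31 January 2015. Small per-record differences like this accumulate into the different exposure datasets compared in the study.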

Results: The algorithm included 10 decision nodes and 54 possible unique assumptions. Over 11 000 possible pathways through the algorithm were identified. In both exemplar studies, similar hazard ratios and standard errors were found for the majority of pathways; however, certain assumptions had a greater influence on results. For example, in the hypoglycaemic analysis, choosing a different variable to define prescription end date altered the hazard ratios (95% confidence intervals) from 1.77 (1.56-2.00) to 2.83 (1.59-5.04).
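The pathway count follows from combining options across decision nodes. This minimal sketch assumes every option at every node can be combined freely, i.e. a full Cartesian product. In the real algorithm some options may only apply after others, which would change the count. The sketch enumerates pathways, draws a random subset as in the 50-pathway analyses, and builds the one-at-a-time variants of a primary pathway. The node names and option labels in NODES are hypothetical and far smaller than the 10 nodes and 54 options in Figure 1.

```python
import itertools
import random

# Hypothetical decision nodes and options, for illustration only.
NODES = {
    "implausible_qty": ["keep", "set_missing", "drop"],
    "missing_ndd": ["leave_missing", "impute_mode"],
    "end_variable": ["qty_ndd", "numdays", "dose_duration"],
    "overlap": ["ignore", "shift_start"],
}


def all_pathways(nodes):
    """Yield every pathway as a dict mapping node -> chosen option."""
    names = list(nodes)
    for combo in itertools.product(*(nodes[n] for n in names)):
        yield dict(zip(names, combo))


def count_pathways(nodes):
    """Number of pathways in a full Cartesian product of options."""
    n = 1
    for options in nodes.values():
        n *= len(options)
    return n


def random_pathways(nodes, k, seed=0):
    """Draw k distinct pathways at random (without replacement)."""
    rng = random.Random(seed)
    return rng.sample(list(all_pathways(nodes)), k)


def one_at_a_time(nodes, primary):
    """Vary a single node from the primary pathway, holding the rest fixed."""
    for name, options in nodes.items():
        for opt in options:
            if opt != primary[name]:
                variant = dict(primary)
                variant[name] = opt
                yield name, variant
```

With these 4 hypothetical nodes there are 3 × 2 × 3 × 2 = 36 pathways, and any primary pathway has 6 one-at-a-time variants (one per alternative option). In a study like this one, each pathway would drive one run of the preparation code and one Cox model fit, so the spread of hazard ratios across pathways can be compared.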

Conclusions: The framework offers a transparent and efficient way to perform and report drug data preparation steps. Assumptions made during data preparation can impact the results of analyses. Improving transparency regarding drug data preparation would increase the repeatability, reproducibility, and comparability of published results.

Keywords: data preparation; pharmacoepidemiology; reproducibility; transparency.

Figures

Figure 1
The drug exposure preparation algorithm. qty = total quantity entered by the GP for the prescribed product; ndd = derived numeric daily dose; numdays = number of treatment days; dose_duration = derived duration of prescription. The highlighted pathway is the “primary preparation pathway” we defined in the second phase of each analysis; this pathway was used to generate one dataset, then further datasets were generated by varying a single assumption with respect to this primary pathway. All options that produce a missing value stay coded as missing unless otherwise stated. *For option 6d: if only one stop date is available, use it; if 2 are available and equal, use that date; if 2 are available and unequal (but within x days), use their mean; if 3 are available and unequal, use the mean of the closest 2 if they are within x days. **Records with missing stop dates after step 7 are dropped.
Figure 2
Influence of drug exposure data preparation assumptions on association between oral hypoglycaemic drug class (sulfonylureas compared with biguanides as referent) and CVD events: Distribution of hazard ratios and standard errors from 50 random data preparation pathways
Figure 3
Influence of drug exposure data preparation assumptions on association between oral hypoglycaemic drug class (sulfonylureas compared with biguanides as referent) and CVD events: Effect of changing one data preparation option from primary pathway
Figure 4
Influence of drug exposure data preparation assumptions on association between oral glucocorticoid use (on vs off) and CVD events: Distribution of hazard ratios and standard errors from 50 random data preparation pathways; 3 years of follow‐up
Figure 5
Influence of drug exposure data preparation assumptions on association between oral glucocorticoid use (on vs off) and CVD events: Effect of changing one data preparation option from primary pathway; 3 years of follow‐up
Figure 6
Influence of drug exposure data preparation assumptions on association between oral glucocorticoid use (on vs off) and CVD events: Distribution of hazard ratios and standard errors from 50 random data preparation pathways; 20 years of follow‐up
Figure 7
Influence of drug exposure data preparation assumptions on association between oral glucocorticoid use (on vs off) and CVD events: Effect of changing one data preparation option from primary pathway; 20 years of follow‐up
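The stop-date reconciliation in option 6d of Figure 1 can be written as a small function. This sketch assumes up to 3 candidate stop dates per prescription, passed as Python date objects with None for unavailable values, and a tolerance x in days. The caption does not say how the mean of two dates is rounded or which pair is used when two pairs are equally close. Flooring the mean to a whole day and taking the earliest such pair are therefore assumptions. A None result is a missing stop date, which the caption says leads to the record being dropped after step 7.

```python
from datetime import date, timedelta
from itertools import combinations
from typing import Optional, Sequence


def reconcile_stop(candidates: Sequence[Optional[date]], x: int) -> Optional[date]:
    """Sketch of the option 6d rule for up to 3 candidate stop dates."""
    dates = sorted(d for d in candidates if d is not None)
    if not dates:
        return None  # no stop date available: stays missing
    if len(dates) == 1:
        return dates[0]  # only one stop date: use it
    # Closest pair of dates (for 2 dates, the pair itself). Pairs come from a
    # sorted list, so b >= a; min() keeps the earliest pair if gaps tie.
    a, b = min(combinations(dates, 2), key=lambda p: (p[1] - p[0]).days)
    gap = (b - a).days
    if gap > x:
        return None  # candidates disagree by more than x days: missing
    # Mean of the pair, floored to a whole day. Equal dates give gap 0,
    # so "2 available and equal" returns that same date.
    return a + timedelta(days=gap // 2)
```

For example, with x = 7, candidates of 1 March 2015, 5 March 2015 and None reconcile to 3 March 2015, whereas 1 March and 31 March give None.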
