Clin Epidemiol. 2018 Dec 17;11:1-15.
doi: 10.2147/CLEP.S181242. eCollection 2019.

Measuring prevalence and incidence of chronic conditions in claims and electronic health record databases


Jeremy A Rassen et al. Clin Epidemiol.

Abstract

Background: Health care databases are natural sources for estimating prevalence and incidence of chronic conditions, but substantial variation in estimates limits their interpretability and utility. We evaluated the effects of design choices when estimating prevalence and incidence in claims and electronic health record databases.

Methods: Prevalence and incidence for five chronic diseases at increasing levels of expected frequencies, from cystic fibrosis to COPD, were estimated in the Clinical Practice Research Datalink (CPRD) and MarketScan databases from 2011 to 2014. Estimates were compared using different definitions of lookback time and contributed person-time.

Results: Variation in lookback time substantially affected estimates. In 2014, for CPRD, use of an all-time vs a 1-year lookback window resulted in 4.3-8.3 times higher prevalence (depending on disease) and 1.9-3.3 times lower incidence. All-time lookback also produced strong temporal trends: COPD prevalence in MarketScan increased by 25% between 2011 and 2014 with an all-time lookback but stayed relatively constant with a 1-year lookback. Varying observability did not substantially affect estimates.

Conclusion: This framework draws attention to the underrecognized potential for widely varying incidence and prevalence estimates, with implications for care planning and drug development. Though prevalence and incidence are seemingly straightforward concepts, careful consideration of methodology is required to obtain meaningful estimates from health care databases.

Keywords: cross-sectional studies; epidemiologic methods; epidemiological monitoring; epidemiology; incidence; pharmacoepidemiology; prevalence; prevalence studies; secondary databases; sentinel surveillance.
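The direction of the lookback effect described in the Results can be illustrated with a minimal sketch. It is not the authors' implementation: the patient records, field names (`obs_start`, `obs_end`, `dx`), and the functions `in_lookback` and `estimate` are hypothetical, and the sketch assumes a Day 1 population, a 365-day fixed window, and no requirement that the lookback time itself be observable.

```python
from datetime import date, timedelta

def in_lookback(dx_dates, poi_start, lookback_days=None):
    """True if a diagnosis is recorded in the lookback window before poi_start.

    lookback_days=None stands for an all-time lookback; an integer is a
    fixed window of that many days (assumption of this sketch).
    """
    if lookback_days is None:
        window_start = date.min
    else:
        window_start = poi_start - timedelta(days=lookback_days)
    return any(window_start <= d < poi_start for d in dx_dates)

def estimate(patients, poi_start, poi_end, lookback_days=None):
    """Point prevalence and cumulative incidence over a Day 1 population."""
    # Day 1 population: observable on the first day of the period of interest.
    day1 = [p for p in patients if p["obs_start"] <= poi_start <= p["obs_end"]]
    prevalent = [p for p in day1 if in_lookback(p["dx"], poi_start, lookback_days)]
    # Patients with no record in the lookback window are treated as at risk.
    at_risk = [p for p in day1 if not in_lookback(p["dx"], poi_start, lookback_days)]
    incident = [p for p in at_risk
                if any(poi_start <= d <= poi_end for d in p["dx"])]
    prevalence = len(prevalent) / len(day1) if day1 else float("nan")
    incidence = len(incident) / len(at_risk) if at_risk else float("nan")
    return prevalence, incidence

# Hypothetical patients, all observable throughout 2014.
obs = {"obs_start": date(2010, 1, 1), "obs_end": date(2014, 12, 31)}
patients = [
    {**obs, "dx": [date(2011, 6, 1), date(2014, 5, 1)]},  # old record, seen again in 2014
    {**obs, "dx": [date(2013, 9, 15)]},                   # recorded within the prior year
    {**obs, "dx": []},                                    # never diagnosed
    {**obs, "dx": [date(2014, 3, 1)]},                    # first record in 2014
]

poi_start, poi_end = date(2014, 1, 1), date(2014, 12, 31)
all_time = estimate(patients, poi_start, poi_end, lookback_days=None)
one_year = estimate(patients, poi_start, poi_end, lookback_days=365)
print("all-time lookback:", all_time)
print("1-year lookback:  ", one_year)
```

In this toy data the first patient is counted as prevalent under the all-time lookback but as a new (incident) case under the 1-year lookback, so the shorter window gives lower prevalence and higher incidence, the same direction as the reported CPRD results.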


Conflict of interest statement

Disclosure: Jeremy A Rassen is an employee of and has an ownership interest in Aetion, Inc, a technology company that provides analytic software and services to the health care industry. Dorothee B Bartels is an employee of Boehringer Ingelheim, which is a customer of Aetion, Inc. Sebastian Schneeweiss is a consultant to World Health Information Science Consultants (WHISCON), LLC, and to Aetion, Inc, in which he also owns equity. He is the principal investigator of investigator-initiated grants to the Brigham and Women’s Hospital from Bayer, Genentech, and Boehringer Ingelheim. Amanda R Patrick is an employee of and has ownership in Aetion, Inc. At the time of writing, William Murk was an employee of and had ownership in Aetion, Inc, in which he has an ownership interest. The authors report no other conflicts of interest in this work.

Figures

Figure 1
Definitions of observable person-time, in claims data (A) or EHR data (B). Notes: As shown by a hypothetical patient in (A), claims data observability can be based on the entire time enrolled in a health plan (OC1) or enrolled time that excludes time known to be structurally nonobservable, eg, time during nursing home care (OC2). As shown for a different hypothetical patient in (B), EHR observability can be based on all calendar time, irrespective of event data, assuming that all patient encounters would be captured in that EHR system (OE1), time between the start of the first event and the end of the last event recorded in the EHR system (OE2), or time defined by a “buffer” around each encounter, excluding the time where there is a gap of a certain duration (eg, >365 days) between margins (OE3). Abbreviations: EHR, electronic health record; OC, observable person-time in claims databases; OE, observable person-time in EHR databases.
Figure 2
The period of interest and the lookback time. Notes: Timeline showing 2012 through the end of 2014, with 2014 being the POI and two different LB times, LB1 (all time) and LB2 (1 year fixed time). The observable times of three different hypothetical patients are shown. Abbreviations: LB, lookback time; POI, period of interest.
Figure 3
The prevalence numerator. Notes: PN1 is the point PN, ie, the condition is present as of the first day of the period of interest; and PN2 is the period PN, ie, the condition is either present as of the first day of the POI, or is recorded for the first time during the POI. These are shown for three hypothetical patients. In (A), all time LB (LB1) is used to define the LB time. In (B), fixed-time LB (LB2) is used to define the LB time. The columns at the right indicate the numbers each patient contributes to the respective numerator value, as well as the total numerator values. Abbreviations: LB, lookback time; PN, prevalence numerator; POI, period of interest.
Figure 4
The prevalence denominator. Notes: Day 1 population (PD1), complete-period population (PD2), and sufficient-time population (PD4) are shown for three hypothetical patients. For PD4, assume that the requirement is defined as having at least n=90 observable person-days in the period of interest (POI). The columns at the right indicate the numbers each patient contributes to the respective denominator value, as well as the total denominator values. Abbreviations: LB, lookback time; LO, lookback observability; PD, prevalence denominator; POI, period of interest.
Figure 5
The incidence numerator and denominator. Notes: Five hypothetical patients are shown. The columns at right indicate the numbers each patient contributes to the respective numerator or denominator value, as well as the total values, assuming options IPD1 or IRD are used. Dashes (“–”) indicate that the patient was excluded from contributing to the respective value. Abbreviations: IN, incidence numerator; IPD, incidence proportion denominator; IRD, incidence rate denominator.
Figure 6
Incidence and prevalence estimates, by case examples, in CPRD. Notes: Cumulative incidence (“incidence”) and point prevalence (“prevalence”) estimates are shown for each of the four cases, across multiple diseases. Case 1: 1-year lookback; Day 1 population. Case 2: 2-year lookback; Day 1 population. Case 3: all-time lookback; Day 1 population. Case 4: all-time lookback; complete-time population. Shaded regions indicate 95% confidence intervals. Tabular estimates are provided in Table S4. Abbreviation: CPRD, Clinical Practice Research Datalink.
Figure 7
Incidence and prevalence estimates, by case examples, in MarketScan. Notes: Cumulative incidence (“incidence”) and point prevalence (“prevalence”) estimates are shown for each of the four cases, across multiple diseases. Case 1: 1-year lookback; Day 1 population. Case 2: 2-year lookback; Day 1 population. Case 3: all-time lookback; Day 1 population. Case 4: all-time lookback; complete-time population. Shaded regions indicate 95% confidence intervals. Tabular estimates are provided in Table S5.
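
The three prevalence denominators named in the Figure 4 caption can be sketched as follows. This is an illustrative reading of the caption, not the authors' code: it assumes each patient has a single continuous observable interval, that PD1 requires observability on the first day of the POI, that PD2 requires observability for the whole POI, and that PD4 requires at least 90 observable person-days within the POI. The function names and the patient intervals are hypothetical.

```python
from datetime import date

def observable_days_in_poi(obs_start, obs_end, poi_start, poi_end):
    """Count observable person-days inside the POI (both bounds inclusive)."""
    start = max(obs_start, poi_start)
    end = min(obs_end, poi_end)
    return max((end - start).days + 1, 0)

def prevalence_denominators(patients, poi_start, poi_end, min_days=90):
    """Return counts for PD1, PD2, and PD4 under the assumptions stated above."""
    # PD1: observable on the first day of the POI.
    pd1 = sum(1 for s, e in patients if s <= poi_start <= e)
    # PD2: observable for the complete POI.
    pd2 = sum(1 for s, e in patients if s <= poi_start and e >= poi_end)
    # PD4: at least min_days observable person-days within the POI.
    pd4 = sum(
        1 for s, e in patients
        if observable_days_in_poi(s, e, poi_start, poi_end) >= min_days
    )
    return {"PD1": pd1, "PD2": pd2, "PD4": pd4}

# Hypothetical (obs_start, obs_end) intervals for three patients.
patients = [
    (date(2012, 1, 1), date(2014, 12, 31)),  # observable for all of 2014
    (date(2013, 6, 1), date(2014, 2, 15)),   # leaves after 46 days of 2014
    (date(2014, 4, 1), date(2014, 12, 31)),  # enters after day 1 of 2014
]

poi_start, poi_end = date(2014, 1, 1), date(2014, 12, 31)
denominators = prevalence_denominators(patients, poi_start, poi_end)
print(denominators)
```

With these intervals the second patient counts toward PD1 only, and the third counts toward PD4 only, which shows how the choice of denominator changes who is eligible even before any diagnosis is considered.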
