Med Care. 2017 Aug;55(8):e59-e67.
doi: 10.1097/MLR.0000000000000324.

Validation of Diagnostic Groups Based on Health Care Utilization Data Should Adjust for Sampling Strategy

Geneviève Cadieux et al. Med Care. 2017 Aug.

Abstract

Objective: Valid measurement of outcomes such as disease prevalence using health care utilization data is fundamental to the implementation of a "learning health system." Definitions of such outcomes can be complex, based on multiple diagnostic codes. The literature on validating such data demonstrates a lack of awareness of the need for a stratified sampling design and corresponding statistical methods. We propose a method for validating the measurement of diagnostic groups that have: (1) different prevalences of diagnostic codes within the group; and (2) low prevalence.

Methods: We describe an estimation method whereby: (1) low-prevalence diagnostic codes are oversampled, and the positive predictive value (PPV) of the diagnostic group is estimated as a weighted average of the PPV of each diagnostic code; and (2) claims that fall within a low-prevalence diagnostic group are oversampled relative to claims that are not, and bias-adjusted estimators of sensitivity and specificity are generated.
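
Step (1) of the method can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the two-code group, and all counts are hypothetical. It assumes each diagnostic code is a sampling stratum, that the number of claims carrying each code in the full population is known, and that the weights are each code's share of the group's claims.

```python
def weighted_ppv(strata):
    """PPV of a diagnostic group as a prevalence-weighted average of per-code PPVs.

    strata: list of (population_count, n_sampled, n_confirmed), one tuple per code.
    """
    total = sum(n_pop for n_pop, _, _ in strata)
    return sum(
        (n_pop / total) * (n_confirmed / n_sampled)
        for n_pop, n_sampled, n_confirmed in strata
    )


def naive_ppv(strata):
    """Pooled PPV that ignores how heavily each code was sampled."""
    confirmed = sum(n_confirmed for _, _, n_confirmed in strata)
    sampled = sum(n_sampled for _, n_sampled, _ in strata)
    return confirmed / sampled


# Hypothetical counts: (claims in population, claims sampled, claims confirmed)
codes = [
    (9000, 100, 90),  # high-prevalence code, PPV 0.90
    (1000, 100, 50),  # low-prevalence code, oversampled, PPV 0.50
]

print(f"weighted PPV: {weighted_ppv(codes):.2f}")  # weighted PPV: 0.86
print(f"naive PPV:    {naive_ppv(codes):.2f}")     # naive PPV:    0.70
```

In this made-up example the oversampled low-prevalence code has the lower PPV, so pooling the sample without weights pulls the group estimate down from 0.86 to 0.70.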

Application: We illustrate our proposed method using an example from population health surveillance in which diagnostic groups are applied to physician claims to identify cases of acute respiratory illness.

Conclusions: Failure to account for the prevalence of each diagnostic code within a diagnostic group leads to the underestimation of the PPV, because low-prevalence diagnostic codes are more likely to be false positives. Failure to adjust for oversampling of claims that fall within the low-prevalence diagnostic group relative to those that do not leads to the overestimation of sensitivity and underestimation of specificity.
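
The direction of the bias in sensitivity and specificity can be illustrated with a small sketch. The abstract does not give the authors' estimator, so the code below substitutes a standard inverse-sampling-weight adjustment: each validated claim is weighted by the number of population claims it represents in its stratum. The function names and all counts are hypothetical.

```python
def adjusted_sens_spec(tp, fp, fn, tn, n_pos_pop, n_neg_pop):
    """Sensitivity and specificity re-weighted to the full claims population.

    tp, fp: validated counts among sampled claims inside the diagnostic group.
    fn, tn: validated counts among sampled claims outside the group.
    n_pos_pop, n_neg_pop: population claims inside and outside the group.
    """
    w_pos = n_pos_pop / (tp + fp)  # population claims per sampled in-group claim
    w_neg = n_neg_pop / (fn + tn)  # population claims per sampled out-of-group claim
    sens = (tp * w_pos) / (tp * w_pos + fn * w_neg)
    spec = (tn * w_neg) / (tn * w_neg + fp * w_pos)
    return sens, spec


def naive_sens_spec(tp, fp, fn, tn):
    """Unweighted estimates that treat the validation sample as a simple random sample."""
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical design: 1,000 claims inside the group, 99,000 outside,
# with 500 claims validated from each stratum.
tp, fp, fn, tn = 400, 100, 5, 495

adj_sens, adj_spec = adjusted_sens_spec(tp, fp, fn, tn, n_pos_pop=1000, n_neg_pop=99000)
raw_sens, raw_spec = naive_sens_spec(tp, fp, fn, tn)

print(f"sensitivity: naive {raw_sens:.3f}, adjusted {adj_sens:.3f}")  # naive 0.988, adjusted 0.447
print(f"specificity: naive {raw_spec:.3f}, adjusted {adj_spec:.3f}")  # naive 0.832, adjusted 0.998
```

With these illustrative counts, each validated out-of-group claim stands for 198 population claims while each in-group claim stands for 2, so the unweighted estimates overstate sensitivity and understate specificity, matching the direction described above.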

Conflict of interest statement

The authors declare no conflict of interest.
