Validation of Diagnostic Groups Based on Health Care Utilization Data Should Adjust for Sampling Strategy
- PMID: 25821898
- PMCID: PMC5510703
- DOI: 10.1097/MLR.0000000000000324
Abstract
Objective: Valid measurement of outcomes such as disease prevalence using health care utilization data is fundamental to the implementation of a "learning health system." Definitions of such outcomes can be complex, based on multiple diagnostic codes. The literature on validating such data demonstrates a lack of awareness of the need for a stratified sampling design and corresponding statistical methods. We propose a method for validating the measurement of diagnostic groups that have: (1) different prevalences of diagnostic codes within the group; and (2) low prevalence.
Methods: We describe an estimation method whereby: (1) low-prevalence diagnostic codes are oversampled, and the positive predictive value (PPV) of the diagnostic group is estimated as a weighted average of the PPV of each diagnostic code; and (2) claims that fall within a low-prevalence diagnostic group are oversampled relative to claims that are not, and bias-adjusted estimators of sensitivity and specificity are generated.
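As a minimal sketch of step (1), the group-level PPV can be computed as a weighted average of per-code PPVs. The abstract does not state the weights; this sketch assumes each weight is the code's share of all claims in the diagnostic group. The code labels, counts, and function names below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class CodeStratum:
    code: str          # diagnostic code label (hypothetical)
    n_population: int  # claims carrying this code in the full data set
    n_sampled: int     # claims with this code selected for validation
    n_confirmed: int   # sampled claims confirmed as true cases


def weighted_ppv(strata):
    """Group PPV as a prevalence-weighted average of per-code PPVs.

    Assumption: the weight of a code is its share of all claims in the group.
    """
    total = sum(s.n_population for s in strata)
    return sum(
        (s.n_population / total) * (s.n_confirmed / s.n_sampled)
        for s in strata
    )


def pooled_ppv(strata):
    """Unweighted PPV that ignores the oversampling of rare codes."""
    return sum(s.n_confirmed for s in strata) / sum(s.n_sampled for s in strata)


# Illustrative counts only: the rare code is oversampled (100 of 100 claims)
# relative to the common code (100 of 900 claims).
strata = [
    CodeStratum("common_code", n_population=900, n_sampled=100, n_confirmed=90),
    CodeStratum("rare_code", n_population=100, n_sampled=100, n_confirmed=50),
]

group_ppv = weighted_ppv(strata)
naive_ppv = pooled_ppv(strata)
print(f"weighted PPV: {group_ppv:.3f}, pooled PPV: {naive_ppv:.3f}")
```

With these made-up counts the pooled estimate falls below the weighted one, because the oversampled rare code, which has the lower PPV, dominates the pooled sample.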
Application: We illustrate our proposed method using an example from population health surveillance in which diagnostic groups are applied to physician claims to identify cases of acute respiratory illness.
Conclusions: Failure to account for the prevalence of each diagnostic code within a diagnostic group leads to the underestimation of the PPV, because low-prevalence diagnostic codes are more likely to be false positives. Failure to adjust for oversampling of claims that fall within the low-prevalence diagnostic group relative to those that do not leads to the overestimation of sensitivity and underestimation of specificity.
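The direction of the bias in sensitivity and specificity can be illustrated with a short sketch. The abstract does not give the authors' estimators; this example assumes a standard inverse-sampling-fraction adjustment, in which validated counts from each stratum (claims inside versus outside the diagnostic group) are scaled back up to the population before sensitivity and specificity are computed. All counts and names are hypothetical.

```python
def naive_sens_spec(tp, fp, fn, tn):
    """Sensitivity and specificity from raw validation counts."""
    return tp / (tp + fn), tn / (tn + fp)


def adjusted_sens_spec(tp, fp, fn, tn, n_pos_pop, n_neg_pop):
    """Sensitivity and specificity after reweighting each stratum.

    Assumption: simple random sampling within each stratum, so each
    validated claim stands for (stratum population / stratum sample) claims.
    """
    w_pos = n_pos_pop / (tp + fp)  # weight for claims inside the group
    w_neg = n_neg_pop / (fn + tn)  # weight for claims outside the group
    tp_hat, fp_hat = tp * w_pos, fp * w_pos
    fn_hat, tn_hat = fn * w_neg, tn * w_neg
    return tp_hat / (tp_hat + fn_hat), tn_hat / (tn_hat + fp_hat)


# Illustrative counts only: 1,000 claims fall inside the diagnostic group and
# 99,000 fall outside it, but 500 claims are validated from each stratum.
tp, fp = 400, 100  # validated claims inside the group
fn, tn = 5, 495    # validated claims outside the group

naive_sens, naive_spec = naive_sens_spec(tp, fp, fn, tn)
adj_sens, adj_spec = adjusted_sens_spec(
    tp, fp, fn, tn, n_pos_pop=1_000, n_neg_pop=99_000
)
print(f"naive:    sensitivity={naive_sens:.3f}, specificity={naive_spec:.3f}")
print(f"adjusted: sensitivity={adj_sens:.3f}, specificity={adj_spec:.3f}")
```

In this made-up example the unadjusted sensitivity is higher and the unadjusted specificity is lower than their adjusted counterparts, matching the direction of bias described above.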
Conflict of interest statement
The authors declare no conflict of interest.