PLoS One. 2015 Oct 23;10(10):e0140533.
doi: 10.1371/journal.pone.0140533. eCollection 2015.

Accuracy of Electronic Health Record Data for Identifying Stroke Cases in Large-Scale Epidemiological Studies: A Systematic Review from the UK Biobank Stroke Outcomes Group


Rebecca Woodfield et al. PLoS One. 2015.

Abstract

Objective: Long-term follow-up of population-based prospective studies is often achieved through linkages to coded regional or national health care data. Our knowledge of the accuracy of such data is incomplete. To inform methods for identifying stroke cases in UK Biobank (a prospective study of 503,000 UK adults recruited in middle age), we systematically evaluated the accuracy of these data for stroke and its main pathological types (ischaemic stroke, intracerebral haemorrhage, subarachnoid haemorrhage), determining the optimum codes for case identification.

Methods: We sought studies published from 1990 to November 2013 that compared coded data from death certificates, hospital admissions or primary care with a reference standard for stroke or its pathological types. We extracted information on a range of study characteristics and assessed study quality with the Quality Assessment of Diagnostic Accuracy Studies tool (QUADAS-2). To assess accuracy, we extracted data on positive predictive values (PPV) and, where available, on sensitivity, specificity, and negative predictive values (NPV).
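The accuracy measures named in the Methods are simple ratios from a 2×2 table of coded data against the reference standard. The Python sketch below uses hypothetical names (`ValidationCounts`, `accuracy_measures`) and made-up counts, not data from any reviewed study. It also shows why PPV was the measure most often available: PPV needs only the coded events that were checked (true and false positives), whereas sensitivity, specificity and NPV also need the reference-standard cases that were never coded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidationCounts:
    """Hypothetical 2x2 counts: coded data versus a reference standard."""
    tp: int  # coded as stroke and confirmed by the reference standard
    fp: int  # coded as stroke but not confirmed
    fn: int  # stroke by the reference standard but not coded
    tn: int  # neither coded as stroke nor a reference-standard stroke


def accuracy_measures(c: ValidationCounts) -> dict:
    """Return PPV, sensitivity, specificity and NPV as proportions."""
    return {
        "ppv": c.tp / (c.tp + c.fp),          # needs only validated coded events
        "sensitivity": c.tp / (c.tp + c.fn),  # needs uncoded true cases too
        "specificity": c.tn / (c.tn + c.fp),
        "npv": c.tn / (c.tn + c.fn),
    }


# Illustrative, invented counts
counts = ValidationCounts(tp=90, fp=10, fn=30, tn=870)
print(accuracy_measures(counts))
```

Any cell count of zero in a denominator would raise `ZeroDivisionError`; a fuller version would report such a measure as unavailable.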

Results: Of the 39 eligible studies, 37 assessed the accuracy of International Classification of Diseases (ICD)-coded hospital or death certificate data. They varied widely in their settings, methods, reporting, quality, and in the choice and accuracy of codes. Although PPVs for stroke and its pathological types ranged from 6% to 97%, appropriately selected stroke-specific codes (rather than broad cerebrovascular codes) consistently produced PPVs >70%, and in several studies >90%. The few studies with data on sensitivity, specificity and NPV showed higher sensitivity of hospital data than of death certificate data for stroke, with specificity and NPV consistently >96%. Few studies assessed either primary care data or combinations of data sources.

Conclusions: Particular stroke-specific codes can yield high PPVs (>90%) for stroke and stroke types. Including primary care data and combining data sources should improve accuracy in large epidemiological studies, but there is limited published information about these strategies.


Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Fig 1. International Classification of Diseases (ICD) codes for cerebrovascular disease.
* 433: occlusion/stenosis of pre-cerebral arteries with or without infarction. † 434: thrombosis/embolism of cerebral arteries with or without infarction. Codes in blue text denote ICD-9 codes which most closely represent stroke when subdivided using the additional coding available in the clinically modified version of ICD-9 (ICD-9-CM) used in North America. In ICD-9-CM, ‘with infarction’ (433.x1, 434.x1) is distinguished from ‘without infarction’ (433.x0, 434.x0). ‡ 436: acute, ill-defined cerebrovascular disease. ¶ A pathological term for ischaemic stroke. § G46: not a diagnostic code; may be used for the presenting symptoms of either stroke or TIA.
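The ICD-9-CM fifth-digit rule described in the Fig 1 footnote can be written as a small check. This is an illustrative sketch with a hypothetical function name; it assumes codes arrive as strings with or without the decimal point, and it treats anything other than a five-digit 433/434 code as not ‘with infarction’, because plain ICD-9 433/434 codes carry no fifth digit to make that distinction.

```python
def icd9cm_433_434_with_infarction(code: str) -> bool:
    """True if an ICD-9-CM code is 433.x1 or 434.x1 (occlusion with infarction)."""
    digits = code.replace(".", "").strip()
    # Expect 3-digit category + 4th digit (vessel) + 5th digit (infarction flag)
    if len(digits) != 5 or not digits.isdigit():
        return False
    # Fifth digit 1 = 'with infarction'; 0 = 'without infarction'
    return digits[:3] in {"433", "434"} and digits[4] == "1"


for c in ["433.11", "434.91", "433.10", "434"]:
    print(c, icd9cm_433_434_with_infarction(c))
```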
Fig 2. Selection of studies.
*Additional studies identified from bibliography screening. †Additional studies identified from review articles and bibliography screening.
Fig 3. Positive predictive values of codes for stroke.
H: hospital data; D: death certificates; H+D: hospital data and death certificates; x = number of coded events confirmed as ‘true cases’ by the reference standard; y = total number of coded events; x/y = PPV. Circles represent PPVs, and horizontal lines denote 95% confidence intervals (CIs). Circle size is proportional to the inverse variance of the PPV. Where more than one result was available for a particular study, the result for the largest number of coded events validated is shown. * Cerebrovascular codes: I60–I69 with or without G45 (ICD-10) or 430–438 (ICD-9), unless otherwise specified. † Mean PPV (taken from the range published in the study). ‡ Excluding codes 435 (TIA) and 438 (sequelae of cerebrovascular disease). § Excluding code 435 (TIA) and including code 342 (hemiplegia and hemiparesis). ¶ Excluding code 435 (TIA). # Stroke-specific codes: I60, I61, I63, I64 (ICD-10); 430, 431, 434, 436 (ICD-9); 430, 431, 433.x1, 434.x1 (ICD-9-CM). ¥ Ischaemic stroke and unspecified stroke codes: I63, I64 (ICD-10); 434, 436 (ICD-9); 433.x1, 434.x1, 436 (ICD-9-CM). ** Ischaemic stroke codes: I63 (ICD-10); 434 (ICD-9); 433.x1, 434.x1 (ICD-9-CM). †† Haemorrhagic stroke codes: I60, I61 (ICD-10); 430, 431 (ICD-9). ‡‡ Subarachnoid haemorrhage stroke codes: I60 (ICD-10); 430 (ICD-9). ¶¶ Intracerebral haemorrhage stroke codes: I61 (ICD-10); 431 (ICD-9).
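The PPV, its 95% CI and the inverse-variance weight used to size the circles can all be derived from x and y. The caption does not say which CI method was used, so this sketch assumes a Wilson score interval and the binomial variance p(1 - p)/y; both function names are hypothetical.

```python
import math


def ppv_with_ci(x: int, y: int, z: float = 1.96) -> tuple:
    """PPV = x/y with a Wilson score CI (CI method is an assumption)."""
    if y <= 0 or not 0 <= x <= y:
        raise ValueError("need y > 0 and 0 <= x <= y")
    p = x / y
    denom = 1 + z**2 / y
    centre = (p + z**2 / (2 * y)) / denom
    half = z * math.sqrt(p * (1 - p) / y + z**2 / (4 * y**2)) / denom
    return p, centre - half, centre + half


def inverse_variance_weight(x: int, y: int) -> float:
    """1 / Var(PPV) using the binomial variance p(1 - p)/y."""
    if y <= 0:
        raise ValueError("need y > 0")
    p = x / y
    var = p * (1 - p) / y
    # A PPV of exactly 0 or 1 has zero binomial variance, so the weight is unbounded
    return math.inf if var == 0 else 1 / var


print(ppv_with_ci(90, 100), inverse_variance_weight(90, 100))
```

For example, 90 confirmed cases out of 100 coded events gives a PPV of 0.90 with a Wilson interval of roughly 0.83 to 0.94. The Wilson interval is used here because, unlike the simple Wald interval, its bounds stay within 0 to 1 when PPVs approach the upper values reported in the review.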
Fig 4. Positive predictive values of codes for ischaemic stroke.
H: hospital data; D: death certificates; H+D: hospital data and death certificates; x = number of coded events confirmed as ‘true cases’ by the reference standard; y = total number of coded events; x/y = PPV. Circles represent PPVs, and horizontal lines denote 95% confidence intervals (CIs). Circle size is proportional to the inverse variance of the PPV. Where more than one result was available for a particular study, the result for the largest number of coded events validated is shown. † Mean PPV (taken from the range published in the study). ¥ Ischaemic stroke and unspecified stroke codes: I63, I64 (ICD-10); 434, 436 (ICD-9); 433.x1, 434.x1, 436 (ICD-9-CM). ** Ischaemic stroke codes: I63 (ICD-10); 434 (ICD-9); 433.x1, 434.x1 (ICD-9-CM).
Fig 5. Positive predictive values of codes for haemorrhagic stroke.
H: hospital data; D: death certificates; H+D: hospital data and death certificates; x = number of coded events confirmed as ‘true cases’ by the reference standard; y = total number of coded events; x/y = PPV. Circles represent PPVs, and horizontal lines denote 95% confidence intervals (CIs). Circle size is proportional to the inverse variance of the PPV. Where more than one result was available for a particular study, the result for the largest number of coded events validated is shown. † Mean PPV (taken from the range published in the study). †† Haemorrhagic stroke codes: I60, I61 (ICD-10); 430, 431 (ICD-9). ‡‡ Subarachnoid haemorrhage stroke codes: I60 (ICD-10); 430 (ICD-9). ¶¶ Intracerebral haemorrhage stroke codes: I61 (ICD-10); 431 (ICD-9).
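The code groupings listed in the figure footnotes can be turned into a lookup that assigns a coded event to a pathological type. This sketch follows the ‘#’ stroke-specific lists, so ICD-9-CM 436 is not counted (although the ‘¥’ grouping includes it). The function name and the coding-system labels are hypothetical, and the sketch ignores real-world details such as whether the code is a primary or secondary diagnosis.

```python
from typing import Optional

# Code groups transcribed from the figure footnotes; the mapping is illustrative
ICD10 = {"I60": "subarachnoid haemorrhage", "I61": "intracerebral haemorrhage",
         "I63": "ischaemic stroke", "I64": "unspecified stroke"}
ICD9 = {"430": "subarachnoid haemorrhage", "431": "intracerebral haemorrhage",
        "434": "ischaemic stroke", "436": "unspecified stroke"}


def stroke_type(code: str, system: str) -> Optional[str]:
    """Map a code to a stroke type, or None if it is not stroke-specific.

    system is one of the hypothetical labels "ICD-10", "ICD-9" or "ICD-9-CM".
    """
    c = code.upper().replace(".", "").strip()
    if system == "ICD-10":
        return ICD10.get(c[:3])
    if system == "ICD-9":
        return ICD9.get(c[:3])
    if system == "ICD-9-CM":
        if c[:3] in {"430", "431"}:
            return ICD9[c[:3]]
        # Only 433.x1 / 434.x1 ('with infarction') count as ischaemic stroke
        if c[:3] in {"433", "434"} and len(c) == 5 and c[4] == "1":
            return "ischaemic stroke"
        return None
    raise ValueError(f"unknown coding system: {system}")


print(stroke_type("I63.9", "ICD-10"), stroke_type("434.10", "ICD-9-CM"))
```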

