BMJ. 2021 Apr 7;373:n826.
doi: 10.1136/bmj.n826.

Linked electronic health records for research on a nationwide cohort of more than 54 million people in England: data resource

Angela Wood et al. BMJ. 2021.

Abstract

Objective: To describe a novel England-wide electronic health record (EHR) resource enabling whole population research on covid-19 and cardiovascular disease while ensuring data security and privacy and maintaining public trust.

Design: Data resource comprising linked person level records from national healthcare settings for the English population, accessible within NHS Digital's new trusted research environment.

Setting: EHRs from primary care, hospital episodes, death registry, covid-19 laboratory test results, and community dispensing data, with further enrichment planned from specialist intensive care, cardiovascular, and covid-19 vaccination data.

Participants: 54.4 million people alive on 1 January 2020 and registered with an NHS general practitioner in England.

Main measures of interest: Confirmed and suspected covid-19 diagnoses, exemplar cardiovascular conditions (incident stroke or transient ischaemic attack and incident myocardial infarction) and all cause mortality between 1 January and 31 October 2020.

Results: The linked cohort includes more than 96% of the English population. By combining person level data across national healthcare settings, data on age, sex, and ethnicity are complete for around 95% of the population. Among 53.3 million people with no previous diagnosis of stroke or transient ischaemic attack, 98 721 had a first ever incident stroke or transient ischaemic attack between 1 January and 31 October 2020, of which 30% were recorded only in primary care and 4% only in death registry records. Among 53.2 million people with no previous diagnosis of myocardial infarction, 62 966 had an incident myocardial infarction during follow-up, of which 8% were recorded only in primary care and 12% only in death registry records. A total of 959 470 people had a confirmed or suspected covid-19 diagnosis (714 162 in primary care data, 126 349 in hospital admission records, 776 503 in covid-19 laboratory test data, and 50 504 in death registry records). Although 58% of these were recorded in both primary care and covid-19 laboratory test data, 15% and 18%, respectively, were recorded in only one.

Conclusions: This population-wide resource shows the importance of linking person level data across health settings to maximise completeness of key characteristics and to ascertain cardiovascular events and covid-19 diagnoses. Although this resource was initially established to support research on covid-19 and cardiovascular disease to benefit clinical care and public health and to inform healthcare policy, it can broaden further to enable a wide range of research.
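The source-by-source breakdown reported in the Results (events recorded in only one source, or in both of two sources) can be read as set operations over person identifiers. The following is a minimal sketch of that idea, not the authors' pipeline: the function names, source labels, and identifiers are hypothetical, and the toy data are invented purely for illustration and do not reproduce any figure from the paper.

```python
from typing import Dict, Set


def source_breakdown(sources: Dict[str, Set[str]]) -> Dict[str, object]:
    """Count distinct people overall and people recorded in exactly one source.

    `sources` maps a (hypothetical) source label to the set of person
    identifiers with a qualifying record in that source.
    """
    everyone: Set[str] = set().union(*sources.values())
    only_in: Dict[str, int] = {}
    for name, ids in sources.items():
        # Union of every other source; people absent from it are unique to `name`.
        others = set().union(*(s for n, s in sources.items() if n != name))
        only_in[name] = len(ids - others)
    return {"total": len(everyone), "only_in": only_in}


def recorded_in_both(sources: Dict[str, Set[str]], a: str, b: str) -> int:
    """Number of distinct people present in both source `a` and source `b`."""
    return len(sources[a] & sources[b])


def percent(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part` (0.0 when `whole` is 0)."""
    return 100.0 * part / whole if whole else 0.0


# Invented toy identifiers; NOT data from the resource described above.
toy_sources = {
    "primary_care": {"p1", "p2", "p3", "p4"},
    "hospital": {"p3", "p5"},
    "lab_tests": {"p2", "p3", "p4", "p6", "p7"},
    "death_registry": {"p7", "p8"},
}

summary = source_breakdown(toy_sources)
both = recorded_in_both(toy_sources, "primary_care", "lab_tests")
share_both = percent(both, summary["total"])
```

Counting distinct people rather than records matters here: a person who appears in several sources contributes once to the total, which is why per-source counts can sum to more than the number of people with a diagnosis.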

Conflict of interest statement

Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/coi_disclosure.pdf and declare: support from the funders listed above; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work. SH works as a data scientist and data curator for NHS Digital, which holds and processes the data.

Figures

Fig 1. Overview of current (in bold) and planned data flows into NHS Digital Trusted Research Environment (TRE) for England
Fig 2. Data sources reporting person level data on ethnicity, incident stroke or transient ischaemic attack (TIA), and incident myocardial infarction
Fig 3. Data sources reporting person level data on confirmed or suspected covid-19 diagnoses between 1 January 2020 and 31 October 2020 (n=959 470). Numbers indicate distinct people with a confirmed or suspected covid-19 diagnosis
