Lancet Digit Health. 2022 Jul;4(7):e542-e557.
doi: 10.1016/S2589-7500(22)00091-7. Epub 2022 Jun 9.

COVID-19 trajectories among 57 million adults in England: a cohort study using electronic health records

Johan H Thygesen et al. Lancet Digit Health. 2022 Jul.

Abstract

Background: Updatable estimates of COVID-19 onset, progression, and trajectories underpin pandemic mitigation efforts. To identify and characterise disease trajectories, we aimed to define and validate ten COVID-19 phenotypes from nationwide linked electronic health records (EHR) using an extensible framework.

Methods: In this cohort study, we used eight linked National Health Service (NHS) datasets for people in England alive on Jan 23, 2020. Data on COVID-19 testing, vaccination, primary and secondary care records, and death registrations were collected until Nov 30, 2021. We defined ten COVID-19 phenotypes reflecting clinically relevant stages of disease severity and encompassing five categories: positive SARS-CoV-2 test, primary care diagnosis, hospital admission, ventilation modality (four phenotypes), and death (three phenotypes). We constructed patient trajectories illustrating transition frequency and duration between phenotypes. Analyses were stratified by pandemic waves and vaccination status.

Findings: Among 57 032 174 individuals included in the cohort, 13 990 423 COVID-19 events were identified in 7 244 925 individuals, equating to an infection rate of 12·7% during the study period. Of 7 244 925 individuals, 460 737 (6·4%) were admitted to hospital and 158 020 (2·2%) died. Of 460 737 individuals who were admitted to hospital, 48 847 (10·6%) were admitted to the intensive care unit (ICU), 69 090 (15·0%) received non-invasive ventilation, and 25 928 (5·6%) received invasive ventilation. Among 384 135 patients who were admitted to hospital but did not require ventilation, mortality was higher in wave 1 (23 485 [30·4%] of 77 202 patients) than in wave 2 (44 220 [23·1%] of 191 528 patients), but remained unchanged for patients admitted to the ICU. Mortality was highest among patients who received ventilatory support outside of the ICU in wave 1 (2569 [50·7%] of 5063 patients). 15 486 (9·8%) of 158 020 COVID-19-related deaths occurred within 28 days of the first COVID-19 event without a COVID-19 diagnosis on the death certificate. 10 884 (6·9%) of 158 020 deaths were identified exclusively from mortality data with no previous COVID-19 phenotype recorded. We observed longer patient trajectories in wave 2 than in wave 1.
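
The percentages in the Findings paragraph can be re-derived from the counts it reports. The following is a minimal sketch, not part of the published analysis: the labels and the helper `pct` are illustrative names, and only the numerators, denominators, and reported percentages come from the abstract.

```python
# Counts and percentages copied from the Findings paragraph.
# Each tuple: (illustrative label, numerator, denominator, reported percentage).
FINDINGS = [
    ("individuals with any COVID-19 event / cohort", 7_244_925, 57_032_174, 12.7),
    ("admitted to hospital / individuals with event", 460_737, 7_244_925, 6.4),
    ("died / individuals with event", 158_020, 7_244_925, 2.2),
    ("ICU / admitted", 48_847, 460_737, 10.6),
    ("non-invasive ventilation / admitted", 69_090, 460_737, 15.0),
    ("invasive ventilation / admitted", 25_928, 460_737, 5.6),
    ("wave 1 deaths, admitted without ventilation", 23_485, 77_202, 30.4),
    ("wave 2 deaths, admitted without ventilation", 44_220, 191_528, 23.1),
    ("wave 1 deaths, ventilation outside ICU", 2_569, 5_063, 50.7),
    ("deaths within 28 days, no certificate diagnosis", 15_486, 158_020, 9.8),
    ("deaths identified from mortality data only", 10_884, 158_020, 6.9),
]


def pct(numerator, denominator):
    """Percentage rounded to one decimal place, as reported in the abstract."""
    return round(100 * numerator / denominator, 1)


# Collect any figure whose recomputed value differs from the reported one.
mismatches = [
    (label, pct(n, d), reported)
    for label, n, d, reported in FINDINGS
    if pct(n, d) != reported
]
print(mismatches)
```

An empty list means every reported percentage agrees with its counts at one decimal place.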

Interpretation: Our analyses illustrate the wide spectrum of disease trajectories as shown by differences in incidence, survival, and clinical pathways. We have provided a modular analytical framework that can be used to monitor the impact of the pandemic and generate evidence of clinical and policy relevance using multiple EHR sources.

Funding: British Heart Foundation Data Science Centre, led by Health Data Research UK.

Conflict of interest statement

Declaration of interests: AB reports grants from the National Institute for Health Research (NIHR), British Medical Association, AstraZeneca, and UK Research and Innovation, outside the submitted work. BAM is an employee of the Wellcome Trust and reports grants from Health Data Research UK (HDR UK), UK Medical Research Council (MRC), and Diabetes UK. SH works as a data scientist and data curator for NHS Digital, which holds and processes the data. MAM is supported by research funding from AstraZeneca, outside the submitted work. AH is employed by Institute of Health Informatics, University College London. CS reports grants from the Wellcome Trust, MRC, HDR UK, University of Edinburgh, UK Research and Innovation (UKRI), and the BHF, outside the submitted work; participates on the data safety monitoring board for TARDIS; and has leadership or fiduciary roles with Cancer Research UK Early Detection and Diagnosis Research Committee, UKRI Expert Review Panel for Longitudinal Health & Wellbeing National Core Study, NIHR/UKRI Long COVID call Funding Review Panel, Accelerated Access Collaborative/NIHR/NHSX Artificial Intelligence (AI) in Healthcare Awards Funding Panel, Wellcome Trust Biomedical Resources Award Funding Panel, MRC strategic review Advisory Group for Maximising the opportunities from data science for innovative biomedical research, MRC Data Science Strategy Advisory Group, UKRI Digital Health Research and Innovation Strategy Expert Group, MRC Strategic Review of Units and Centres Main Panel & Population Health Panel, Wellcome Trust Science Funding Review External advisory group, MRC Methodology Research (Better Methods Better Research) Panel, REF 2021 Subpanel - Public Health, Health Services and Primary Care, Longitudinal Health & Wellbeing COVID-19 National Core Study Strategic Advisory Board, UK Government Clinical Research Recovery Resilience and Growth programme Clinical Trials Expert Group, UK Government Scientific Advisory Group & SAGE Task and Finish Advisory Group on mass population testing for COVID-19, Scottish Government Covid-19 Data Taskforce, BHF Data Science Centre Steering Group, Our Future Health Scientific Advisory Board, Imperial College UKRI Centre for Doctoral Training in AI for Healthcare External advisory board, Swansea University UKRI Centre for Doctoral Training in AI, Machine Learning & Advanced Computing External advisory board, HDR UK Science Strategy Board / Science and Infrastructure Delivery Group, University of Bristol MRC Integrative Epidemiology Unit Scientific Advisory Board, International evaluation panel for Danish National Biobank, H2020 IMI ROADMAP Steering Committee, and the STAT-PD Steering Committee. NS reports grants from AstraZeneca, Boehringer Ingelheim, Novartis, and Roche Diagnostics; and has received consulting fees from Afimmune, Amgen, AstraZeneca, Boehringer Ingelheim, Eli Lilly, Hanmi Pharmaceuticals, Merck Sharp & Dohme, Novartis, Novo Nordisk, Pfizer, and Sanofi, outside of the submitted work. WNW is supported by a Scottish senior clinical fellowship, Chief Scientist Office (SCAF/17/01), and the Stroke Association (SA CV 20\100018); has received consulting fees from Bayer and payment for expert testimony from UK courts; participates on the data safety monitoring or advisory board for PROTECT-U, CATIS, INTERACT-4, MOSES, and Bayer; has leadership or fiduciary roles with BIASP Scientific Committee; and is associate editor of Stroke. SD has received research funding from GlaxoSmithKline, AstraZeneca, Bayer, and BenevolentAI. All other authors declare no competing interests.

Figures

Figure 1
Framework describing the ten COVID-19 phenotypes and severity categories, produced using seven linked data sources to evaluate differences between COVID-19 waves and vaccination status. For all sources, ontology terms for both suspected and confirmed diagnosis were used. CHESS=COVID-19 Hospitalisations in England Surveillance System. ECMO=extracorporeal membrane oxygenation. ICU=intensive care unit. IMV=invasive mechanical ventilation. GDPPR=General Practice Extraction Service Extract for Pandemic Planning and Research. HES-APC=Hospital Episode Statistics for admitted patient care. HES-CC=Hospital Episode Statistics for adult critical care. NIV=non-invasive ventilation. SGSS=Second Generation Surveillance System. SUS=Secondary Uses Service. *The proportion of individuals with a specific COVID-19 event phenotype, of all individuals with any COVID-19 event phenotype (n=7 244 925). †COVID-19 phenotypes were not mutually exclusive, thus for some phenotypes, the number of events exceeds the number of individuals (eg, individuals could have more than one positive SARS-CoV-2 test). ‡Includes SARS-CoV-2 tests from National Health Service hospitals for individuals with a clinical need and health-care workers and swab testing from the wider population. §HES-CC does not provide data on ECMO treatments.
Figure 2
Cumulative COVID-19 mortality events in wave 1 (A) and wave 2 (B), stratified by most severe COVID-19 phenotype. Shaded areas show 95% CIs. Log-rank p<0·0001 for both plots. Crosses denote censoring. ICU=intensive care unit.
Figure 3
COVID-19 trajectory networks. The size of the circles represents the number of individuals with that event relative to the total study population. Numbers on arrows show the proportion of individuals who transitioned to each phenotype (relative to the number of individuals in that COVID-19 wave). Numbers in square brackets show the median number of days between events across all individuals with that transition. Median days between individuals who were unaffected (ie, no recorded COVID-19 phenotype) and other severity phenotypes are not shown because they were not directly comparable between waves, owing to the difference in length of the two periods. Thick arrows represent transitions that occurred in 0·1% of individuals or more. Thin black arrows represent transitions that occurred in 0·01% of individuals or more. Any transitions that occurred in fewer than 0·01% of individuals are not shown. All included individuals were alive and had no previous COVID-19 events recorded before the start date of the specified waves.
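
The two quantities shown on each arrow (proportion of individuals making a transition, and median days between the two events) can be illustrated with a small sketch. This is not the authors' pipeline: the function name, the phenotype labels, and the toy trajectories below are hypothetical, and the sketch assumes each phenotype occurs at most once per person, so that counting transitions equals counting individuals.

```python
from collections import defaultdict
from statistics import median


def summarise_transitions(trajectories, n_individuals):
    """Summarise transitions between consecutive events.

    trajectories: dict mapping a person id to a list of (day, phenotype)
    tuples in any order. n_individuals: denominator for the proportions
    (in the figure, the number of individuals in that wave).
    """
    gaps = defaultdict(list)
    for events in trajectories.values():
        ordered = sorted(events)  # order each person's events by day
        for (day0, src), (day1, dst) in zip(ordered, ordered[1:]):
            gaps[(src, dst)].append(day1 - day0)
    return {
        transition: {
            "proportion": len(days) / n_individuals,
            "median_days": median(days),
        }
        for transition, days in gaps.items()
    }


# Hypothetical toy data; person "d" has no recorded event (unaffected).
toy = {
    "a": [(0, "positive_test"), (5, "hospital_admission"), (12, "death")],
    "b": [(0, "positive_test"), (9, "hospital_admission")],
    "c": [(3, "positive_test")],
    "d": [],
}
summary = summarise_transitions(toy, n_individuals=len(toy))
for transition, stats in summary.items():
    print(transition, stats)
```

Applying the caption's display thresholds would then amount to filtering `summary` on `proportion` (0·001 for thick arrows, 0·0001 for thin arrows).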
Figure 4
UpSet plots of individuals with one or more COVID-19 event phenotypes across seven datasets. Vertical bars report unique individuals in the intersection denoted by the intersection matrix below. Empty intersections are not shown. Horizontal bars report unique individuals identified from each dataset. Datasets were the SGSS (COVID-19 testing), GDPPR (primary care), HES-APC, HES-CC, SUS, CHESS, and the ONS Civil Registration of Deaths. CHESS=COVID-19 Hospitalisations in England Surveillance System. GDPPR=General Practice Extraction Service Extract for Pandemic Planning and Research. HES-APC=Hospital Episode Statistics for admitted patient care. HES-CC=Hospital Episode Statistics for adult critical care. ONS=Office for National Statistics. SGSS=Second Generation Surveillance System. SUS=Secondary Uses Service.
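
The counts behind an UpSet plot of this kind can be sketched as follows: each person is assigned to the exact combination of datasets in which they appear (vertical bars), alongside per-dataset totals (horizontal bars). The function name and the toy person ids are hypothetical; only the dataset abbreviations come from the caption.

```python
from collections import Counter


def upset_counts(membership):
    """Count unique individuals per exact combination of datasets.

    membership: dict mapping a dataset name to the set of person ids
    found in it. Combinations with no individuals never appear in the
    result, which mirrors "empty intersections are not shown".
    """
    datasets_by_person = {}
    for dataset, ids in membership.items():
        for person in ids:
            datasets_by_person.setdefault(person, set()).add(dataset)
    return Counter(frozenset(names) for names in datasets_by_person.values())


# Hypothetical toy membership using three of the seven dataset names.
toy_membership = {
    "SGSS": {1, 2, 3},
    "GDPPR": {2, 3, 4},
    "HES-APC": {3},
}
intersections = upset_counts(toy_membership)                  # vertical bars
totals = {name: len(ids) for name, ids in toy_membership.items()}  # horizontal bars
print(intersections)
print(totals)
```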
Figure 5
Kaplan-Meier curve of cumulative COVID-19 events in vaccinated and unvaccinated individuals. Cumulative proportion of individuals with a positive SARS-CoV-2 test (A), primary care diagnosis (B), hospital admission (C), ventilatory support (D), and COVID-19 mortality (E) in matched groups of fully vaccinated (two doses administered at least 14 days before Feb 1, 2021) and unvaccinated individuals (no doses administered before or during follow-up). Analysis was run from Feb 1, 2021 to March 15, 2021 and individuals were matched on the basis of sex, 5-year age groups, and ethnicity. Shaded areas show 95% CIs. Log-rank p<0·0001 for all subplots.
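
For readers unfamiliar with the estimator, a cumulative event curve of this kind is one minus the Kaplan-Meier survival estimate. The following is a minimal standard-library sketch of the textbook product-limit estimator, not the authors' code: the function name and the toy follow-up data are hypothetical, and confidence intervals, matching, and the log-rank test are omitted.

```python
def kaplan_meier_cumulative(durations, observed):
    """Return [(time, cumulative event proportion)] at each event time.

    durations: follow-up time per person (days).
    observed: 1 if the event occurred at that time, 0 if censored.
    Product-limit estimator: S(t) = prod over event times (1 - d_i / n_i),
    where d_i is the number of events and n_i the number still at risk.
    """
    pairs = sorted(zip(durations, observed))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        time = pairs[i][0]
        events = 0
        leaving = 0
        # Group everyone whose follow-up ends at this time.
        while i < len(pairs) and pairs[i][0] == time:
            events += pairs[i][1]
            leaving += 1
            i += 1
        if events:
            survival *= 1 - events / at_risk
            curve.append((time, 1 - survival))
        at_risk -= leaving  # events and censored both leave the risk set
    return curve


# Hypothetical toy data: five people, two of them censored.
toy_durations = [2, 3, 3, 5, 8]
toy_observed = [1, 1, 0, 1, 0]
curve = kaplan_meier_cumulative(toy_durations, toy_observed)
for time, cumulative in curve:
    print(time, round(cumulative, 3))
```

Running the same function separately on the vaccinated and unvaccinated groups would give the two curves compared in each panel.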
