Nat Commun. 2023 Apr 7;14(1):1948.
doi: 10.1038/s41467-023-37653-z.

Data-driven analysis to understand long COVID using electronic health records from the RECOVER initiative

Chengxi Zang et al. Nat Commun.

Abstract

Recent studies have investigated post-acute sequelae of SARS-CoV-2 infection (PASC, or long COVID) using real-world patient data such as electronic health records (EHR). Prior studies have typically been conducted on cohorts drawn from specific patient populations, which makes their generalizability unclear. This study aims to characterize PASC using the EHR data warehouses from two large Patient-Centered Clinical Research Networks (PCORnet), INSIGHT and OneFlorida+, which include 11 million patients in the New York City (NYC) area and 16.8 million patients in Florida, respectively. With a high-throughput screening pipeline based on propensity score and inverse probability of treatment weighting, we identified a broad list of diagnoses and medications which exhibited significantly higher incidence risk for patients 30-180 days after laboratory-confirmed SARS-CoV-2 infection compared to non-infected patients. We identified more PASC diagnoses in NYC than in Florida under our screening criteria, and conditions including dementia, hair loss, pressure ulcers, pulmonary fibrosis, dyspnea, pulmonary embolism, chest pain, abnormal heartbeat, malaise, and fatigue were replicated across both cohorts. Our analyses highlight potentially heterogeneous risks of PASC in different populations.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Overall data-driven high-throughput screening framework.
a Selection of patients from the INSIGHT and OneFlorida+ EHR warehouses, March 2020 to November 2021. b High-throughput construction of PASC-specific case and control groups in which patients did not have the target condition at baseline. c Study design. The PASC outcomes were ascertained from day 30 after the SARS-CoV-2 infection and the adjusted risk was computed 180 days after the SARS-CoV-2 infection. d Adjustment for baseline covariates by using stabilized inverse probability of treatment weighting (IPTW). e Likely PASC conditions were identified in the INSIGHT and OneFlorida+ cohorts, respectively. Identified PASC conditions were compared between the two cohorts. EHR electronic health records, PASC post-acute sequelae of SARS-CoV-2 infection.
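As a rough illustration of the stabilized IPTW step in panel d, the sketch below computes stabilized weights from propensity scores that are assumed to have been estimated already (for example by a logistic regression on baseline covariates). The function name `stabilized_iptw` and the toy inputs are hypothetical and are not taken from the study's pipeline.

```python
import numpy as np

def stabilized_iptw(exposed, propensity):
    """Stabilized inverse probability of treatment weights.

    exposed    : 1/0 (or True/False) flags, 1 = SARS-CoV-2 positive.
    propensity : estimated P(exposed = 1 | baseline covariates),
                 assumed to be supplied by an upstream model.
    """
    exposed = np.asarray(exposed, dtype=bool)
    ps = np.asarray(propensity, dtype=float)
    if np.any((ps <= 0.0) | (ps >= 1.0)):
        raise ValueError("propensity scores must lie strictly between 0 and 1")

    # Marginal probability of exposure; it is the numerator that
    # "stabilizes" the weights and keeps their mean close to 1.
    p_exposed = exposed.mean()

    # Exposed patients:   P(T=1) / e(x)
    # Unexposed patients: P(T=0) / (1 - e(x))
    return np.where(exposed, p_exposed / ps, (1.0 - p_exposed) / (1.0 - ps))

# Toy data (hypothetical): two exposed and two unexposed patients.
weights = stabilized_iptw([1, 1, 0, 0], [0.8, 0.4, 0.5, 0.25])
```

Compared with unstabilized weights (1/e(x) and 1/(1 - e(x))), the stabilized form tends to reduce the influence of patients with extreme propensity scores, which is one common reason for choosing it.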
Fig. 2. Identified potential incident PASC conditions from the INSIGHT cohort and the OneFlorida+ cohort, March 2020 to November 2021.
a The risk of incident diagnoses from INSIGHT. b The risk of incident medications from INSIGHT. c The risk of incident diagnoses from OneFlorida+. d The risk of incident medications from OneFlorida+. The incident risk was quantified by adjusted hazard ratios with 95% confidence intervals, and we also reported the adjusted cumulative incidences per 1000 patients in both the SARS-CoV-2 positive and negative groups. The sequelae outcomes were ascertained from day 30 after the SARS-CoV-2 infection, and the adjusted risk was computed 180 days after the SARS-CoV-2 infection. PASC conditions were selected based on an adjusted hazard ratio > 1, an aHR P-value < 8.39 × 10⁻⁵ (the Bonferroni-corrected significance threshold for multiple comparisons), and at least 100 identified cases in the positive group. The aHR and its P-value were calculated by the Cox proportional hazards model and the Wald chi-square test. The colors represent different organ systems. Diagnoses and medications replicated in OneFlorida+ were marked by ‡ symbols. aHR adjusted hazard ratio, CI confidence interval, CIF adjusted cumulative incidence function, COPD chronic obstructive pulmonary disease. The PASC diagnosis code U099/B948 was also replicated in both cohorts but not illustrated in a or c, with aHRs of 42.7 (95% CI, 28.9–63.2) and 39.8 (95% CI, 26.8–59.0) for INSIGHT and OneFlorida+, respectively.
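The three selection criteria in this caption can be expressed as a simple filter. The sketch below is illustrative only: the `ScreenResult` record, the function name, and the example rows are hypothetical. The number of comparisons is not stated in this excerpt; 596 is back-calculated on the assumption of a family-wise α of 0.05, since 0.05 / 596 ≈ 8.39 × 10⁻⁵.

```python
from dataclasses import dataclass

ALPHA = 0.05          # assumed family-wise significance level
N_COMPARISONS = 596   # assumed; back-calculated from the reported threshold
BONFERRONI_THRESHOLD = ALPHA / N_COMPARISONS  # about 8.39e-5

@dataclass
class ScreenResult:
    """One screened outcome (hypothetical record layout)."""
    name: str
    ahr: float               # adjusted hazard ratio
    p_value: float           # Wald test P-value for the aHR
    n_cases_positive: int    # incident cases in the SARS-CoV-2 positive group

def is_likely_pasc(result, p_threshold=8.39e-5, min_cases=100):
    """Apply the caption's criteria: aHR > 1, P below threshold, >= 100 cases."""
    return (
        result.ahr > 1.0
        and result.p_value < p_threshold
        and result.n_cases_positive >= min_cases
    )

# Hypothetical rows, for illustration only (not results from the study).
candidates = [
    ScreenResult("outcome A", ahr=1.40, p_value=1e-6, n_cases_positive=250),
    ScreenResult("outcome B", ahr=1.40, p_value=1e-3, n_cases_positive=250),
    ScreenResult("outcome C", ahr=0.80, p_value=1e-6, n_cases_positive=250),
    ScreenResult("outcome D", ahr=2.10, p_value=1e-6, n_cases_positive=40),
]
selected = [r.name for r in candidates if is_likely_pasc(r)]
```

Only "outcome A" passes here: B fails the P-value threshold, C has an aHR below 1, and D has too few cases in the positive group.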
Fig. 3. Comparison of the PASC risks in the INSIGHT cohort versus in the OneFlorida+ cohort, from March 2020 to November 2021.
The incident risk was measured by the adjusted hazard ratios (aHR) with 95% confidence intervals as shown in the main panel. The adjusted cumulative incidences (CIF) per 1000 patients in both the SARS-CoV-2 positive group and the negative group were also reported. The PASC conditions identified in both datasets were marked by ‡ symbols. The color panels represent different organ systems, including (from top to bottom): the nervous system, skin, respiratory system, circulatory system, endocrine and metabolic, digestive system, genitourinary system, and other signs. The PASC outcomes were ascertained from day 30 after the SARS-CoV-2 infection and all the adjusted risk measures were computed 180 days after the SARS-CoV-2 infection. The aHRs of PASC diagnosis code U099/B948 were not illustrated here. COPD, Chronic obstructive pulmonary disease. PASC, post-acute sequelae of SARS-CoV-2 infection.
Fig. 4. Stratified analysis of adjusted excess burden of post-acute sequelae of SARS-CoV-2 infection (PASC) over different subgroups, the INSIGHT cohort, from March 2020 to November 2021.
The adjusted excess burden is measured by the difference in the adjusted cumulative incidence per 1000 between two exposure subgroups. Subgroups were stratified by acute severity status, age group, gender, race group, and baseline pre-existing conditions. Different color panels represent different organ systems, including (from top to bottom): the nervous system, skin, respiratory system, circulatory system, blood-forming organs, endocrine and metabolic, digestive system, genitourinary system, and general signs. CAD coronary artery disease, CKD chronic kidney disease, CPD chronic pulmonary disease, T2D type 2 diabetes, Healthy no documented pre-existing conditions and no PASC-like symptoms at baseline. Two ICD-10 diagnosis codes, B948 (sequelae of other specified infectious and parasitic diseases) and U099 (post-COVID-19 condition, unspecified), were also used to compare general post-acute sequelae of SARS-CoV-2 infection in different groups. Conditions whose aHR P-value was < 8.39 × 10⁻⁵ (the Bonferroni-corrected significance threshold) were highlighted in red squares. The PASC conditions also identified in OneFlorida+ were marked by ‡ symbols. The fraction of the subgroup population was shown at the top.
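The excess burden defined in this caption is a plain difference of two adjusted cumulative incidences. The sketch below shows the arithmetic under a strong simplification: it uses a weighted incidence proportion and ignores censoring and competing risks, which a cumulative incidence function would account for. Function names and inputs are hypothetical.

```python
import numpy as np

def weighted_incidence_per_1000(event, weights):
    """Weighted proportion of patients with the outcome, per 1000.

    A simplified stand-in for an adjusted cumulative incidence:
    follow-up time, censoring, and competing risks are ignored.
    """
    event = np.asarray(event, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return 1000.0 * np.sum(weights * event) / np.sum(weights)

def excess_burden_per_1000(incidence_positive, incidence_negative):
    """Difference in adjusted cumulative incidence per 1000 patients."""
    return incidence_positive - incidence_negative

# Hypothetical toy data: outcome flags and IPTW weights for each group.
inc_pos = weighted_incidence_per_1000([1, 0, 1, 0], [1.0, 1.0, 1.0, 1.0])
inc_neg = weighted_incidence_per_1000([1, 0, 0, 0], [1.0, 1.0, 1.0, 1.0])
burden = excess_burden_per_1000(inc_pos, inc_neg)
```

A positive value means more incident cases per 1000 patients in the SARS-CoV-2 positive subgroup than in the comparison subgroup after weighting.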
