Validation of an internationally derived patient severity phenotype to support COVID-19 analytics from electronic health record data
- PMID: 33566082
- PMCID: PMC7928835
- DOI: 10.1093/jamia/ocab018
Abstract
Objective: The Consortium for Clinical Characterization of COVID-19 by EHR (4CE) is an international collaboration addressing coronavirus disease 2019 (COVID-19) with federated analyses of electronic health record (EHR) data. We sought to develop and validate a computable phenotype for COVID-19 severity.
Materials and methods: Twelve 4CE sites participated. First, we developed an EHR-based severity phenotype consisting of 6 code classes, and we validated it on patient hospitalization data from the 12 4CE clinical sites against the outcomes of intensive care unit (ICU) admission and/or death. We also piloted an alternative machine learning approach and compared selected predictors of severity with the 4CE phenotype at 1 site.
Results: The full 4CE severity phenotype had a pooled sensitivity of 0.73 and a pooled specificity of 0.83 for the combined outcome of ICU admission and/or death. The sensitivity of individual code categories for acuity varied widely across sites, by up to 0.65. At one pilot site, the expert-derived phenotype had a mean area under the curve of 0.903 (95% confidence interval, 0.886-0.921), compared with an area under the curve of 0.956 (95% confidence interval, 0.952-0.959) for the machine learning approach. Billing codes were poor proxies of ICU admission, with precision and recall as low as 49% compared with chart review.
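For readers less familiar with these metrics, the following is a minimal sketch of how per-site and pooled sensitivity and specificity could be computed from confusion-matrix counts. The site counts are invented for illustration and are not 4CE data. The pooling rule shown (summing counts across sites) is an assumption, since the abstract does not state how the pooled estimates were obtained. The names `SiteCounts`, `sensitivity`, `specificity`, and `pool` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class SiteCounts:
    """Confusion-matrix counts for one site (hypothetical structure).

    A "positive" is a patient flagged severe by the phenotype; the reference
    outcome is ICU admission and/or death.
    """
    tp: int  # flagged severe, had the outcome
    fp: int  # flagged severe, did not have the outcome
    tn: int  # not flagged, did not have the outcome
    fn: int  # not flagged, had the outcome


def sensitivity(c: SiteCounts) -> float:
    # Fraction of patients with the outcome that the phenotype flagged.
    return c.tp / (c.tp + c.fn)


def specificity(c: SiteCounts) -> float:
    # Fraction of patients without the outcome that the phenotype did not flag.
    return c.tn / (c.tn + c.fp)


def pool(sites: Iterable[SiteCounts]) -> SiteCounts:
    # ASSUMPTION: pool by summing counts across sites. The study may have
    # used a different method (for example, a meta-analytic model).
    sites = list(sites)
    return SiteCounts(
        tp=sum(s.tp for s in sites),
        fp=sum(s.fp for s in sites),
        tn=sum(s.tn for s in sites),
        fn=sum(s.fn for s in sites),
    )


# Invented counts for two sites, for illustration only.
sites = [
    SiteCounts(tp=80, fp=30, tn=170, fn=20),
    SiteCounts(tp=60, fp=10, tn=90, fn=40),
]
pooled = pool(sites)

# Spread of per-site sensitivity, analogous to the cross-site variability
# reported for individual code categories.
per_site_sens = [sensitivity(s) for s in sites]
sens_range = max(per_site_sens) - min(per_site_sens)

if __name__ == "__main__":
    print(f"pooled sensitivity = {sensitivity(pooled):.2f}")
    print(f"pooled specificity = {specificity(pooled):.2f}")
    print(f"per-site sensitivity range = {sens_range:.2f}")
```

Precision and recall against chart review follow the same pattern: recall is the sensitivity above, and precision is `tp / (tp + fp)` with chart review as the reference standard.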
Discussion: We developed a severity phenotype using 6 code classes that proved resilient to coding variability across international institutions. In contrast, machine learning approaches may overfit hospital-specific orders. Manual chart review revealed discrepancies even in the gold-standard outcomes, possibly owing to heterogeneous pandemic conditions.
Conclusions: We developed an EHR-based severity phenotype for COVID-19 in hospitalized patients and validated it at 12 international sites.
Keywords: computable phenotype; data interoperability; data networking; disease severity; medical informatics; novel coronavirus.
© The Author(s) 2021. Published by Oxford University Press on behalf of the American Medical Informatics Association.
Grants and funding
- R01 LM013345/LM/NLM NIH HHS/United States
- R01 NS098023/NS/NINDS NIH HHS/United States
- UL1 TR001422/TR/NCATS NIH HHS/United States
- K23-HL148394/HL/NHLBI NIH HHS/United States
- UL1 TR001420/TR/NCATS NIH HHS/United States
- L40 HL148910/HL/NHLBI NIH HHS/United States
- R01 LM012095/LM/NLM NIH HHS/United States
- T32 LM012203/LM/NLM NIH HHS/United States
- FKZ 01ZZ1801B/German Federal Ministry of Education and Research
- R01NS098023/NS/NINDS NIH HHS/United States
- 5R01HG009174-04/NIH National Human Genome Research Institute
- U01 TR002623/TR/NCATS NIH HHS/United States
- U01TR002623/NIH National Center for Advancing Translational Sciences
- K23 HL148394/HL/NHLBI NIH HHS/United States
- NH/NIH HHS/United States
- K12 HD047349/HD/NICHD NIH HHS/United States