A computable case definition for patients with SARS-CoV2 testing that occurred outside the hospital

Lijing Wang¹, Amy R Zipursky², Alon Geva³, Andrew J McMurry⁴, Kenneth D Mandl⁴, Timothy A Miller⁴

Affiliations

¹ Department of Data Science, New Jersey Institute of Technology, Newark, New Jersey, USA.
² Computational Health Informatics Program and Department of Emergency Medicine, Boston Children's Hospital, Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA.
³ Computational Health Informatics Program and Division of Critical Care Medicine, Department of Anesthesiology, Critical Care, and Pain Medicine, Boston Children's Hospital, Harvard Medical School, Boston, Massachusetts, USA.
⁴ Computational Health Informatics Program, Boston Children's Hospital, Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA.

PMID: 37425487
PMCID: PMC10322650
DOI: 10.1093/jamiaopen/ooad047

JAMIA Open. 2023 Jul 5;6(3):ooad047. eCollection 2023 Oct.

Abstract

Objective: To identify a cohort of COVID-19 cases, including when evidence of virus positivity was only mentioned in the clinical text, not in structured laboratory data in the electronic health record (EHR).

Materials and methods: Statistical classifiers were trained on feature representations derived from unstructured text in patient EHRs. We used a proxy dataset of patients with COVID-19 polymerase chain reaction (PCR) tests for training. We selected a model based on performance on our proxy dataset and applied it to instances without COVID-19 PCR tests. A physician reviewed a sample of these instances to validate the classifier.
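
The proxy-training idea can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not name the classifier or the features, so a bag-of-words multinomial naive Bayes model stands in as an assumed example. The notes, labels, and function names (`tokenize`, `train_naive_bayes`, `predict`) are all invented for illustration.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and split a note on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()

def train_naive_bayes(notes, labels):
    """Collect per-class word counts, class counts, and the vocabulary."""
    word_counts = {label: Counter() for label in set(labels)}
    class_counts = Counter(labels)
    for note, label in zip(notes, labels):
        word_counts[label].update(tokenize(note))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocab

def predict(model, note):
    """Return the class with the highest log posterior (Laplace smoothing)."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label, n_docs in class_counts.items():
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(n_docs / total_docs)
        for token in tokenize(note):
            if token in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][token] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)

# Proxy training set (invented): notes from encounters that have an
# in-hospital PCR result, with that result used as the label.
proxy_notes = [
    "patient tested positive for covid at outside clinic",
    "covid positive per urgent care test last week",
    "family reports positive covid test at home",
    "covid test negative no fever",
    "no known covid exposure test negative",
    "screening covid swab negative today",
]
proxy_labels = ["positive", "positive", "positive",
                "negative", "negative", "negative"]

model = train_naive_bayes(proxy_notes, proxy_labels)

# Target instance (invented): a note from an encounter with no PCR test on record.
unlabeled_note = "mother states child tested positive at outside pharmacy"
prediction = predict(model, unlabeled_note)
print(prediction)  # positive
```

The point of the sketch is the data flow: labels come for free from structured PCR results, and the trained model is then applied to notes that have no structured result, where a sample of predictions would be checked by a clinician.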

Results: On the test split of the proxy dataset, our best classifier obtained an F1 score of 0.56, precision of 0.60, and recall of 0.52 for SARS-CoV2 positive cases. In an expert validation, the classifier correctly identified 97.6% (81/84) as COVID-19 positive and 97.8% (91/93) as not SARS-CoV2 positive. The classifier identified an additional 960 positive cases that had no SARS-CoV2 lab test in hospital, and only 177 of those cases had the ICD-10 code for COVID-19.
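
As a quick arithmetic check, the reported F1 score follows from the reported precision and recall, since F1 is their harmonic mean. The short sketch below recomputes it and also expresses the 177 ICD-10-coded cases as a share of the 960 additional cases. The helper name `f1_score` is ours, not from the paper.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Precision and recall as reported for the proxy test split.
f1 = f1_score(0.60, 0.52)
print(round(f1, 2))  # 0.56

# Share of the 960 additional cases that carried the COVID-19 ICD-10 code.
icd_share = 177 / 960
print(f"{icd_share:.1%}")  # 18.4%
```

In other words, fewer than one in five of the additional cases would have been found by searching on the diagnosis code alone.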

Discussion: Performance on the proxy dataset may be worse than in the expert validation because proxy instances sometimes include discussion of pending lab tests. The most predictive features are meaningful and interpretable. The type of external test that was performed is rarely mentioned.

Conclusion: COVID-19 cases that had testing done outside of the hospital can be reliably detected from the text in EHRs. Training on a proxy dataset was a suitable method for developing a highly performant classifier without labor-intensive labeling efforts.

Keywords: COVID-19; machine learning; natural language processing; text classification.

Conflict of interest statement

None declared.

Figures

**Figure 1.**
Monthly COVID-19 case counts detected by SARS-CoV2 PCR in the structured data or by natural language processing (NLP) are represented as bars, with counts corresponding to the left y-axis. The line represents the percent increase in cases afforded by use of the classifier during that month, corresponding to the right y-axis. The spike from October through December 2020 likely represents the Delta variant, while the spike starting in December 2021 likely represents the Omicron variant.

Update of

A computable phenotype for patients with SARS-CoV2 testing that occurred outside the hospital.
Wang L, Zipursky A, Geva A, McMurry AJ, Mandl KD, Miller TA. medRxiv [Preprint]. 2023 Jan 19:2023.01.19.23284738. doi: 10.1101/2023.01.19.23284738. Update in: JAMIA Open. 2023 Jul 05;6(3):ooad047. doi: 10.1093/jamiaopen/ooad047. PMID: 36711461.
