JAMIA Open. 2023 Jul 5;6(3):ooad047.
doi: 10.1093/jamiaopen/ooad047. eCollection 2023 Oct.

A computable case definition for patients with SARS-CoV2 testing that occurred outside the hospital

Lijing Wang et al. JAMIA Open.

Abstract

Objective: To identify a cohort of COVID-19 cases, including when evidence of virus positivity was only mentioned in the clinical text, not in structured laboratory data in the electronic health record (EHR).

Materials and methods: Statistical classifiers were trained on feature representations derived from unstructured text in patient EHRs. We used a proxy dataset of patients with COVID-19 polymerase chain reaction (PCR) tests for training. We selected a model based on performance on our proxy dataset and applied it to instances without COVID-19 PCR tests. A physician reviewed a sample of these instances to validate the classifier.
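The proxy-label workflow described above can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract does not say which classifier or features were used, so a multinomial naive Bayes over bag-of-words counts stands in here, and the notes, labels, and function names (`tokenize`, `train_naive_bayes`, `predict`) are invented for the example rather than taken from the study.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the note and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def train_naive_bayes(notes, labels):
    """Fit per-class word counts for a multinomial naive Bayes model."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter(labels)
    for note, label in zip(notes, labels):
        word_counts[label].update(tokenize(note))
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab


def predict(model, note):
    """Return the class (0 or 1) with the higher log-posterior score."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label in (0, 1):
        score = math.log(class_counts[label] / total)
        # Add-one (Laplace) smoothing over the training vocabulary.
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(note):
            if tok in vocab:  # tokens never seen in training are skipped
                score += math.log((word_counts[label][tok] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical proxy dataset: notes from patients who DO have a PCR result,
# with the PCR result (1 = positive, 0 = negative) used as the label.
proxy_notes = [
    "patient tested positive for covid at outside clinic",
    "covid positive last week, now with cough and fever",
    "covid test negative, no fever",
    "no known covid exposure, denies cough",
]
proxy_labels = [1, 1, 0, 0]

model = train_naive_bayes(proxy_notes, proxy_labels)

# Apply the trained model to a note from a patient WITHOUT an in-hospital PCR test.
unlabeled_note = "tested positive for covid at urgent care"
print(predict(model, unlabeled_note))
```

In the study, the predictions on instances without PCR tests were then checked by a physician on a sample; in this sketch that step would amount to comparing `predict` outputs against manually assigned labels.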

Results: On the test split of the proxy dataset, our best classifier obtained 0.56 F1, 0.6 precision, and 0.52 recall scores for SARS-CoV2 positive cases. In an expert validation, the classifier correctly identified 97.6% (81/84) as COVID-19 positive and 97.8% (91/93) as not SARS-CoV2 positive. The classifier labeled an additional 960 cases as positive that did not have SARS-CoV2 lab tests in the hospital, and only 177 of those cases had the ICD-10 code for COVID-19.
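As a quick check on how the reported scores relate, F1 is the harmonic mean of precision and recall. The helper name `f1_score` below is chosen for illustration; the input values are the precision and recall reported above.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported for the SARS-CoV2 positive class.
f1 = f1_score(0.6, 0.52)
print(round(f1, 2))  # 0.56
```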

Discussion: Proxy dataset performance may be worse because these instances sometimes include discussion of pending lab tests. The most predictive features are meaningful and interpretable. The type of external test that was performed is rarely mentioned.

Conclusion: COVID-19 cases that had testing done outside of the hospital can be reliably detected from the text in EHRs. Training on a proxy dataset was a suitable method for developing a highly performant classifier without labor-intensive labeling efforts.

Keywords: COVID-19; machine learning; natural language processing; text classification.

Conflict of interest statement

None declared.

Figures

Figure 1.
Monthly COVID-19 case counts detected by SARS-CoV2 PCR in the structured data or by NLP are represented as bars, with counts corresponding to the left y-axis. The line represents the percent increase in cases afforded by use of the classifier during that month, corresponding to the right y-axis. The spike from October through December 2020 likely represents the Delta variant, while the spike starting in December 2021 likely represents the Omicron variant.
