Elife. 2020 Jul 7:9:e58227.
doi: 10.7554/eLife.58227.

Augmented curation of clinical notes from a massive EHR system reveals symptoms of impending COVID-19 diagnosis

Tyler Wagner et al. Elife. 2020.

Abstract

Understanding temporal dynamics of COVID-19 symptoms could provide fine-grained resolution to guide clinical decision-making. Here, we use deep neural networks over an institution-wide platform for the augmented curation of clinical notes from 77,167 patients subjected to COVID-19 PCR testing. By contrasting Electronic Health Record (EHR)-derived symptoms of COVID-19-positive (COVIDpos; n = 2,317) versus COVID-19-negative (COVIDneg; n = 74,850) patients for the week preceding the PCR testing date, we identify anosmia/dysgeusia (27.1-fold), fever/chills (2.6-fold), respiratory difficulty (2.2-fold), cough (2.2-fold), myalgia/arthralgia (2-fold), and diarrhea (1.4-fold) as significantly amplified in COVIDpos over COVIDneg patients. The combination of cough and fever/chills has 4.2-fold amplification in COVIDpos patients during the week prior to PCR testing and, together with anosmia/dysgeusia, constitutes the earliest EHR-derived signature of COVID-19. This study introduces an Augmented Intelligence platform for the real-time synthesis of institutional biomedical knowledge. The platform holds tremendous potential for scaling up curation throughput, thus enabling EHR-powered early disease diagnosis.
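
The fold values above can be read as a ratio of symptom prevalence between the two cohorts. The following is a minimal sketch of that arithmetic, assuming "fold amplification" means the fraction of COVIDpos patients with a symptom divided by the fraction of COVIDneg patients with the same symptom; the abstract does not spell out the exact formula. The function name `fold_amplification` and the symptom counts in the usage example are hypothetical; only the cohort sizes come from the abstract.

```python
# Cohort sizes as reported in the abstract.
COVID_POS_TOTAL = 2_317
COVID_NEG_TOTAL = 74_850


def fold_amplification(pos_with_symptom: int, pos_total: int,
                       neg_with_symptom: int, neg_total: int) -> float:
    """Ratio of symptom prevalence in COVIDpos over COVIDneg patients.

    Assumption: this prevalence ratio is one plausible reading of the
    "fold" values in the abstract, not a confirmed reproduction of the
    authors' method.
    """
    if pos_total <= 0 or neg_total <= 0:
        raise ValueError("cohort sizes must be positive")
    pos_rate = pos_with_symptom / pos_total
    neg_rate = neg_with_symptom / neg_total
    if neg_rate == 0:
        raise ValueError("symptom absent in the COVIDneg cohort; ratio undefined")
    return pos_rate / neg_rate


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the calculation.
    ratio = fold_amplification(100, COVID_POS_TOTAL, 1_000, COVID_NEG_TOTAL)
    print(f"fold amplification: {ratio:.2f}")
```

Because the COVIDneg cohort is roughly 32 times larger than the COVIDpos cohort, comparing raw counts would be misleading; normalizing each count by its cohort size first is what makes the two groups comparable.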

Keywords: COVID-19; SARS-CoV-2; artificial intelligence; electronic health record; human; human biology; infectious disease; machine learning; medicine; microbiology; neural networks.


Conflict of interest statement

TW, KM, SA, AV, SB, AP, MK, PA, ML, ZD, ES, HS, AA, RB, VS are employees of nference and have financial interests in the company. FS, BP, JO, PB, RR, PV, ZT, SR, MM, WW, DC, GG, AW, WM, JH, AB have a financial conflict of interest in technology used in the research and, with Mayo Clinic, may stand to gain financially from the successful outcome of the research. This research has been reviewed by the Mayo Clinic Conflict of Interest Review Board and is being conducted in compliance with Mayo Clinic Conflict of Interest policies.

Figures

Figure 1. Augmented curation of the unstructured clinical notes and comparison of symptoms between COVIDpos vs. COVIDneg patients.
(a) Augmented curation of the unstructured clinical notes from Electronic Health Records (EHRs). (b) COVID-19-related symptom entity recognition, sentiment analysis and grouping of synonyms. (c) Comparison of symptoms extracted from EHR clinical notes of COVIDpos vs. COVIDneg patients.
Figure 1—figure supplement 1. SciBERT Architecture and Training Configuration.
Figure 1—figure supplement 2. Examples of Sentence Classification Used in Training a SciBERT Model for Phenotype/Symptom Sentiment Analysis.

