Predicting COVID-19 Symptoms From Free Text in Medical Records Using Artificial Intelligence: Feasibility Study
- PMID: 35442903
- PMCID: PMC9049643
- DOI: 10.2196/37771
Abstract
Background: Electronic medical records have opened opportunities to analyze clinical practice at large scale. Structured registries and coding procedures such as the International Classification of Primary Care have further improved such analyses. However, a large part of the information about the patient's state and the doctor's observations is still entered in free text fields. The main functions of those fields are to record the doctor's line of thought, to remind the doctor and colleagues of follow-up actions, and to provide accountability for clinical decisions. These fields contain rich information that can complement the information in coded fields, yet until now they have hardly been used for analysis.
Objective: This study aims to develop a prediction model that converts free text information on COVID-19-related symptoms from out-of-hours care electronic medical records into usable symptom-based data that can be analyzed at large scale.
Methods: The design was a feasibility study in which we examined the content of the raw data, the steps and methods for modeling, and the precision and accuracy of the models. A prediction model for 27 preidentified COVID-19-relevant symptoms was developed on a data set derived from the database of primary care out-of-hours consultations in Flanders. A multiclass, multilabel text classifier was developed. We tested two approaches: (1) a classical machine learning-based text categorization approach, Binary Relevance, and (2) a deep neural network learning approach with BERTje, including a domain-adapted version. Ethical approval was acquired through the Institutional Review Board of the Institute of Tropical Medicine and the ethics committee of the University Hospital of Antwerpen (ref 20/50/693).
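The Binary Relevance idea named in the Methods can be illustrated with a minimal sketch: one independent yes/no classifier is trained per symptom label, and the predicted label set is the union of the labels whose classifier says yes. Everything below is an assumption for illustration only: the abstract does not state which base classifier or features were used, so a tiny naive Bayes model over whitespace tokens is swapped in, and the English toy notes and the two label names are hypothetical (the study data are Dutch-language records with 27 labels).

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical preprocessing: lowercase and split on whitespace.
    return text.lower().split()


class BinaryNaiveBayes:
    """Tiny multinomial naive Bayes for a single yes/no label (assumed base learner)."""

    def fit(self, docs, targets):
        self.counts = {0: Counter(), 1: Counter()}
        self.n_docs = {0: 0, 1: 0}
        for tokens, y in zip(docs, targets):
            self.counts[y].update(tokens)
            self.n_docs[y] += 1
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        return self

    def _log_score(self, tokens, y):
        # Laplace-smoothed log prior plus log likelihood of known tokens.
        total = sum(self.counts[y].values()) + len(self.vocab)
        prior = (self.n_docs[y] + 1) / (self.n_docs[0] + self.n_docs[1] + 2)
        score = math.log(prior)
        for token in tokens:
            if token in self.vocab:
                score += math.log((self.counts[y][token] + 1) / total)
        return score

    def predict(self, tokens):
        return int(self._log_score(tokens, 1) > self._log_score(tokens, 0))


class BinaryRelevance:
    """Multilabel classifier built from one independent binary model per label."""

    def __init__(self, labels):
        self.labels = list(labels)
        self.models = {}

    def fit(self, texts, label_sets):
        docs = [tokenize(t) for t in texts]
        for label in self.labels:
            # Each label gets its own binary target vector.
            targets = [int(label in s) for s in label_sets]
            self.models[label] = BinaryNaiveBayes().fit(docs, targets)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        return {l for l in self.labels if self.models[l].predict(tokens)}


# Hypothetical toy notes and labels, not taken from the study data.
train_texts = ["fever and cough", "high fever", "dry cough", "sore throat"]
train_labels = [{"fever", "cough"}, {"fever"}, {"cough"}, set()]

model = BinaryRelevance(["fever", "cough"]).fit(train_texts, train_labels)
print(model.predict("fever cough"))
print(model.predict("dry cough"))
```

Because each label is modeled separately, a note can receive zero, one, or several symptom codes, which is what makes the task multilabel rather than single-label.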
Results: The sample set comprised 3957 fields. After cleaning, 2313 could be used for the experiments. Of these 2313 fields, 85% (n=1966) were used to train the model and 15% (n=347) to test it. The standard BERTje model performed best on the data. It reached a weighted F1 score of 0.70 and an exact match ratio (accuracy) of 0.38, that is, the proportion of instances for which the model identified all codes correctly. The other models also achieved respectable results, with weighted F1 scores ranging from 0.59 to 0.70. The Binary Relevance method performed best on the data without a frequency threshold. As for the individual codes, the domain-adapted version of BERTje performed better on several of the less common objective codes, whereas BERTje reached higher F1 scores for most other codes in general and for the least common labels in particular.
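The two headline metrics can be made concrete with a small sketch. It assumes the common definitions: the exact match ratio is the share of instances whose predicted label set equals the true label set, and the weighted F1 is the per-label F1 averaged with weights proportional to each label's number of true instances. The abstract does not spell out its formulas, so this reading is an assumption, and the label sets below are hypothetical toy data rather than study results.

```python
def exact_match_ratio(true_sets, pred_sets):
    # Fraction of instances where the predicted set equals the true set.
    matches = sum(1 for t, p in zip(true_sets, pred_sets) if t == p)
    return matches / len(true_sets)


def weighted_f1(true_sets, pred_sets, labels):
    # Per-label F1, weighted by support (number of true instances of the label).
    weighted_sum = 0.0
    total_support = 0
    for label in labels:
        pairs = list(zip(true_sets, pred_sets))
        tp = sum(1 for t, p in pairs if label in t and label in p)
        fp = sum(1 for t, p in pairs if label not in t and label in p)
        fn = sum(1 for t, p in pairs if label in t and label not in p)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        support = tp + fn
        weighted_sum += support * f1
        total_support += support
    return weighted_sum / total_support if total_support else 0.0


# Hypothetical toy labels: one instance gets a spurious "cough" code.
true_sets = [{"fever", "cough"}, {"fever"}, {"cough"}, set()]
pred_sets = [{"fever", "cough"}, {"fever", "cough"}, {"cough"}, set()]

print(exact_match_ratio(true_sets, pred_sets))             # 0.75
print(weighted_f1(true_sets, pred_sets, ["fever", "cough"]))
```

In this toy case a single extra code lowers the exact match ratio to 0.75 while the weighted F1 stays near 0.9, which illustrates why an exact match ratio can sit well below the weighted F1 in a multilabel task with many labels.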
Conclusions: The artificial intelligence model BERTje can reliably predict COVID-19-related information from medical records by text mining the free text fields generated in primary care settings. This feasibility study invites researchers to examine further possibilities for using routine primary care data.
Keywords: COVID-19; artificial intelligence; coding procedure; electronic medical records; feasibility study; natural language processing; precision model; prediction model; primary care; structured registry; text mining.
©Josefien Van Olmen, Jens Van Nooten, Hilde Philips, Annet Sollie, Walter Daelemans. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 27.04.2022.
Conflict of interest statement
Conflicts of Interest: None declared.
