Combining free text and structured electronic medical record entries to detect acute respiratory infections
- PMID: 20976281
- PMCID: PMC2954790
- DOI: 10.1371/journal.pone.0013377
Abstract
Background: The electronic medical record (EMR) is a rich source of information that could be harnessed for epidemic surveillance. We asked whether structured EMR data could be coupled with computerized processing of free-text clinical entries to enhance detection of acute respiratory infections (ARI).
Methodology: A manual review of EMR records related to 15,377 outpatient visits uncovered 280 reference cases of ARI. We used logistic regression with backward elimination to determine which of the candidate structured EMR parameters (diagnostic codes, vital signs, and orders for tests, imaging and medications) contributed to the detection of those reference cases. We also developed a computerized free-text search to identify clinical notes documenting at least two non-negated ARI symptoms. We then used heuristics to build case-detection algorithms that best combined the retained structured EMR parameters with the results of the text analysis.
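A minimal Python sketch of a text rule of this kind, assuming a small hypothetical symptom lexicon and a simplified NegEx-style negation check (a trigger word such as "no" or "denies" within five preceding words, with the negation scope ending at sentence punctuation, a newline, or "but"). The abstract does not give the study's term list or negation rules, so `ARI_SYMPTOMS`, `NEGATION_TRIGGERS` and the window size are illustrative assumptions, not the authors' method.

```python
import re

# Hypothetical ARI symptom lexicon (canonical symptom -> surface terms).
# The study's actual term list is not given in the abstract.
ARI_SYMPTOMS = {
    "cough": ["cough", "coughing"],
    "fever": ["fever", "febrile"],
    "sore throat": ["sore throat", "pharyngitis"],
    "rhinorrhea": ["runny nose", "rhinorrhea"],
    "congestion": ["nasal congestion", "congestion"],
}

# Assumed NegEx-style negation triggers and look-back window (in words).
NEGATION_TRIGGERS = ["no", "not", "denies", "without", "negative for"]
NEGATION_WINDOW = 5

# Negation scope ends at sentence punctuation, a newline, or the word "but".
SCOPE_BREAK = re.compile(r"[.;\n]|\bbut\b")


def _is_negated(preceding_words: list[str]) -> bool:
    """True if a negation trigger occurs within the look-back window."""
    window = " ".join(preceding_words[-NEGATION_WINDOW:])
    return any(re.search(rf"\b{re.escape(t)}\b", window) for t in NEGATION_TRIGGERS)


def non_negated_symptoms(note: str) -> set[str]:
    """Return the canonical ARI symptoms mentioned without negation."""
    found = set()
    for scope in SCOPE_BREAK.split(note.lower()):
        for symptom, terms in ARI_SYMPTOMS.items():
            for term in terms:
                for match in re.finditer(rf"\b{re.escape(term)}\b", scope):
                    # Only words before the term, inside the same scope, can negate it.
                    if not _is_negated(scope[: match.start()].split()):
                        found.add(symptom)
    return found


def text_positive(note: str, min_symptoms: int = 2) -> bool:
    """Flag a note documenting at least `min_symptoms` non-negated ARI symptoms."""
    return len(non_negated_symptoms(note)) >= min_symptoms
```

For example, `text_positive("Patient reports cough and sore throat. No fever.")` counts cough and sore throat but not the negated fever, so the note meets the two-symptom threshold. A production rule would need a much larger lexicon and a fuller negation algorithm.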
Principal findings: An adjusted grouping of diagnostic codes identified reference ARI patients with a sensitivity of 79%, a specificity of 96% and a positive predictive value (PPV) of 32%. Of the 21 additional structured clinical parameters considered, two contributed significantly to ARI detection: new prescriptions for cough remedies and elevations in body temperature to at least 38°C. Together with the diagnostic codes, these parameters increased detection sensitivity to 87%, but specificity and PPV declined to 95% and 25%, respectively. Adding text analysis increased sensitivity to 99%, but PPV dropped further to 14%. Algorithms that required both a positive query of structured EMR parameters and a positive text analysis yielded PPVs of 52-68% while retaining sensitivities of 69-73%.
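The trade-off in these findings follows from how two detectors are combined. The sketch below is a minimal Python illustration, not the authors' code: the `Visit` fields and helper names are hypothetical, and treating the diagnostic codes, a temperature of at least 38°C and a new cough-remedy prescription as an OR rule is an assumption consistent with sensitivity rising when the two parameters were added. `evaluate` computes sensitivity, specificity and PPV against reference labels such as the manually reviewed cases.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Visit:
    # Hypothetical fields; the study's EMR schema is not described in the abstract.
    has_ari_diagnostic_code: bool
    max_temperature_c: Optional[float]
    new_cough_remedy_rx: bool


def structured_positive(visit: Visit) -> bool:
    """Assumed OR-combination of the three retained structured parameters."""
    febrile = visit.max_temperature_c is not None and visit.max_temperature_c >= 38.0
    return visit.has_ari_diagnostic_code or febrile or visit.new_cough_remedy_rx


@dataclass
class Performance:
    sensitivity: float
    specificity: float
    ppv: float


def _ratio(num: int, den: int) -> float:
    # Undefined ratios (empty denominator) are reported as NaN.
    return num / den if den else float("nan")


def evaluate(predicted: Sequence[bool], reference: Sequence[bool]) -> Performance:
    """Compare case flags against manually reviewed reference labels."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)
    fp = sum(1 for p, r in pairs if p and not r)
    fn = sum(1 for p, r in pairs if not p and r)
    tn = sum(1 for p, r in pairs if not p and not r)
    return Performance(
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        ppv=_ratio(tp, tp + fp),
    )


def either(a: Sequence[bool], b: Sequence[bool]) -> list[bool]:
    """OR rule: flag a visit if either detector fires (favors sensitivity)."""
    return [x or y for x, y in zip(a, b)]


def both(a: Sequence[bool], b: Sequence[bool]) -> list[bool]:
    """AND rule: flag a visit only if both detectors fire (favors precision)."""
    return [x and y for x, y in zip(a, b)]
```

Because an OR rule's positives are a superset of each detector's positives, its sensitivity cannot fall below either detector alone; an AND rule's positives are a subset, so its false positives cannot exceed either detector alone. Whether PPV actually rises under an AND rule is an empirical question, which is what the reported 52-68% PPVs address.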
Conclusion: Structured EMR parameters and free-text analyses can be combined into algorithms that detect ARI cases with higher sensitivity or higher precision than diagnostic codes alone. These results highlight potential paths by which repurposed EMR information could facilitate the discovery of epidemics before they cause mass casualties.
Similar articles
- Using the electronic medical record to identify community-acquired pneumonia: toward a replicable automated strategy. PLoS One. 2013 Aug 13;8(8):e70944. doi: 10.1371/journal.pone.0070944. eCollection 2013. PMID: 23967138. Free PMC article.
- Epidemic surveillance using an electronic medical record: an empiric approach to performance improvement. PLoS One. 2014 Jul 9;9(7):e100845. doi: 10.1371/journal.pone.0100845. eCollection 2014. PMID: 25006878. Free PMC article.
- Enhancing ICD-Code-Based Case Definition for Heart Failure Using Electronic Medical Record Data. J Card Fail. 2020 Jul;26(7):610-617. doi: 10.1016/j.cardfail.2020.04.003. Epub 2020 Apr 15. PMID: 32304875.
- Use of electronic health data to identify patients with moderate-to-severe osteoarthritis of the hip and/or knee and inadequate response to pain medications. BMC Med Res Methodol. 2023 Jun 30;23(1):156. doi: 10.1186/s12874-023-01964-y. PMID: 37391751. Free PMC article. Review.
- Why is warfarin underused for stroke prevention in atrial fibrillation? A detailed review of electronic medical records. Curr Med Res Opin. 2012 Sep;28(9):1407-14. doi: 10.1185/03007995.2012.708653. Epub 2012 Jul 26. PMID: 22746356. Review.
Cited by
- TEPAPA: a novel in silico feature learning pipeline for mining prognostic and associative factors from text-based electronic medical records. Sci Rep. 2017 Jul 31;7(1):6918. doi: 10.1038/s41598-017-07111-0. PMID: 28761061. Free PMC article.
- Optimising the use of electronic health records to estimate the incidence of rheumatoid arthritis in primary care: what information is hidden in free text? BMC Med Res Methodol. 2013 Aug 21;13:105. doi: 10.1186/1471-2288-13-105. PMID: 23964710. Free PMC article.
- A method to advance adolescent sexual health research: Automated algorithm finds sexual history documentation. Front Digit Health. 2022 Jul 22;4:836733. doi: 10.3389/fdgth.2022.836733. eCollection 2022. PMID: 35937421. Free PMC article.
- Natural Language Processing for EHR-Based Computational Phenotyping. IEEE/ACM Trans Comput Biol Bioinform. 2019 Jan-Feb;16(1):139-153. doi: 10.1109/TCBB.2018.2849968. Epub 2018 Jun 25. PMID: 29994486. Free PMC article. Review.
- Can long-term historical data from electronic medical records improve surveillance for epidemics of acute respiratory infections? A systematic evaluation. PLoS One. 2018 Jan 31;13(1):e0191324. doi: 10.1371/journal.pone.0191324. eCollection 2018. PMID: 29385161. Free PMC article.