J Am Med Inform Assoc. 2019 Apr 1;26(4):364-379.

doi: 10.1093/jamia/ocy173.

Natural language processing of symptoms documented in free-text narratives of electronic health records: a systematic review

Theresa A Koleck¹, Caitlin Dreisbach²,³, Philip E Bourne³, Suzanne Bakken¹,⁴,⁵

Affiliations

¹ School of Nursing, Columbia University, New York, New York, USA.
² School of Nursing, University of Virginia, Charlottesville, Virginia, USA.
³ Data Science Institute, University of Virginia, Charlottesville, Virginia, USA.
⁴ Department of Biomedical Informatics, Columbia University, New York, New York, USA.
⁵ Data Science Institute, Columbia University, New York, New York, USA.

PMID: 30726935
PMCID: PMC6657282
DOI: 10.1093/jamia/ocy173

Theresa A Koleck et al. J Am Med Inform Assoc. 2019.


Abstract

Objective: Natural language processing (NLP) of symptoms from electronic health records (EHRs) could contribute to the advancement of symptom science. We aim to synthesize the literature on the use of NLP to process or analyze symptom information documented in EHR free-text narratives.

Materials and methods: Our search of 1964 records from PubMed and EMBASE was narrowed to 27 eligible articles. Data related to the purpose, free-text corpus, patients, symptoms, NLP methodology, evaluation metrics, and quality indicators were extracted for each study.

Results: Symptom-related information was presented as a primary outcome in 14 studies. EHR narratives represented various inpatient and outpatient clinical specialties, with general, cardiology, and mental health occurring most frequently. Studies encompassed a wide variety of symptoms, including shortness of breath, pain, nausea, dizziness, disturbed sleep, constipation, and depressed mood. NLP approaches included previously developed NLP tools, classification methods, and manually curated rule-based processing. Only one-third (n = 9) of studies reported patient demographic characteristics.
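The Results name manually curated rule-based processing as one of the NLP approaches used. The Python below is a minimal sketch of what such processing can look like. It is a generic illustration and not the method of any reviewed study. It matches a small hypothetical symptom lexicon against a free-text note and applies a simplified negation check inspired by NegEx. The lexicon, the trigger list, and the `extract_symptoms` function are illustrative assumptions.

```python
import re
from typing import Dict, List, Union

# Hypothetical lexicon (illustrative assumption, not from any reviewed study):
# maps surface terms found in notes to a normalized symptom label.
SYMPTOM_LEXICON: Dict[str, str] = {
    "shortness of breath": "shortness of breath",
    "dyspnea": "shortness of breath",
    "orthopnea": "shortness of breath",
    "nausea": "nausea",
    "dizziness": "dizziness",
    "constipation": "constipation",
    "pain": "pain",
}

# Simplified NegEx-inspired triggers (assumption): a symptom is flagged as
# negated if one of these appears earlier in the same sentence.
NEGATION_TRIGGERS = ("denies", "no", "without", "negative for")


def extract_symptoms(note: str) -> List[Dict[str, Union[str, bool]]]:
    """Return one record per symptom mention, with a naive negation flag."""
    records: List[Dict[str, Union[str, bool]]] = []
    # Rough sentence split on periods, semicolons, and line breaks.
    for sentence in re.split(r"[.;\n]+", note.lower()):
        for term, symptom in SYMPTOM_LEXICON.items():
            # Word boundaries stop "pain" from matching inside "painful".
            for match in re.finditer(r"\b" + re.escape(term) + r"\b", sentence):
                preceding = sentence[: match.start()]
                negated = any(
                    re.search(r"\b" + re.escape(trigger) + r"\b", preceding)
                    for trigger in NEGATION_TRIGGERS
                )
                records.append({"term": term, "symptom": symptom, "negated": negated})
    return records


if __name__ == "__main__":
    note = "Patient reports dyspnea on exertion. Denies nausea or dizziness."
    for record in extract_symptoms(note):
        print(record)
```

For brevity, this sketch treats the whole preceding part of the sentence as the negation scope. Full NegEx instead uses a bounded window and termination terms, so a phrase such as "no fever but reports pain" would be mis-flagged here. Published tools handle such cases more carefully.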

Discussion: NLP is used to extract information from EHR free-text narratives written by a variety of healthcare providers on an expansive range of symptoms across diverse clinical specialties. The current focus of this field is on the development of methods to extract symptom information and the use of symptom information for disease classification tasks rather than the examination of symptoms themselves.

Conclusion: Future NLP studies should concentrate on the investigation of symptoms and symptom documentation in EHR free-text narratives. Efforts should be undertaken to examine patient characteristics and make symptom-related NLP algorithms or pipelines and vocabularies openly available.

Keywords: electronic health records; natural language processing; review; signs and symptoms.


Figures

**Figure 1.**
Flow diagram of included articles. NLP: natural language processing.

**Figure 2.**
Chord diagram of symptoms by clinical category included in systematic review articles. Relationships between symptoms (color sectors and tracks) and articles (black sectors) included in the systematic review are displayed. Individual symptoms are arranged via color by clinical category. Symptom sector size is proportional to the number of unique articles that include a given symptom. Article sector size is proportional to the number of unique symptoms included in a given study. Sample sizes in the legend correspond to the number of unique articles overall and in each clinical category. Shortness of breath includes dyspnea and orthopnea. Pain includes pain, ache, or discomfort not specified as occurring in the chest or abdomen. The figure was generated using R statistical software (R version 3.3.1; R Foundation for Statistical Computing, Vienna, Austria).
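The Figure 2 caption defines each sector size as a count of unique article-symptom links. The Python below is a minimal sketch of that counting step. It assumes the data are available as (article, symptom) pairs. The published figure was made in R, so this is not the authors' code. The function name `sector_sizes`, the pair input format, and the toy study labels are hypothetical. Only the grouping of dyspnea and orthopnea under shortness of breath comes from the caption.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Grouping stated in the Figure 2 caption: dyspnea and orthopnea are counted
# under "shortness of breath".
SYNONYMS: Dict[str, str] = {
    "dyspnea": "shortness of breath",
    "orthopnea": "shortness of breath",
}


def sector_sizes(
    pairs: Iterable[Tuple[str, str]],
) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Compute chord-diagram sector sizes from (article, symptom) pairs.

    Symptom size = number of unique articles that include the symptom.
    Article size = number of unique symptoms included in the article.
    Sets make repeated pairs count only once.
    """
    articles_per_symptom: Dict[str, Set[str]] = defaultdict(set)
    symptoms_per_article: Dict[str, Set[str]] = defaultdict(set)
    for article, symptom in pairs:
        symptom = SYNONYMS.get(symptom, symptom)  # fold synonyms before counting
        articles_per_symptom[symptom].add(article)
        symptoms_per_article[article].add(symptom)
    symptom_sizes = {s: len(arts) for s, arts in articles_per_symptom.items()}
    article_sizes = {a: len(syms) for a, syms in symptoms_per_article.items()}
    return symptom_sizes, article_sizes


if __name__ == "__main__":
    # Hypothetical toy pairs, not data from the review.
    toy = [("Study A", "dyspnea"), ("Study A", "orthopnea"),
           ("Study B", "shortness of breath"), ("Study B", "pain")]
    print(sector_sizes(toy))
```

In the toy input, Study A mentions both dyspnea and orthopnea. Because both fold into shortness of breath, Study A contributes one unique symptom. This is why synonyms must be grouped before the sets are counted.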


