Review

JMIR Med Inform. 2019 Apr 27;7(2):e12239.

doi: 10.2196/12239.

Natural Language Processing of Clinical Notes on Chronic Diseases: Systematic Review

Seyedmostafa Sheikhalishahi¹ ², Riccardo Miotto³, Joel T Dudley³, Alberto Lavelli⁴, Fabio Rinaldi⁵, Venet Osmani¹

Affiliations

¹ eHealth Research Group, Fondazione Bruno Kessler Research Institute, Trento, Italy.
² Department of Information Engineering and Computer Science, University of Trento, Trento, Italy.
³ Institute for Next Generation Healthcare, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States.
⁴ NLP Research Group, Fondazione Bruno Kessler Research Institute, Trento, Italy.
⁵ Institute of Computational Linguistics, University of Zurich, Zurich, Switzerland.

PMID: 31066697
PMCID: PMC6528438
DOI: 10.2196/12239

Seyedmostafa Sheikhalishahi et al. JMIR Med Inform. 2019.

Abstract

Background: Novel approaches that complement and go beyond evidence-based medicine are required in the domain of chronic diseases, given the growing incidence of such conditions in the worldwide population. A promising avenue is the secondary use of electronic health records (EHRs), where patient data are analyzed to conduct clinical and translational research. Machine learning methods for processing EHRs are improving the understanding of patient clinical trajectories and chronic disease risk prediction, creating a unique opportunity to derive previously unknown clinical insights. However, a wealth of clinical history remains locked in clinical narratives written as free-form text. Consequently, unlocking the full potential of EHR data is contingent on the development of natural language processing (NLP) methods that automatically transform clinical text into structured clinical data that can guide clinical decisions and potentially delay or prevent disease onset.

Objective: The goal of the research was to provide a comprehensive overview of the development and uptake of NLP methods applied to free-text clinical notes related to chronic diseases, including the investigation of challenges faced by NLP methodologies in understanding clinical narratives.

Methods: Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were followed and searches were conducted in 5 databases using "clinical notes," "natural language processing," and "chronic disease" and their variations as keywords to maximize coverage of the articles.
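The abstract does not list the exact keyword variations or the database-specific query syntax. As a rough illustration only, the sketch below shows one common way to combine concept groups into a Boolean search string: variants within a concept are joined with OR, and the concepts are joined with AND. The variant terms and the helper names `quote` and `build_query` are hypothetical assumptions, not the authors' search strategy, and real bibliographic databases differ in their phrase and field syntax.

```python
from typing import Sequence

# Hypothetical keyword groups: the review names only the three core concepts
# ("clinical notes", "natural language processing", "chronic disease");
# the extra variants below are illustrative assumptions.
CONCEPT_GROUPS: list[list[str]] = [
    ["clinical notes", "clinical narratives", "medical records"],
    ["natural language processing", "text mining"],
    ["chronic disease", "chronic illness"],
]


def quote(term: str) -> str:
    """Wrap multi-word terms in double quotes so they are searched as phrases."""
    return f'"{term}"' if " " in term else term


def build_query(groups: Sequence[Sequence[str]]) -> str:
    """OR the variants within each concept, then AND the concepts together."""
    clauses = ["(" + " OR ".join(quote(t) for t in group) + ")" for group in groups]
    return " AND ".join(clauses)


if __name__ == "__main__":
    print(build_query(CONCEPT_GROUPS))
```

ORing variants widens recall within a concept, while ANDing the concepts keeps only articles that touch all three topics.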

Results: Of the 2652 articles considered, 106 met the inclusion criteria. Review of the included papers identified 43 chronic diseases, which were further classified into 10 disease categories using the International Classification of Diseases, 10th Revision (ICD-10). The largest number of studies focused on diseases of the circulatory system (n=38), while endocrine and metabolic diseases were the least studied (n=14). This was due to the structure of clinical records related to metabolic diseases, which typically contain much more structured data than medical records for diseases of the circulatory system; the latter rely more on unstructured data and have consequently received more attention from NLP research. The review has shown a significant increase in the use of machine learning methods compared with rule-based approaches; however, deep learning methods remain emergent (n=3). Consequently, the majority of works focus on classification of disease phenotype, with only a handful of papers addressing extraction of comorbidities from free text or integration of clinical notes with structured data. There is notable use of relatively simple methods, such as shallow classifiers (alone or combined with rule-based methods), owing to the interpretability of their predictions, which remains a significant issue for more complex methods. Finally, scarcity of publicly available data may also have contributed to insufficient development of more advanced methods, such as extraction of word embeddings from clinical notes.
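To make the rule-based side of this comparison concrete, here is a minimal sketch of a keyword-plus-negation phenotype classifier in the spirit of NegEx. It is a generic illustration, not a method from any reviewed study; the heart failure term list, the negation cues, the five-token window, and the function names are all hypothetical assumptions.

```python
import re

# Hypothetical term list for one phenotype (heart failure); illustrative only.
DISEASE_TERMS = ["heart failure", "chf", "congestive heart failure"]
# A few NegEx-style negation cues; real systems use much larger cue lists
# and also handle pseudo-negations and scope terminators, which this omits.
NEGATION_CUES = ["no", "denies", "without", "negative for", "no evidence of"]
WINDOW = 5  # number of tokens before a mention that are checked for a cue


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def is_negated(tokens: list[str], start: int) -> bool:
    """Return True if a negation cue occurs within WINDOW tokens before `start`."""
    preceding = " ".join(tokens[max(0, start - WINDOW):start])
    return any(re.search(rf"\b{re.escape(cue)}\b", preceding) for cue in NEGATION_CUES)


def classify_note(text: str) -> bool:
    """Label a note positive if any sentence mentions the disease without a preceding negation cue."""
    for sentence in re.split(r"[.;\n]", text):
        tokens = tokenize(sentence)
        for term in DISEASE_TERMS:
            term_tokens = term.split()
            n = len(term_tokens)
            for i in range(len(tokens) - n + 1):
                if tokens[i:i + n] == term_tokens and not is_negated(tokens, i):
                    return True
    return False


if __name__ == "__main__":
    print(classify_note("History of congestive heart failure, on furosemide."))
    print(classify_note("Patient denies chest pain. No evidence of heart failure."))
```

Rules like these are transparent, which matches the interpretability argument above; in a hybrid design, their output could serve as one feature for a shallow classifier rather than as the final label.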

Conclusions: Efforts are still required to improve (1) progression of clinical NLP methods from extraction toward understanding; (2) recognition of relations among entities rather than entities in isolation; (3) temporal extraction to understand past, current, and future clinical events; (4) exploitation of alternative sources of clinical knowledge; and (5) availability of large-scale, de-identified clinical corpora.

Keywords: cancer; chronic diseases; clinical notes; deep learning; diabetes; electronic health records; heart disease; lung disease; machine learning; natural language processing; stroke.

©Seyedmostafa Sheikhalishahi, Riccardo Miotto, Joel T Dudley, Alberto Lavelli, Fabio Rinaldi, Venet Osmani. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 27.04.2019.

Conflict of interest statement

Conflicts of Interest: None declared.

Figures

**Figure 1**
Preferred Reporting Items for Systematic Reviews and Meta-Analyses article selection flowchart. ACM: Association for Computing Machinery; NLP: natural language processing.

**Figure 2**
Relationship between chronic diseases (black sectors) and articles included in the review (for clarity we have included only diseases that are addressed by three or more articles).

**Figure 3**
Natural language processing rule-based methods versus machine learning for chronic diseases.

**Figure 4**
Categorization of the publication venues.

**Figure 5**
Distribution of included studies according to publication venues.
