Entity recognition from clinical texts via recurrent neural network

Zengjian Liu¹, Ming Yang², Xiaolong Wang¹, Qingcai Chen¹, Buzhou Tang³ ⁴, Zhe Wang⁵, Hua Xu⁶

Affiliations

¹ Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, 518055, China.
² Pharmacy Department, Shenzhen Second People's Hospital, First Affiliated Hospital of Shenzhen University, Shenzhen, 518035, China.
³ Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, 518055, China. tangbuzhou@gmail.com.
⁴ Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, 130012, China. tangbuzhou@gmail.com.
⁵ Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, 130012, China.
⁶ School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA.

PMID: 28699566
PMCID: PMC5506598
DOI: 10.1186/s12911-017-0468-7

Zengjian Liu et al. BMC Med Inform Decis Mak. 2017 Jul 5;17(Suppl 2):67. doi: 10.1186/s12911-017-0468-7.

Abstract

Background: Entity recognition is one of the most fundamental steps in text analysis and has long attracted considerable attention from researchers. In the clinical domain, clinical texts contain various types of entities, such as clinical entities and protected health information (PHI). Recognizing these entities has become an active topic in clinical natural language processing (NLP), and in the past few years many traditional machine learning methods, such as support vector machines and conditional random fields, have been applied to recognize entities in clinical texts. More recently, the recurrent neural network (RNN), a deep learning method that has shown great potential on many problems including named entity recognition, has also gradually been adopted for entity recognition from clinical texts.

Methods: In this paper, we comprehensively investigate the performance of long short-term memory (LSTM), a representative variant of RNN, on clinical entity recognition and protected health information recognition. The LSTM model consists of three layers: an input layer, which generates a representation of each word in a sentence; an LSTM layer, which outputs another sequence of word representations that captures the context of each word in the sentence; and an inference layer, which makes tagging decisions based on the output of the LSTM layer, that is, it outputs a label sequence.
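
The three-layer design described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the class name `TinyLSTMTagger`, the dimensions, the random untrained weights, the omission of bias terms, the single forward direction, and the per-token argmax in the inference layer are all assumptions made for illustration, and the label names are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    # Plain matrix-vector product on nested lists.
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class TinyLSTMTagger:
    """Hypothetical tagger: input layer -> LSTM layer -> inference layer."""

    def __init__(self, vocab, labels, emb_dim=4, hidden_dim=3, seed=0):
        rng = random.Random(seed)
        self.labels = labels
        self.hidden_dim = hidden_dim
        # Input layer: one embedding per known word, zeros for unknown words.
        self.emb = {w: [rng.uniform(-0.1, 0.1) for _ in range(emb_dim)]
                    for w in vocab}
        self.unk = [0.0] * emb_dim
        # LSTM layer: one weight matrix per gate (input, forget, output,
        # candidate), applied to the concatenation [x_t; h_{t-1}].
        # Biases are omitted to keep the sketch short (an assumption).
        self.gates = {g: rand_matrix(hidden_dim, emb_dim + hidden_dim, rng)
                      for g in "ifoc"}
        # Inference layer: linear scorer from hidden state to label scores.
        self.out = rand_matrix(len(labels), hidden_dim, rng)

    def input_layer(self, words):
        return [self.emb.get(w.lower(), self.unk) for w in words]

    def lstm_layer(self, xs):
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        hidden_states = []
        for x in xs:
            z = x + h  # list concatenation: [x_t; h_{t-1}]
            i = [sigmoid(v) for v in matvec(self.gates["i"], z)]
            f = [sigmoid(v) for v in matvec(self.gates["f"], z)]
            o = [sigmoid(v) for v in matvec(self.gates["o"], z)]
            g = [math.tanh(v) for v in matvec(self.gates["c"], z)]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
            hidden_states.append(h)
        return hidden_states

    def inference_layer(self, hidden_states):
        # Assumption: pick the best-scoring label for each token independently.
        tags = []
        for h in hidden_states:
            scores = matvec(self.out, h)
            tags.append(self.labels[scores.index(max(scores))])
        return tags

    def tag(self, words):
        return self.inference_layer(self.lstm_layer(self.input_layer(words)))


# Usage: the weights are untrained, so the tags themselves are meaningless;
# the point is that one label comes out per input word.
tagger = TinyLSTMTagger(["patient", "denies", "chest", "pain"],
                        ["O", "B-problem", "I-problem"])
print(tagger.tag(["Patient", "denies", "chest", "pain"]))
```

Keeping the three layers as separate methods mirrors the description above: each stage consumes a sequence and returns a sequence of the same length.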

Results: Experiments on the corpora of the 2010, 2012 and 2014 i2b2 NLP challenges show that LSTM achieves its highest micro-average F1-scores of 85.81% on 2010 i2b2 medical concept extraction, 92.29% on 2012 i2b2 clinical event detection, and 94.37% on 2014 i2b2 de-identification, which is competitive with other state-of-the-art systems.
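
For readers unfamiliar with the metric, the following sketch shows how a micro-average F1-score is commonly computed: true positives, false positives, and false negatives are pooled across entity types before precision and recall are calculated. The function name `micro_f1` and the counts in the example are illustrative assumptions, not figures from the paper, and the matching criteria used in the i2b2 evaluations are not reproduced here.

```python
def micro_f1(counts):
    """Micro-average F1 from per-type (tp, fp, fn) counts.

    counts maps an entity type to a tuple of
    (true positives, false positives, false negatives).
    """
    # Pool the counts over all entity types first.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up counts for two hypothetical entity types.
example = {"problem": (80, 10, 20), "treatment": (40, 10, 10)}
print(f"{micro_f1(example):.4f}")
```

Because the counts are pooled, frequent entity types weigh more heavily in the score than rare ones.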

Conclusions: LSTM, which requires no hand-crafted features, has great potential for entity recognition from clinical texts. It outperforms traditional machine learning methods, which depend on laborious feature engineering. One possible future direction is integrating the knowledge bases that are widely available in the clinical domain into LSTM, which is part of our future work. Another is using LSTM to recognize entities in specific formats.

Keywords: Clinical notes; Deep learning; Entity recognition; Recurrent neural network; Sequence labeling.

Figures

**Fig. 1**
Overview architecture of our LSTM

**Fig. 3**
Character-level representation generation models. (a) Bidirectional LSTM. (b) CNN
