Natural language processing: an introduction
- PMID: 21846786
- PMCID: PMC3168328
- DOI: 10.1136/amiajnl-2011-000464
Abstract
Objectives: To provide an overview and tutorial of natural language processing (NLP) and modern NLP-system design.
Target audience: This tutorial targets the medical informatics generalist who has limited acquaintance with the principles behind NLP and/or limited knowledge of the current state of the art.
Scope: We describe the historical evolution of NLP and summarize common NLP sub-problems in this extensive field. We then provide a synopsis of selected highlights of medical NLP efforts. After briefly describing the common machine-learning approaches used for diverse NLP sub-problems, we discuss how modern NLP architectures are designed, with a summary of the Apache Foundation's Unstructured Information Management Architecture. Finally, we consider possible future directions for NLP and reflect on the possible impact of IBM Watson on the medical field.