J Med Internet Res. 2015 Aug 31;17(8):e212. doi: 10.2196/jmir.4612.

Automatically Detecting Failures in Natural Language Processing Tools for Online Community Text


Albert Park et al. J Med Internet Res.

Abstract

Background: The prevalence and value of patient-generated health text are increasing, but processing such text remains problematic. Although existing biomedical natural language processing (NLP) tools are appealing, most were developed to process clinician- or researcher-generated text, such as clinical notes or journal articles. Beyond this mismatch in text type, further challenges of using existing NLP tools include constantly changing technologies, source vocabularies, and text characteristics. These continuously evolving challenges call for low-cost, systematic assessment. However, the standard evaluation method in NLP, manual annotation, requires tremendous effort and time.

Objective: The primary objective of this study is to explore an alternative approach: using low-cost, automated methods to detect failures (eg, incorrect boundaries, missed terms, mismapped concepts) when processing patient-generated text with existing biomedical NLP tools. We first characterize common failures that NLP tools make in processing online community text. We then demonstrate the feasibility of our automated approach in detecting these common failures using one of the most popular biomedical NLP tools, MetaMap.

Methods: Using 9657 posts from an online cancer community, we explored our automated failure detection approach in two steps: (1) to characterize the failure types, we first manually reviewed MetaMap's commonly occurring failures, grouped the inaccurate mappings into failure types, and then identified causes of the failures through iterative rounds of manual review using open coding, and (2) to automatically detect these failure types, we then explored combinations of existing NLP techniques and dictionary-based matching for each failure cause. Finally, we manually evaluated the automatically detected failures.
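The detection idea described in the Methods can be illustrated with a small sketch. This is not the authors' implementation: the heuristics, the slang dictionary, and all example mappings are invented for illustration. A mapping is flagged as a word ambiguity failure when a community term maps to a concept other than its dictionary sense, and as a boundary failure when the mapped concept covers only part of the source phrase.

```python
# Toy sketch of dictionary-based failure flagging (illustrative only;
# the heuristics and data below are assumptions, not the paper's method).

def flag_failures(mappings, slang):
    """mappings: (source_phrase, mapped_concept) pairs from an NLP tool.
    slang: community term -> intended concept (lowercase strings)."""
    flagged = []
    for phrase, concept in mappings:
        p, c = phrase.lower(), concept.lower()
        if p in slang and slang[p] != c:
            # community term resolved to the wrong sense
            flagged.append((phrase, concept, "word ambiguity"))
        elif set(c.split()) < set(p.split()):
            # mapped concept covers only part of the source phrase
            flagged.append((phrase, concept, "boundary failure"))
    return flagged

mappings = [
    ("chemo brain", "Brain"),   # partial mapping: boundary failure
    ("port", "Harbor"),         # slang mapped to the wrong sense
    ("fatigue", "Fatigue"),     # correct mapping, not flagged
]
print(flag_failures(mappings, {"port": "implanted port"}))
```

A real pipeline would combine several such checks (tokenization, dictionary matching, context rules), one per failure cause, consistent with the 12 causes identified in the study.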

Results: From our manual review, we characterized three failure types: (1) boundary failures, (2) missed term failures, and (3) word ambiguity failures. Within these three types, we identified 12 causes of inaccurate concept mappings. Our automated methods flagged almost half of MetaMap's 383,572 mappings as problematic. Word ambiguity failures were the most common, comprising 82.22% of failures; boundary failures were second, at 15.90%; and missed term failures were the least common, at 1.88%. The automated failure detection achieved precision, recall, accuracy, and F1 score of 83.00%, 92.57%, 88.17%, and 87.52%, respectively.

Conclusions: We illustrate the challenges of processing patient-generated text from online health communities, characterize the failures NLP tools make on such text, and demonstrate the feasibility of our low-cost approach to detecting those failures automatically. Our approach shows the potential for scalable, effective solutions to automatically assess constantly evolving NLP tools and source vocabularies for processing patient-generated text.

Keywords: UMLS; automatic data processing; information extraction; natural language processing; quantitative evaluation.

Conflict of interest statement

Conflicts of Interest: None declared.

Figures

Figure 1
Example failures that resulted from the application of MetaMap to process patient-generated text in an online health community (blue terms represent patient-generated text; black terms represent MetaMap’s interpretation; and red terms represent failure type).

