Review
Brief Bioinform. 2016 Jan;17(1):132-44.
doi: 10.1093/bib/bbv024. Epub 2015 May 1.

Community challenges in biomedical text mining over 10 years: success, failure and the future

Chung-Chi Huang et al. Brief Bioinform. 2016 Jan.

Abstract

One effective way to improve the state of the art is through competitions. Following the success of the Critical Assessment of protein Structure Prediction (CASP) in bioinformatics research, a number of challenge evaluations have been organized by the text-mining research community to assess and advance natural language processing (NLP) research for biomedicine. In this article, we review the different community challenge evaluations held from 2002 to 2014 and their respective tasks. Furthermore, we examine these challenge tasks through their targeted problems in NLP research and biomedical applications, respectively. Next, we describe the general workflow of organizing a Biomedical NLP (BioNLP) challenge and the stakeholders involved (task organizers, task data producers, task participants and end users). Finally, we summarize the impact and contributions of the different BioNLP challenges taken as a whole, followed by a discussion of their limitations and difficulties. We conclude with future trends in BioNLP challenge evaluations.

Keywords: BioNLP challenges; BioNLP shared tasks; biomedical natural language processing (BioNLP); critical assessment; text mining.

Figures

Figure 1
Data channeling based on NLP technology and the application of the channeled data. NLP technology (e.g. Document Retrieval or Information Extraction) narrows down the search space, relieving scientists in biology and the life sciences of much of the effort of manually searching for text snippets of interest. The topics of BioNLP challenge tasks are shown as examples, grouped by the NLP technology they focus on. For instance, topics associated with Information Extraction in BioNLP include, but are not limited to, finding drug–drug interactions, protein–protein interactions, gene relations, clinical temporal relations and references into gene functions. The channeled (text-mined) data, in turn, can be used to curate databases, construct ontologies, build semantic networks or interactive systems and so on.
Figure 2
BioNLP challenges in chronological order. Challenges are shown in bold white font, whereas their specific task focus is shown in italic black font following the task/track short-hands in Table 1.
Figure 3
Challenges’ subtasks/tracks organized based on NLP perspectives. A colour version of this figure is available at BIB online: http://bib.oxfordjournals.org.
Figure 4
Different biological and clinical problems targeted by BioNLP challenges. Challenge subtasks are coded in the same colours as in Figure 2 (e.g. BioNLP-ST tasks are marked in green). A colour version of this figure is available at BIB online: http://bib.oxfordjournals.org.
Figure 5
The typical workflow of organizing a shared task.
Figure 6
Impact/contributions from BioNLP challenges.
