Database (Oxford). 2016 Dec 26:2016:baw161.
doi: 10.1093/database/baw161. Print 2016.

Pressing needs of biomedical text mining in biocuration and beyond: opportunities and challenges

Ayush Singhal et al. Database (Oxford).

Abstract

Text mining in the biomedical sciences is rapidly transitioning from small-scale evaluation to large-scale application. In this article, we argue that text-mining technologies have become essential tools in real-world biomedical research. We describe four large-scale applications of text mining, as showcased during a recent panel discussion at the BioCreative V Challenge Workshop. We draw on these applications as case studies to characterize common requirements for successfully applying text-mining techniques to practical biocuration needs. We note that system 'accuracy' remains a challenge and identify several additional common difficulties and potential research directions, including (i) the 'scalability' issue arising from the increasing need to mine information from millions of full-text articles, (ii) the 'interoperability' issue of integrating various text-mining systems into existing curation workflows and (iii) the 'reusability' issue of applying trained systems to text genres not seen during development. We then describe related efforts within the text-mining community, with a special focus on the BioCreative series of challenge workshops. We believe that focusing on the near-term challenges identified in this work will amplify the opportunities afforded by the continued adoption of text-mining tools. Finally, in order to sustain the curation ecosystem and have text-mining systems adopted for practical benefit, we call for increased collaboration between text-mining researchers and various stakeholders, including researchers, publishers and biocurators.

Figures

Figure 1.
Interconnection between literature services and biological databases.
