Review. 2016 Dec 26:2016:baw150.
doi: 10.1093/database/baw150. Print 2016.

Can we replace curation with information extraction software?

Peter D Karp. Database (Oxford).

Abstract

Can we use programs for automated or semi-automated information extraction from scientific texts as practical alternatives to professional curation? I show that the error rates of current information extraction programs are too high to replace professional curation today. Furthermore, current information extraction programs extract single narrow slivers of information, such as individual protein interactions; they cannot extract the large breadth of information that professional curators extract for databases such as EcoCyc. Nor can they arbitrate among conflicting statements in the literature, as curators can. Therefore, funding agencies should not hobble the curation efforts of existing databases on the assumption that a problem that has stymied Artificial Intelligence researchers for more than 60 years will be solved tomorrow. Semi-automated extraction techniques appear to have significantly more potential, based on a review of recent tools that enhance curator productivity, but a full cost-benefit analysis for these tools is lacking. Without such an analysis, it is possible to expend significant effort developing information extraction tools that automate small parts of the overall curation workflow without achieving a significant decrease in curation costs.


