Text mining for the biocuration workflow

Lynette Hirschman¹, Gully A P C Burns, Martin Krallinger, Cecilia Arighi, K Bretonnel Cohen, Alfonso Valencia, Cathy H Wu, Andrew Chatr-Aryamontri, Karen G Dowell, Eva Huala, Anália Lourenço, Robert Nash, Anne-Lise Veuthey, Thomas Wiegers, Andrew G Winter

PMID: 22513129
PMCID: PMC3328793
DOI: 10.1093/database/bas020

Lynette Hirschman et al. Database (Oxford). 2012 Apr 18;2012:bas020. doi: 10.1093/database/bas020. Print 2012.

Affiliation

¹ The MITRE Corporation, 202 Burlington Road, Bedford, MA 01730, USA. lynette@mitre.org

Abstract

Molecular biology has become heavily dependent on biological knowledge encoded in expert-curated biological databases. As the volume of biological literature increases, biocurators need help keeping up with it; (semi-)automated aids for biocuration would seem to be an ideal application for natural language processing and text mining. However, to date, there have been few documented successes in improving biocuration throughput using text mining. Our initial investigations took place for the workshop on 'Text Mining for the BioCuration Workflow' at the third International Biocuration Conference (Berlin, 2009). We interviewed biocurators to obtain workflows from eight biological databases. This initial study revealed high-level commonalities, including (i) selection of documents for curation; (ii) indexing of documents with biologically relevant entities (e.g. genes); and (iii) detailed curation of specific relations (e.g. interactions); however, the detailed workflows also varied considerably. Following the workshop, we conducted a survey of biocurators. The survey identified biocurator priorities, including the handling of full text indexed with biological entities and support for the identification and prioritization of documents for curation. It also indicated that two-thirds of the biocuration teams had experimented with text mining and almost half were using text mining at that time. Analysis of our interviews and survey provides a set of requirements for the integration of text mining into the biocuration workflow. These requirements can guide the identification of common needs across curated databases and encourage joint experimentation involving biocurators, text mining developers and the larger biomedical research community.

Figures

**Figure 1.**
Text mining and the biocuration workflow: main tasks of a canonical annotation workflow, including (A) triage, (B) bio-entity identification and normalization, (C) annotation event detection, (D) evidential qualifier association and (E) database record completion.
