Bioinformatics. 2013 Jul 1;29(13):i44-52.
doi: 10.1093/bioinformatics/btt227.

A method for integrating and ranking the evidence for biochemical pathways by mining reactions from text

Makoto Miwa et al. Bioinformatics. 2013.

Abstract

Motivation: To create, verify and maintain pathway models, curators must discover and assess knowledge distributed over the vast body of biological literature. Methods supporting these tasks must understand both the pathway model representations and the natural language in the literature. These methods should identify and order documents by relevance to any given pathway reaction. No existing system has addressed all aspects of this challenge.

Method: We present novel methods for associating pathway model reactions with relevant publications. Our approach extracts the reactions directly from the models and then turns them into queries for three text-mining-based MEDLINE literature search systems. These queries are executed, and the resulting documents are combined and ranked according to their relevance to the reactions of interest. We manually annotate document-reaction pairs with the relevance of the document to the reaction and use this annotation to study several ranking methods, using various heuristic and machine-learning approaches.
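To make the query and combination steps concrete, here is a minimal Python sketch. It assumes a hypothetical `Reaction` record holding the participant names extracted from a pathway model, a hypothetical pairwise query scheme (one query per pair of participants), and a simple agreement-count merge in which documents returned by more search systems rank higher. None of these choices is taken from the paper; the abstract does not say how queries are built or how the three systems' results are combined.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Reaction:
    # Hypothetical minimal reaction record: participant names taken from a pathway model.
    reaction_id: str
    reactants: list[str]
    products: list[str]
    modifiers: list[str] = field(default_factory=list)


def reaction_to_queries(reaction: Reaction) -> list[tuple[str, str]]:
    """Turn a reaction into pairwise entity queries (hypothetical scheme).

    Every unordered pair of distinct participants becomes one query,
    returned in sorted order so the output is deterministic.
    """
    participants = sorted(set(reaction.reactants + reaction.products + reaction.modifiers))
    return list(combinations(participants, 2))


def merge_results(results_by_system: dict[str, list[str]]) -> list[tuple[str, int]]:
    """Combine document IDs returned by several search systems.

    Each document gets one vote per system that returned it; documents are
    ordered by vote count (most agreement first), with ties broken by ID.
    """
    votes: dict[str, int] = defaultdict(int)
    for doc_ids in results_by_system.values():
        for doc_id in set(doc_ids):  # count each system at most once per document
            votes[doc_id] += 1
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
```

For example, a reaction with reactant `ERK`, product `ERK-P`, and modifier `MEK` yields three pairwise queries, and a document returned by all three systems is placed ahead of one returned by only two. The agreement count is only one possible heuristic; a learned ranker can replace it once relevance annotations are available.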

Results: Our evaluation shows that the annotated document-reaction pairs can be used to create a rule-based document ranking system, and that machine learning can be used to rank documents by their relevance to pathway reactions. We find that a Support Vector Machine (SVM)-based system outperforms several baselines and matches the performance of the rule-based system. Having proved successful, the query extraction and ranking methods are used to update our existing pathway search system, PathText.
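As a hedged illustration of ranking documents by relevance with an SVM trained on annotated document-reaction pairs, the sketch below trains a linear SVM using the Pegasos stochastic subgradient method and orders documents by their decision value. The Pegasos training procedure, the two per-document features, and the toy labels are all assumptions made for illustration; the abstract does not state the SVM formulation, kernel, or feature set used in PathText 2.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Train a linear SVM with Pegasos-style stochastic subgradient descent.

    Minimises lam/2 * ||w||^2 + mean hinge loss. X is a list of feature
    vectors and y a list of labels in {+1, -1} (relevant / not relevant).
    A constant 1.0 feature is appended so w also holds a bias term
    (as a simplification, the bias is regularised like the other weights).
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            # Gradient step on the regulariser: shrink all weights.
            w = [(1.0 - eta * lam) * wi for wi in w]
            # Hinge-loss subgradient step only when the margin is violated.
            if margin < 1.0:
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def decision_value(w, features):
    """Signed SVM score; larger means more likely relevant."""
    return sum(wi * xi for wi, xi in zip(w, list(features) + [1.0]))


def rank_documents(w, docs):
    """Order document IDs by decision value, highest first (ties broken by ID)."""
    return sorted(docs, key=lambda d: (-decision_value(w, docs[d]), d))


# Toy usage with hypothetical features per document:
# [fraction of the three search systems that returned it,
#  fraction of the reaction's participants it mentions]
docs = {
    "d1": [1.0, 1.0], "d2": [0.67, 0.9], "d3": [1.0, 0.8],  # annotated relevant
    "d4": [0.0, 0.0], "d5": [0.33, 0.1], "d6": [0.0, 0.2],  # annotated not relevant
}
labels = {"d1": 1, "d2": 1, "d3": 1, "d4": -1, "d5": -1, "d6": -1}
ids = sorted(docs)
w = train_linear_svm([docs[d] for d in ids], [labels[d] for d in ids])
ranking = rank_documents(w, docs)
```

Ranking by the raw decision value rather than the predicted class keeps a full ordering of the candidate documents, which is what a curator browsing results for one reaction needs. In practice a library solver would normally replace the hand-written training loop.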

Availability: An online demonstration of PathText 2 and the annotated corpus are available for research purposes at http://www.nactem.ac.uk/pathtext2/.

Supplementary information: Supplementary data are available at Bioinformatics online.


Figures

Fig. 1. Illustration of PathText 2 architecture
Fig. 2. Illustration of event representation
Fig. 3. Screenshot of PathText 2 web interface
Fig. 4. Learning curve on SVM-based ranking
