Bioinformatics. 2017 Dec 1;33(23):3784-3792.
doi: 10.1093/bioinformatics/btx466.

Using uncertainty to link and rank evidence from biomedical literature for model curation

Chrysoula Zerva et al. Bioinformatics.

Abstract

Motivation: In recent years, there has been great progress in the automated curation of biomedical networks and models, aided by text mining methods that provide evidence from the literature. Such methods must not only extract snippets of text that relate to model interactions, but also contextualize the evidence and provide additional confidence scores for the interaction in question. Although various approaches to calculating confidence scores have focused primarily on the quality of the extracted information, there has been little work exploring the textual uncertainty conveyed by the author. Although textual uncertainty is acknowledged in biomedical text mining as an attribute of text-mined interactions (events), it is significantly understudied as a means of providing a confidence measure for interactions in pathways or other biomedical models. In this work, we focus on improving the identification of textual uncertainty for events and explore how it can be used as an additional measure of confidence for biomedical models.

Results: We present a novel method for extracting uncertainty from the literature using a hybrid approach that combines rule induction and machine learning. Variations of this hybrid approach are then discussed, alongside their advantages and disadvantages. We use subjective logic theory to combine multiple uncertainty values extracted from different sources for the same interaction. Our approach achieves F-scores of 0.76 and 0.88 based on the BioNLP-ST and Genia-MK corpora, respectively, making considerable improvements over previously published work. Moreover, we evaluate our proposed system on pathways related to two different areas, namely leukemia and melanoma cancer research.
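To illustrate what combining uncertainty values from several sources can look like, the following is a minimal sketch of cumulative fusion of binomial opinions from subjective logic. The abstract does not state which fusion operator is used, so the choice of cumulative fusion is an assumption, as are the names `Opinion`, `cumulative_fuse` and `fuse_all`, the shared base rate of 0.5, and the example values. It is not the authors' implementation.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Opinion:
    """Binomial opinion: belief + disbelief + uncertainty must equal 1."""
    belief: float
    disbelief: float
    uncertainty: float
    base_rate: float = 0.5  # assumed prior, shared by all sources

    def expected(self) -> float:
        # Projected probability that the interaction holds.
        return self.belief + self.base_rate * self.uncertainty


def cumulative_fuse(x: Opinion, y: Opinion) -> Opinion:
    """Cumulative fusion of two opinions about the same interaction."""
    k = x.uncertainty + y.uncertainty - x.uncertainty * y.uncertainty
    if k == 0:
        # Both opinions are dogmatic (zero uncertainty): take the
        # equal-weight average, the limit case of the operator.
        return Opinion((x.belief + y.belief) / 2,
                       (x.disbelief + y.disbelief) / 2,
                       0.0, x.base_rate)
    belief = (x.belief * y.uncertainty + y.belief * x.uncertainty) / k
    disbelief = (x.disbelief * y.uncertainty + y.disbelief * x.uncertainty) / k
    uncertainty = (x.uncertainty * y.uncertainty) / k
    return Opinion(belief, disbelief, uncertainty, x.base_rate)


def fuse_all(opinions: list[Opinion]) -> Opinion:
    """Fold cumulative fusion over every evidence sentence for one interaction."""
    return reduce(cumulative_fuse, opinions)


# Hypothetical opinions derived from two sentences mentioning one interaction.
evidence = [Opinion(0.6, 0.2, 0.2), Opinion(0.6, 0.2, 0.2)]
fused = fuse_all(evidence)
print(fused, fused.expected())
```

Under this operator, two agreeing sources yield a fused opinion with lower uncertainty than either source alone, which is the behaviour one would want when several sentences support the same interaction.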

Availability and implementation: The leukemia pathway model used is available in Pathway Studio, while the Ras model is available via PathwayCommons. An online demonstration of the uncertainty extraction system is available for research purposes at http://argo.nactem.ac.uk/test. The related code is available at https://github.com/c-zrv/uncertainty_components.git. Details on the above are available in the Supplementary Material.

Contact: sophia.ananiadou@manchester.ac.uk.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1. Event structures according to the BioNLP schema. Event triggers are enclosed in double-lined (green) boxes, while named entities (NEs) are enclosed in single-lined (blue) ones. Arguments of events are represented by arrows above the words. We can observe that the Regulation event is a complex event, having the Binding event as its Theme argument.

Fig. 2. Uncertainty cues considered in the experiments, grouped according to category (Strong/Weak speculation, Frequency, Admission of lack of knowledge, Weaseling). Word clouds were generated based on BioNLP-ST and GENIA-MK.

Fig. 3. Relation between the influence of uncertainty cues and syntactic dependencies. Dependencies are marked with arrows above the text, while the scope of the uncertainty cue "may" is marked with red square brackets. (Color version of this figure is available at Bioinformatics online.)

Fig. 4. Distribution of scores for (un)certainty between annot. 1 (solid colored (blue) bars) and annot. 2 (vertically striped white bars). (Color version of this figure is available at Bioinformatics online.)

Fig. 5. Performance in terms of precision, recall and F-score, depending on the selection of the mean average score as the upper limit of uncertainty (i.e. the value below which all scored events must be considered uncertain).
