FLUTE: Fast and reliable knowledge retrieval from biomedical literature

Emilee Holtzapple et al. Database (Oxford). 2020;2020:baaa056. doi: 10.1093/database/baaa056.

Abstract

State-of-the-art machine reading methods extract, in hours, hundreds of thousands of events from the biomedical literature. However, many of the extracted biomolecular interactions are incorrect or not relevant for computational modeling of a system of interest. Therefore, rapid, automated methods are required to filter and select accurate and useful information. The FiLter for Understanding True Events (FLUTE) tool uses public protein interaction databases to filter interactions extracted by machine readers from literature databases such as PubMed and to score them for accuracy. Confidence in the interactions allows for rapid and accurate model assembly. As our results show, FLUTE can reliably determine the confidence in the biomolecular interactions extracted by fast machine readers while providing a speedup in interaction filtering of three orders of magnitude. Database URL: https://bitbucket.org/biodesignlab/flute.


Figures

Figure 1
An outline of the role of FLUTE in the automated information extraction, model assembly and analysis flow: FLUTE uses available databases to filter the interactions produced by knowledge extraction from the literature, a process usually initiated by user queries. The interactions selected by FLUTE are inputs to model assembly, which creates models that are then explored with model analysis in order to answer user questions.
Figure 2
From sentences to scored interactions: (a) example sentence and the corresponding machine reading output; (b) graphical representation of interactions; (c) interactions found between ERK1, ERK2 and RAGE by STRING.
Figure 3
Filtration process with FLUTE: inputs to FLUTE include extracted interactions, scores of these interactions that are found in databases, and the user’s selection of thresholds for the scores. Outputs from FLUTE include selected interactions determined by their scores and thresholds.
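The filtration step in the caption above can be sketched in a few lines of code. This is an illustrative sketch only, not FLUTE's actual API: the function and field names (`filter_interactions`, `scores`, `tscore`) are assumptions, and the rule shown (keep an interaction only if every user-thresholded score meets its cutoff) is one plausible reading of the described inputs and outputs.

```python
# Illustrative sketch of score-threshold filtering (not FLUTE's real code).
# Inputs: extracted interactions with database scores, and user-chosen
# thresholds per score type. Output: the interactions whose scores pass.

def filter_interactions(interactions, thresholds):
    """Keep an interaction only if, for every thresholded score type,
    its score meets or exceeds the user's cutoff."""
    selected = []
    for inter in interactions:
        scores = inter.get("scores", {})
        if all(scores.get(name, 0) >= cutoff
               for name, cutoff in thresholds.items()):
            selected.append(inter)
    return selected

# Example interactions (entities taken from Figure 2; scores are made up).
interactions = [
    {"source": "ERK1", "target": "RAGE", "scores": {"tscore": 620}},
    {"source": "ERK2", "target": "RAGE", "scores": {"tscore": 310}},
]
selected = filter_interactions(interactions, {"tscore": 400})
```

With a `tscore` threshold of 400, only the first interaction survives.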
Figure 4
Databases and the connections between databases used by FLUTE.
Figure 5
Influence of query category and term choice (the legend corresponds to query numbers in Table 3) on the number of papers found in PubMed and on the number of interactions extracted from the top 200 papers (except Q12, Q13b and Q13c, where PubMed returned fewer than 200 hits). Results obtained for the same query topic category, but with different term aliases or different example terms, are grouped together with the same marker shape and similar color.
Figure 6
The influence of interaction type and machine reading errors on the number of selected interactions. (a) Overall distribution of interaction types for the three different queries: the disease and biological process query, the biological process and protein query and the multiple protein query. (b) The comparison between FLUTE and manual selection; a human judge decides whether an interaction is correct given literature evidence, and FLUTE selects the interactions that are supported by databases. (c) The distribution of error types in machine extraction of PPIs, PBPIs and PCIs for the three different queries.
Figure 7
The networks of selected PPI interactions for the Disease and Process set (top row), Process and Protein set (middle row), and Multiple Protein set (bottom row), where each edge color represents a value of escore (left), dscore (middle) or tscore (right). Each PPI edge is colored by the indicated score type, from the minimum (0) to the maximum score (1000). A red edge indicates a non-PPI.
Figure 8
The number of selected interactions, PPIs (top) and PCIs (bottom) as a function of a score threshold for each score type, for the three different queries.
Figure 9
Precision and recall of FLUTE, compared to human judging, and the sensitivity of precision and recall to the scores, for the three different queries: (a) precision and recall when filtering PPIs with only one subscore at a time, (b) average precision and recall when filtering PPIs for all possible subscore combinations and (c) precision and recall when filtering PBPI and PCIs.
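The precision and recall in the caption above are measured against human judging. A minimal sketch of that standard computation (the data below is made up for illustration; entity pairs are borrowed from other figures):

```python
# Precision/recall of a filter against a human-judged gold standard.
# Precision: fraction of FLUTE-selected interactions the judge accepted.
# Recall: fraction of judge-accepted interactions FLUTE selected.

def precision_recall(selected, accepted):
    selected, accepted = set(selected), set(accepted)
    tp = len(selected & accepted)  # true positives
    precision = tp / len(selected) if selected else 0.0
    recall = tp / len(accepted) if accepted else 0.0
    return precision, recall

# Hypothetical example: FLUTE keeps three interactions, the judge
# accepts three, and two of them agree.
flute = {("ERK1", "RAGE"), ("ERK2", "RAGE"), ("AKT1", "FOXO1")}
judge = {("ERK1", "RAGE"), ("AKT1", "FOXO1"), ("PTEN", "AKT1")}
p, r = precision_recall(flute, judge)
```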
Figure 10
(a) The golden set of interactions from (27). The gray box in the right-hand corner shows, in red, the number of golden interactions present in each set. Interactions obtained using the Fully Automated (FA) approach: (b) unfiltered, (c) filtered without thresholds, (d) tscore > 400, (e) tscore > 550, (f) tscore > 700. Interactions obtained using the Semi Automated (SA) approach: (g) unfiltered, (h) filtered without thresholds, (i) tscore > 400, (j) tscore > 550, (k) tscore > 700. (l) Comparison of the two approaches: % of selected interactions in each approach as a function of a tscore threshold.
Figure 11
Results of STRING search for the T-cell case study: (a) Interactions between the STRING search terms (PTEN, AKT1 and FOXO1) illustrated as a graph. (b) Expanded network with 50 additional nodes, threshold of 0.65. (c) Expanded network with 50 additional nodes, threshold of 0.95. (NOTE: Unlike the STRING database, the STRING web application uses score values within the [0,1] interval, where 0 is low confidence and 1 is high confidence. Interactions from the list of golden interactions illustrated in Figure 10a are in red.)
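The two score scales mentioned in the note above (the STRING database's integer scores on [0, 1000], as in Figure 7, versus the web application's confidences on [0, 1]) differ only by a factor of 1000. A trivial conversion sketch (function names are illustrative, not part of any STRING API):

```python
# Convert between STRING's database score scale ([0, 1000], integers)
# and the web application's confidence scale ([0, 1]).

def db_to_web(score):
    """Database score (0-1000) -> web confidence (0-1)."""
    return score / 1000.0

def web_to_db(confidence):
    """Web confidence (0-1) -> database score (0-1000), rounded."""
    return round(confidence * 1000)
```

So a web-interface threshold of 0.65, as in panel (b), corresponds to a database score threshold of 650.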

References

    1. Bjorne J., Ginter F., Pyysalo S. et al. (2010) Complex event extraction at PubMed scale. Bioinformatics, 26, i382–i390. doi: 10.1093/bioinformatics/btq180.
    2. Van Landeghem S., Bjorne J., Wei C.H. et al. (2013) Large-scale event extraction from literature with multi-level gene normalization. PLoS One, 8, e55814. doi: 10.1371/journal.pone.0055814.
    3. Gyori B.M., Bachman J.A., Subramanian K. et al. (2017) From word models to executable models of signaling networks using automated assembly. Mol. Syst. Biol., 13, 954. doi: 10.15252/msb.20177651.
    4. Allen J., de Beaumont W., Galescu L. and Teng C.M. (2015) Complex event extraction using DRUM. In: Proceedings of BioNLP 15, Beijing, China, Association for Computational Linguistics, pp. 1–11. doi: 10.18653/v1/W15-3801.
    5. Novichkova S., Egorov S. and Daraselia N. (2003) MedScan, a natural language processing engine for MEDLINE abstracts. Bioinformatics, 19, 1699–1706. doi: 10.1093/bioinformatics/btg207.
