Review
Brief Funct Genomics. 2015 May;14(3):213-30.
doi: 10.1093/bfgp/elu015. Epub 2014 Jun 6.

Event-based text mining for biology and functional genomics

Sophia Ananiadou et al. Brief Funct Genomics. 2015 May.

Abstract

The assessment of genome function requires a mapping between genome-derived entities and biochemical reactions, and the biomedical literature represents a rich source of information about reactions between biological components. However, the increasingly rapid growth in the volume of literature provides both a challenge and an opportunity for researchers to isolate information about reactions of interest in a timely and efficient manner. In response, recent text mining research in the biology domain has been largely focused on the identification and extraction of 'events', i.e. categorised, structured representations of relationships between biochemical entities, from the literature. Functional genomics analyses necessarily encompass events as so defined. Automatic event extraction systems facilitate the development of sophisticated semantic search applications, allowing researchers to formulate structured queries over extracted events, so as to specify the exact types of reactions to be retrieved. This article provides an overview of recent research into event extraction. We cover annotated corpora on which systems are trained, systems that achieve state-of-the-art performance and details of the community shared tasks that have been instrumental in increasing the quality, coverage and scalability of recent systems. Finally, several concrete applications of event extraction are covered, together with emerging directions of research.

Keywords: event extraction; semantic annotation; semantic search; text mining.
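
As an illustration of what a "categorised, structured representation" and a "structured query over extracted events" might look like in practice, here is a minimal Python sketch. It is not the representation used by any system covered in the article: the class names `Entity` and `Event`, the helper `find_events`, and the toy data (`ProteinA`, `ProteinB`, `ProteinC`, the `Binding` type) are all hypothetical. Only the notions of an event type, a trigger, and role-labelled participants such as Theme and Cause come from the text and figure captions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Entity:
    text: str         # surface form as it appears in the sentence
    entity_type: str  # semantic category, e.g. "Protein"


@dataclass
class Event:
    event_type: str   # event category, e.g. "Regulation"
    trigger: str      # word or phrase that signals the event
    # Maps a role name (e.g. "Theme", "Cause") to the participating entity.
    participants: Dict[str, Entity] = field(default_factory=dict)


def find_events(events: List[Event], event_type: str, **roles: str) -> List[Event]:
    """Return events of the given type whose named roles are filled
    by entities with exactly the given surface text."""
    return [
        e for e in events
        if e.event_type == event_type
        and all(
            role in e.participants and e.participants[role].text == text
            for role, text in roles.items()
        )
    ]


# Invented toy events (hypothetical, not drawn from any annotated corpus).
events = [
    Event("Regulation", "regulates",
          {"Cause": Entity("ProteinA", "Protein"),
           "Theme": Entity("ProteinB", "Protein")}),
    Event("Binding", "binds",
          {"Theme": Entity("ProteinA", "Protein")}),
    Event("Regulation", "controls",
          {"Cause": Entity("ProteinC", "Protein"),
           "Theme": Entity("ProteinB", "Protein")}),
]

# Structured query: Regulation events in which ProteinA is the Cause.
caused_by_a = find_events(events, "Regulation", Cause="ProteinA")

# Structured query: Regulation events in which ProteinB is the Theme.
regulating_b = find_events(events, "Regulation", Theme="ProteinB")
```

The point of the sketch is that a query constrains both the event type and the roles its participants play, which a plain keyword search for "ProteinA" cannot do: the keyword search would also return the `Binding` event.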

Figures

Figure 1: A ‘mind map’ summarising this Briefing. It should be read clockwise starting at 1 o’clock.
Figure 2: Simple bio-event example.
Figure 3: Sentence containing two events.
Figure 4: More complex sentence containing multiple events.
Figure 5: Annotated meta-knowledge example. The core elements of the event (i.e. the trigger for the Regulation event, and its Theme and Cause participants) have been enriched through the identification of cues that are relevant to various dimensions of interpretation of the event, according to the meta-knowledge model.
Figure 6: iHop search interface, showing results retrieved by a search for SNF1. Additional entities, MeSH terms, interactions and words are highlighted. (A colour version of this figure is available online at: http://bfg.oxfordjournals.org)
Figure 7: MEDIE search results. Relevant sentences from retrieved abstracts are shown, with separate colours for the subject, object and verb. (A colour version of this figure is available online at: http://bfg.oxfordjournals.org)
Figure 8: Interface to EVEX database, showing results after searching for the gene ATR.
Figure 9: EvidenceFinder interface for anatomical entities.
Figure 10: PathText 2 interface.
