Complex event extraction at PubMed scale
- PMID: 20529932
- PMCID: PMC2881365
- DOI: 10.1093/bioinformatics/btq180
Abstract
Motivation: There has recently been a notable shift in biomedical information extraction (IE) from relation models toward the more expressive event model, facilitated by the maturation of basic tools for biomedical text analysis and the availability of manually annotated resources. The event model allows detailed representation of complex natural language statements and can support a number of advanced text mining applications ranging from semantic search to pathway extraction. A recent collaborative evaluation demonstrated the potential of event extraction systems, yet so far no study has examined either the generalization ability of these systems or the feasibility of large-scale extraction.
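To make the contrast between the relation model and the event model concrete, the following is a minimal sketch, not the representation used by the system described here. The class names, the helper functions, the placeholder proteins ProtA and ProtB, and the example statement "ProtA inhibits phosphorylation of ProtB" are all assumptions introduced for illustration. The point it shows is that an event has a type, a trigger word, and role-labelled arguments that may themselves be events, whereas a relation is a flat pair of entities.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Entity:
    """A named entity mention, e.g. a gene or protein (hypothetical type)."""
    name: str


@dataclass(frozen=True)
class Relation:
    """Relation model: a flat, untyped pair of entities."""
    arg1: Entity
    arg2: Entity


@dataclass(frozen=True)
class Event:
    """Event model: typed, anchored to a trigger word, with role-labelled
    arguments that can be entities or other events (nesting)."""
    type: str
    trigger: str
    args: tuple[tuple[str, Union[Entity, "Event"]], ...]


def depth(node: Union[Entity, Event]) -> int:
    """Nesting depth: 0 for an entity, 1 for an event over entities, and so on."""
    if isinstance(node, Entity):
        return 0
    return 1 + max((depth(arg) for _, arg in node.args), default=0)


def entities(node: Union[Entity, Event]) -> list[str]:
    """Names of all entities reachable from a node, in argument order."""
    if isinstance(node, Entity):
        return [node.name]
    found: list[str] = []
    for _, arg in node.args:
        found.extend(entities(arg))
    return found


# Illustrative statement (an assumption): "ProtA inhibits phosphorylation of ProtB".
prot_a, prot_b = Entity("ProtA"), Entity("ProtB")

# Relation model: only records that the two proteins are related.
flat = Relation(prot_a, prot_b)

# Event model: the inner event is itself an argument of the outer event.
phosphorylation = Event("Phosphorylation", "phosphorylation", (("Theme", prot_b),))
regulation = Event(
    "Negative_regulation",
    "inhibits",
    (("Cause", prot_a), ("Theme", phosphorylation)),
)
```

In this sketch the relation keeps only the pair of proteins, while the nested event also records which protein acts, what kind of process is affected, and which words in the text express it.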
Results: This study considers event-based IE at PubMed scale. We introduce a system combining publicly available, state-of-the-art methods for domain parsing, named entity recognition and event extraction, and test the system on a representative 1% sample of all PubMed citations. We present the first evaluation of the generalization performance of event extraction systems to this scale and show that despite its computational complexity, event extraction from the entire PubMed is feasible. We further illustrate the value of the extraction approach through a number of analyses of the extracted information.
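The abstract does not say how the 1% sample was drawn. As one possible approach, and purely as an assumption, a reproducible sample of roughly that size can be selected by hashing each citation identifier and keeping the identifiers that fall into a fixed bucket. The function names and the use of SHA-256 below are illustrative choices, not the authors' procedure.

```python
import hashlib
from typing import Iterable, List


def in_sample(pmid: str, percent: int = 1) -> bool:
    """Return True if this identifier falls in the sample.

    Hypothetical scheme: hash the identifier, map it to one of 100 buckets,
    and keep the first `percent` buckets. The decision depends only on the
    identifier, so it is stable across runs and machines.
    """
    digest = hashlib.sha256(pmid.encode("ascii")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percent


def sample_pmids(pmids: Iterable[str], percent: int = 1) -> List[str]:
    """Filter a stream of identifiers down to the sampled subset."""
    return [pmid for pmid in pmids if in_sample(pmid, percent)]


# Usage with synthetic identifiers (not real citation data).
synthetic_ids = [str(i) for i in range(1, 100001)]
subset = sample_pmids(synthetic_ids, percent=1)
```

Because membership is decided per identifier, the same citations are selected on every run, and newly added citations can be sampled without redrawing the existing subset.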
Availability: The event detection system and the extracted data are released under an open-source license and are available at http://bionlp.utu.fi/.