Beyond genes, proteins, and abstracts: Identifying scientific claims from full-text biomedical articles
- PMID: 19900574
- DOI: 10.1016/j.jbi.2009.11.001
Abstract
Massive increases in electronically available text have spurred a variety of natural language processing methods to automatically identify relationships from text; however, existing annotated collections comprise only bioinformatics (gene-protein) or clinical informatics (treatment-disease) relationships. This paper introduces the Claim Framework, which reflects how authors across the biomedical spectrum communicate findings in empirical studies. The Framework captures different levels of evidence by differentiating between explicit and implicit claims, and by capturing under-specified claims such as correlations, comparisons, and observations. The results from 29 full-text articles show that authors report fewer than 7.84% of scientific claims in an abstract, revealing the urgent need for text mining systems to consider the full text of an article rather than just the abstract. The results also show that authors typically report explicit claims (77.12%) rather than observations (9.23%), correlations (5.39%), comparisons (5.11%), or implicit claims (2.7%). Informed by the initial manual annotations, we introduce an automated approach that uses syntax and semantics to identify explicit claims automatically, and we measure the degree to which each feature contributes to the overall precision and recall. Results show that a combination of semantics and syntax is required to achieve the best system performance.
Copyright 2009 Elsevier Inc. All rights reserved.
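The abstract evaluates the automated approach in terms of precision and recall. As a minimal sketch of how those two measures are typically computed for a claim extractor (this is not the authors' implementation; the function name `precision_recall`, the tuple representation of a claim, and the example claims are all hypothetical and used only for illustration):

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) of predicted claims against gold annotations.

    Both arguments are iterables of hashable claim representations.
    Empty inputs yield 0.0 for the affected measure instead of dividing by zero.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)  # claims found by both system and annotators
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical claims as (agent, relation, object) tuples; not taken from the paper.
gold = {
    ("tamoxifen", "reduces", "tumor growth"),
    ("smoking", "increases", "cancer risk"),
}
predicted = {
    ("tamoxifen", "reduces", "tumor growth"),
    ("aspirin", "reduces", "fever"),
}

p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

Measuring each feature's contribution, as the abstract describes, would amount to re-running such an evaluation with individual syntactic or semantic features enabled or disabled and comparing the resulting scores.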