Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2018 Jun 25;18(1):46.
doi: 10.1186/s12911-018-0639-1.

Identification of research hypotheses and new knowledge from scientific literature

Affiliations

Identification of research hypotheses and new knowledge from scientific literature

Matthew Shardlow et al. BMC Med Inform Decis Mak. .

Abstract

Background: Text mining (TM) methods have been used extensively to extract relations and events from the literature. In addition, TM techniques have been used to extract various types or dimensions of interpretative information, known as Meta-Knowledge (MK), from the context of relations and events, e.g. negation, speculation, certainty and knowledge type. However, most existing methods have focussed on the extraction of individual dimensions of MK, without investigating how they can be combined to obtain even richer contextual information. In this paper, we describe a novel, supervised method to extract new MK dimensions that encode Research Hypotheses (an author's intended knowledge gain) and New Knowledge (an author's findings). The method incorporates various features, including a combination of simple MK dimensions.

Methods: We identify previously explored dimensions and then use a random forest to combine these with linguistic features into a classification model. To facilitate evaluation of the model, we have enriched two existing corpora annotated with relations and events, i.e., a subset of the GENIA-MK corpus and the EU-ADR corpus, by adding attributes to encode whether each relation or event corresponds to Research Hypothesis or New Knowledge. In the GENIA-MK corpus, these new attributes complement simpler MK dimensions that had previously been annotated.

Results: We show that our approach is able to assign different types of MK dimensions to relations and events with a high degree of accuracy. Firstly, our method is able to improve upon the previously reported state of the art performance for an existing dimension, i.e., Knowledge Type. Secondly, we also demonstrate high F1-score in predicting the new dimensions of Research Hypothesis (GENIA: 0.914, EU-ADR 0.802) and New Knowledge (GENIA: 0.829, EU-ADR 0.836).

Conclusion: We have presented a novel approach for predicting New Knowledge and Research Hypothesis, which combines simple MK dimensions to achieve high F1-scores. The extraction of such information is valuable for a number of practical TM applications.

Keywords: Events; Hypothesis; Meta-knowledge; New knowledge; Text mining.

PubMed Disclaimer

Conflict of interest statement

Ethics approval and consent to participate

No ethics approval was required for any element of this study.

Consent for publication

Not Applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1
Fig. 1
An example of two sentences, one containing events and the other containing one relation. The first sentence shows two events. The first event in the sentence concerns the term ‘activation’ which is a type of positive regulation. The theme of this event is ‘NF-kappaB’, indicating that this protein is being activated. The next event in the sentence is centered around ‘dependent’ which is a type of positive regulation. This event has the cause ‘oxidative stress’ and its theme is the first event in the sentence. The example of a relation between two entities is, in contrast to the event, clearly much more simple. The relation indicates that NPTN is related to Schizophrenia in a relation that can be categorised as ‘Target-Disorder’
Fig. 2
Fig. 2
The GENIA-MK annotation scheme. There are five Meta-Knowledge dimensions introduced by Thompson et al. as well as two further hyperdimensions

References

    1. Jiawen L, Dongsheng L, Zhijian T. The expression of interleukin-17, interferon-gamma, and macrophage inflammatory protein-3 alpha mRNA in patients with psoriasis vulgaris. J Huazhong University Sci Technol [Med Sci] 2004;24(3):294–6. doi: 10.1007/BF02832018. - DOI - PubMed
    1. Scharffetter-Kochanek K, Singh K, Tasdogan A, Wlaschek M, Gatzka M, Hainzl A, Peters T. Reduction of CD18 promotes expansion of inflammatory gd T cells collaborating with CD4 T cells in chronic murine psoriasiform dermatitis. J Immunol. 2013;191:5477–88. doi: 10.4049/jimmunol.1300976. - DOI - PubMed
    1. Zerva C, Batista-Navarro R, Day P, Ananiadou S. Using uncertainty to link and rank evidence from biomedical literature for model curation. Bioinformatics. btx466. 10.1093/bioinformatics/btx466. - PMC - PubMed
    1. Stenetorp P, Pyysalo S, Topić G, Ohta T, Ananiadou S, Tsujii J. BRAT: a web-based tool for NLP-assisted text annotation. In: Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics. Association for Computational Linguistics: 2012. p. 102–107.
    1. Agarwal S, Yu H, Kohane I. BioNØT: A searchable database of biomedical negated sentences. BMC Bioinformatics. 2011;12(1):420. doi: 10.1186/1471-2105-12-420. - DOI - PMC - PubMed

Publication types

LinkOut - more resources