BMC Bioinformatics. 2006 Nov 24;7 Suppl 3(Suppl 3):S3.
doi: 10.1186/1471-2105-7-S3-S3.

An environment for relation mining over richly annotated corpora: the case of GENIA

Fabio Rinaldi et al. BMC Bioinformatics. 2006.

Abstract

Background: The biomedical domain is witnessing a rapid growth of the amount of published scientific results, which makes it increasingly difficult to filter the core information. There is a real need for support tools that 'digest' the published results and extract the most important information.

Results: We describe and evaluate an environment supporting the extraction of domain-specific relations, such as protein-protein interactions, from a richly annotated corpus. We use full, deep-linguistic parsing and manually created, versatile patterns that express a large set of syntactic alternations and draw on semantic ontology information.

Conclusion: The experiments show that the approach described is capable of delivering high-precision results while maintaining sufficient levels of recall. The rules used by the system are highly abstract and considerably more powerful and versatile than finite-state approaches, which allows speedy interactive development and validation.
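
As a rough illustration of the idea summarized in the Results paragraph, the sketch below matches one relation pattern against dependency triples so that an active and a passive phrasing yield the same relation, and then filters the arguments by semantic type. It is a toy example under stated assumptions, not the system described in the paper: the `Dep` class, the dependency labels (`subj`, `obj`, `pass_subj`, `by_agent`), the `SEMANTIC_TYPE` table, and the function `extract_relation` are all hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dep:
    """One dependency edge: head word, relation label, dependent word."""
    head: str
    label: str
    dependent: str


# Hypothetical stand-in for ontology annotations attached to terms.
SEMANTIC_TYPE = {
    "IL-2": "protein",
    "NF-kappaB": "protein",
    "cells": "cell_type",
}


def extract_relation(deps, verb, arg_type="protein"):
    """Return (agent, verb, target) triples for `verb`.

    Two syntactic alternations are covered (labels are assumptions):
      active:  subj -> agent,       obj -> target
      passive: pass_subj -> target, by_agent -> agent
    Both arguments must have the semantic type `arg_type`.
    """
    by_label = {}
    for d in deps:
        if d.head == verb:
            by_label.setdefault(d.label, []).append(d.dependent)

    triples = []
    for agent in by_label.get("subj", []):
        for target in by_label.get("obj", []):
            triples.append((agent, verb, target))
    for target in by_label.get("pass_subj", []):
        for agent in by_label.get("by_agent", []):
            triples.append((agent, verb, target))

    # Semantic filter: keep triples whose arguments have the required type.
    return [
        (a, v, t)
        for (a, v, t) in triples
        if SEMANTIC_TYPE.get(a) == arg_type and SEMANTIC_TYPE.get(t) == arg_type
    ]


# "IL-2 activates NF-kappaB" and "NF-kappaB is activated by IL-2"
active = [Dep("activate", "subj", "IL-2"), Dep("activate", "obj", "NF-kappaB")]
passive = [Dep("activate", "pass_subj", "NF-kappaB"), Dep("activate", "by_agent", "IL-2")]
print(extract_relation(active, "activate"))
print(extract_relation(passive, "activate"))
```

Working on dependency edges rather than on the token sequence is what lets a single rule cover several surface forms; a finite-state pattern over tokens would typically need a separate expression for each word order.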

Figures

Figure 1
Example of Dependency Tree. Tree of dependencies for a GENIA sentence, along with other linguistic annotations. Notice the additional deep-linguistic "control" subject dependency between token 7 and 4.
Figure 2
Sample Output. Sample output for the 'activate' relation.

