Text-mining and information-retrieval services for molecular biology

Martin Krallinger¹, Alfonso Valencia

PMID: 15998455
PMCID: PMC1175978
DOI: 10.1186/gb-2005-6-7-224

Genome Biol. 2005;6(7):224. Epub 2005 Jun 28.

Affiliation

¹ National Center of Biotechnology, CNB-CSIC, Cantoblanco, E-28049 Madrid, Spain. martink@cnb.uam.es

Abstract

Text-mining in molecular biology -- defined as the automatic extraction of information about genes, proteins and their functional relationships from text documents -- has emerged as a hybrid discipline on the edges of the fields of information science, bioinformatics and computational linguistics. A range of text-mining applications have been developed recently that will improve access to knowledge for biologists and database annotators.

Figures

**Figure 1**
An overview of biological natural language processing (BioNLP) and text-mining applications for biology. The major topics are represented by the inner circle of seven approaches, and the corresponding applications are given in the outer layers of boxes. Most of the tools are available online or for download. Some applications could be classified into multiple topics; they are shown here associated with one of their most significant topics. For instance, most of the text-mining applications (that is, the applications that are not simply for article retrieval) have integrated modules for named entity recognition (NER), and selective dissemination of information (SDI) services often use automated Boolean queries for article retrieval. References and URLs for each application, where available, are given in Table 1.

**Figure 2**
Basic steps in the use of the iHOP text-mining tool [40], illustrated with screenshots [42]. For a given query (for example, the protein symbols **(a)** Wnt-1 or **(b)** LEF-1), all the sentences mentioning the name are retrieved from PubMed. These sentences also contain mentions of other proteins, which are highlighted and which might show associations with the query protein (see the magnified area in (b)). Functional terms (such as 'target' and 'complexes') and interaction verbs (such as 'activated' and 'stabilizes') are in bold. **(c)** By clicking on the 'Gene model' link in the left panel in (a,b), interaction networks of proteins that co-occur in sentences with the query proteins can be displayed.
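
The sentence-level co-occurrence idea described in this caption can be illustrated with a minimal Python sketch. This is not iHOP's implementation: the lexicon `PROTEIN_NAMES`, the example `SENTENCES` (written for this sketch, not quoted from PubMed), and the helper names `find_mentions`, `sentences_for` and `cooccurrence_edges` are all hypothetical. It assumes simple dictionary matching, whereas a real system would rely on a curated synonym dictionary and proper named entity recognition.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical toy lexicon; a real tool would use a curated gene/protein dictionary.
PROTEIN_NAMES = {"Wnt-1", "LEF-1", "beta-catenin"}

# Illustrative sentences written for this sketch, not retrieved from PubMed.
SENTENCES = [
    "Wnt-1 signalling stabilizes beta-catenin.",
    "beta-catenin forms complexes with LEF-1.",
    "LEF-1 is expressed in the thymus.",
]


def find_mentions(sentence, lexicon=PROTEIN_NAMES):
    """Return the lexicon names that occur in the sentence (case-insensitive)."""
    found = set()
    for name in lexicon:
        # Reject matches embedded in a longer word or hyphenated token.
        pattern = r"(?<![\w-])" + re.escape(name) + r"(?![\w-])"
        if re.search(pattern, sentence, flags=re.IGNORECASE):
            found.add(name)
    return found


def sentences_for(query, sentences, lexicon=PROTEIN_NAMES):
    """Keep sentences mentioning the query, paired with every name they contain."""
    hits = []
    for sentence in sentences:
        mentions = find_mentions(sentence, lexicon)
        if query in mentions:
            hits.append((sentence, mentions))
    return hits


def cooccurrence_edges(sentences, lexicon=PROTEIN_NAMES):
    """Count the sentences in which each unordered pair of names co-occurs."""
    edges = Counter()
    for sentence in sentences:
        names = sorted(find_mentions(sentence, lexicon))
        for pair in combinations(names, 2):
            edges[pair] += 1
    return edges


if __name__ == "__main__":
    for sentence, mentions in sentences_for("LEF-1", SENTENCES):
        print(sorted(mentions), "|", sentence)
    print(dict(cooccurrence_edges(SENTENCES)))
```

Each key in the returned counter is one candidate edge of the co-occurrence network, and its count is the number of supporting sentences. Co-occurrence alone only suggests an association; it does not establish an interaction.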

References

1. Altavista http://www.altavista.com
2. Google http://www.google.com
3. Schuler G, Epstein J, Ohkawa H, Kans J. Entrez: molecular biology database and retrieval system. Methods Enzymol. 1996;266:141–162.
4. Entrez PubMed http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed
5. Wheeler D, Church D, Federhen S, Lash A, Madden T, Pontius J, Schuler G, Schriml L, Sequeira E, Tatusova T, Wagner L. Database resources of the National Center for Biotechnology. Nucleic Acids Res. 2003;31:28–33. doi: 10.1093/nar/gkg033.
