Genome Biol. 2005;6(7):224.
doi: 10.1186/gb-2005-6-7-224. Epub 2005 Jun 28.

Text-mining and information-retrieval services for molecular biology


Martin Krallinger et al. Genome Biol. 2005.

Abstract

Text-mining in molecular biology -- defined as the automatic extraction of information about genes, proteins and their functional relationships from text documents -- has emerged as a hybrid discipline on the edges of the fields of information science, bioinformatics and computational linguistics. A range of text-mining applications have been developed recently that will improve access to knowledge for biologists and database annotators.
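Extracting information about genes and proteins usually starts with recognizing their names in text. This is the named entity recognition (NER) step that Figure 1 lists among the core approaches. Below is a minimal sketch of dictionary-based NER. It assumes a hypothetical toy lexicon that maps surface names to canonical symbols, and it uses exact, case-sensitive matching. A practical tagger would also have to handle spelling variants and ambiguous names, which this sketch ignores.

```python
import re

# Hypothetical toy lexicon (an assumption for illustration): surface name -> canonical symbol.
LEXICON = {
    "Wnt-1": "WNT1",
    "Wnt1": "WNT1",
    "LEF-1": "LEF1",
    "LEF1": "LEF1",
    "beta-catenin": "CTNNB1",
}


def build_pattern(lexicon):
    """Compile one regex that matches any lexicon name as a whole token."""
    # Longest names first, so a longer synonym wins over a shorter one at the same position.
    names = sorted(lexicon, key=len, reverse=True)
    alternation = "|".join(re.escape(name) for name in names)
    # The lookarounds reject matches glued to letters, digits, underscores or hyphens (e.g. "LEF-10").
    return re.compile(r"(?<![\w-])(?:" + alternation + r")(?![\w-])")


def tag_mentions(text, lexicon=LEXICON):
    """Return (start, end, surface form, canonical symbol) for every dictionary hit."""
    pattern = build_pattern(lexicon)
    return [(m.start(), m.end(), m.group(0), lexicon[m.group(0)])
            for m in pattern.finditer(text)]
```

Mapping each hit to a canonical symbol lets different synonyms, such as "LEF-1" and "LEF1", be counted as the same entity in later steps like relation extraction.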


Figures

Figure 1
An overview of biological natural language processing (BioNLP) and text-mining applications for biology. The major topics are represented by the inner circle of seven approaches, and the corresponding applications are given in the outer layers of boxes. Most of the tools are available online or for download. Some applications could be classified into multiple topics; they are shown here associated with one of their most significant topics. For instance, most of the text-mining applications (that is, the applications that are not simply for article retrieval) have integrated modules for named entity recognition (NER), and selective dissemination of information (SDI) services often use automated Boolean queries for article retrieval. References and URLs for each application, where available, are given in Table 1.
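The caption notes that selective dissemination of information (SDI) services often run automated Boolean queries for article retrieval. The sketch below shows one way a stored profile query could be matched against newly published abstracts, under simplifying assumptions. The nested-tuple query format, the tokenizer and the function names are all hypothetical. The code does not reproduce PubMed/Entrez query syntax or the behavior of any specific SDI service.

```python
import re


def tokenize(text):
    """Lower-cased word tokens; internal hyphens are kept so "LEF-1" stays one token."""
    return set(re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower()))


def matches(query, tokens):
    """Evaluate a nested Boolean query against a set of tokens.

    A query is either a single term (str) or a tuple (op, *subqueries), where op
    is "AND", "OR" or "NOT" (NOT takes exactly one subquery). This format is an
    assumption made for this sketch.
    """
    if isinstance(query, str):
        return query.lower() in tokens
    op, *args = query
    if op == "AND":
        return all(matches(q, tokens) for q in args)
    if op == "OR":
        return any(matches(q, tokens) for q in args)
    if op == "NOT":
        (sub,) = args
        return not matches(sub, tokens)
    raise ValueError(f"unknown operator: {op}")


def disseminate(query, articles):
    """Return the ids of new articles that a stored profile query selects."""
    return [aid for aid, text in articles.items() if matches(query, tokenize(text))]
```

In this setup, a profile such as `("AND", "wnt-1", ("OR", "lef-1", "tcf"), ("NOT", "review"))` would be re-run against each batch of new abstracts, and only the matching article ids would be sent to the user.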
Figure 2
Basic steps in the use of the iHOP text-mining tool [40], illustrated with screenshots [42]. For a given query (for example, the protein symbols (a) Wnt-1 or (b) LEF-1), all sentences mentioning that name are retrieved from PubMed. These sentences also mention other proteins, which are highlighted and may point to associations with the query protein (see the magnified area in (b)). Functional terms (such as 'target' and 'complexes') and interaction verbs (such as 'activated' and 'stabilizes') are shown in bold. (c) Clicking the 'Gene model' link in the left panel in (a,b) displays interaction networks of proteins that co-occur in sentences with the query proteins.
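The workflow in Figure 2 is built on sentence-level co-occurrence. Sentences that mention the query protein are retrieved, other protein names in those sentences are highlighted, and co-mentions are aggregated into a network. The sketch below illustrates that idea. It assumes a hypothetical hand-written protein lexicon, a small hypothetical set of interaction verbs, exact string matching and a naive sentence splitter. It is not iHOP's implementation. A practical system would also need to handle synonyms, ambiguous names and more robust sentence segmentation.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical inputs (assumptions for illustration only).
PROTEINS = {"Wnt-1", "LEF-1", "beta-catenin", "TCF"}
INTERACTION_VERBS = {"activated", "stabilizes", "binds"}
TOKEN = re.compile(r"\w+(?:-\w+)*")  # word tokens, keeping internal hyphens


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def proteins_in(sentence):
    """Lexicon proteins that appear as whole tokens in the sentence."""
    return PROTEINS & set(TOKEN.findall(sentence))


def sentences_for(query, sentences):
    """Retrieval step: every sentence that mentions the query protein."""
    return [s for s in sentences if query in proteins_in(s)]


def cooccurrence_network(sentences):
    """Edge weight = number of sentences in which two proteins are co-mentioned."""
    edges = Counter()
    for s in sentences:
        for a, b in combinations(sorted(proteins_in(s)), 2):
            edges[(a, b)] += 1
    return edges


def highlight(sentence):
    """Plain-text stand-in for the display: [protein] and *interaction verb*."""
    def mark(m):
        word = m.group(0)
        if word in PROTEINS:
            return f"[{word}]"
        if word in INTERACTION_VERBS:
            return f"*{word}*"
        return word
    return TOKEN.sub(mark, sentence)
```

Because edges are counted per sentence rather than per abstract, a link requires the two names to appear close together. This tends to favor direct associations, although co-occurrence alone does not show that the proteins actually interact.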

References

    1. Altavista. http://www.altavista.com
    2. Google. http://www.google.com
    3. Schuler G, Epstein J, Ohkawa H, Kans J. Entrez: molecular biology database and retrieval system. Methods Enzymol. 1996;266:141–162.
    4. Entrez PubMed. http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed
    5. Wheeler D, Church D, Federhen S, Lash A, Madden T, Pontius J, Schuler G, Schriml L, Sequeira E, Tatusova T, Wagner L. Database resources of the National Center for Biotechnology. Nucleic Acids Res. 2003;31:28–33. doi: 10.1093/nar/gkg033.
