Plant Physiol. 2005 Aug;138(4):1914-25.
doi: 10.1104/pp.105.060863.

Dragon Plant Biology Explorer. A text-mining tool for integrating associations between genetic and biochemical entities with genome annotation and biochemical terms lists

Vladimir B Bajic et al. Plant Physiol. 2005 Aug.

Abstract

We introduce a tool for text mining, Dragon Plant Biology Explorer (DPBE), that integrates information on Arabidopsis (Arabidopsis thaliana) genes with their functions, based on gene ontologies and biochemical entity vocabularies, and presents the associations as interactive networks. The associations are based on (1) user-provided PubMed abstracts; (2) a list of Arabidopsis genes compiled by The Arabidopsis Information Resource; (3) user-defined combinations of four vocabulary lists based on the ones developed by the general, plant, and Arabidopsis GO consortia; and (4) three lists developed here based on metabolic pathways, enzymes, and metabolites derived from AraCyc, BRENDA, and other metabolism databases. We demonstrate how various combinations can be applied to the fields of (1) gene function and gene interaction analyses, (2) plant development, (3) biochemistry and metabolism, and (4) pharmacology of bioactive compounds. Furthermore, we show the suitability of DPBE for systems approaches by integration with "omics" platform outputs. Using a list of abiotic stress-related genes identified by microarray experiments, we show how this tool can be used to rapidly build an information base on the previously reported relationships. This tool complements the existing biological resources for systems biology by using text analysis to identify potentially novel associations between cellular entities based on genome annotation terms. Thus, it allows researchers to efficiently summarize existing information for a group of genes or pathways, so as to make better informed choices when designing validation experiments. Last, DPBE can help beginning researchers and graduate students summarize the vast information in an unfamiliar area. DPBE is freely available for academic and nonprofit users at http://research.i2r.a-star.edu.sg/DRAGON/ME2/.
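The abstract does not spell out how DPBE scores associations, so the following is only a minimal sketch of one common approach, assuming that two vocabulary terms are associated whenever they co-occur in the same abstract. The function names, the toy abstracts, and the small vocabulary are all hypothetical and are not taken from DPBE; only the gene names DREB1A and RD29A come from the Figure 4 legend.

```python
import re
from collections import Counter
from itertools import combinations


def find_terms(text, vocabulary):
    """Return the vocabulary terms that occur in `text` as whole words.

    Matching is case-insensitive. This is an assumed, simplified matcher,
    not DPBE's actual term recognition.
    """
    found = set()
    for term in vocabulary:
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.add(term)
    return found


def build_associations(abstracts, vocabulary):
    """Count, for each pair of terms, the abstracts in which both occur.

    The result maps a sorted (term_a, term_b) tuple to a co-occurrence
    count; each pair can be read as a weighted edge of an association
    network.
    """
    counts = Counter()
    for text in abstracts:
        terms = sorted(find_terms(text, vocabulary))
        for pair in combinations(terms, 2):
            counts[pair] += 1
    return counts


# Toy input, invented for illustration only.
abstracts = [
    "DREB1A overexpression induces RD29A under drought stress.",
    "RD29A expression responds to drought stress and cold.",
    "Proline accumulates during drought stress.",
]
vocabulary = ["DREB1A", "RD29A", "drought stress", "proline"]

network = build_associations(abstracts, vocabulary)
for (a, b), n in network.most_common():
    print(f"{a} -- {b}: {n} abstract(s)")
```

In this sketch, the gene list and the ontology or metabolite vocabularies would simply be merged into one `vocabulary` argument; a real system would likely also handle synonyms and term categories, which are omitted here.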

Figures

Figure 1.
Increase in the indexed PubMed entries for scientific reports related to selected domains of plants, including Arabidopsis. Keywords used to search the abstracts in the databases were the same as given in the legend.
Figure 2.
A section of the input module of the DPBE from its home page (http://research.i2r.a-star.edu.sg/DRAGON/ME2/). There are five input Explorer modules of DPBE: (1) Gene/Protein Function Explorer, (2) Plant Development Explorer, (3) Metabolome Explorer, (4) Natural Products Pharmacology Explorer, and (5) Customized Plant Biology Explorer. For each module, a selection of vocabularies is available. A part not shown here includes instructions for “omics” researchers.
Figure 3.
Overview of the interactive Output Modules of DPBE. A, Three types of tabular outputs. Two tabular outputs that are hyperlinked to the main table are shown with horizontal arrows. B, A network of associations that is linked to the main table is shown with a vertical arrow. C, The color and shape coding used to highlight the vocabulary elements in the abstracts. D, The nodes of the association networks (shown in B) are linked to the PubMed abstracts that are stored locally and displayed with the highlighted text. This allows users to rapidly screen the relevant literature. The interactivity can be seen at the DPBE Web site under “Examples” (http://research.i2r.a-star.edu.sg/DRAGON/ME2/).
Figure 4.
The systems approach module of the DPBE system. Analysis of an association network generated from DPBE, based on 3,221 PubMed abstracts related to 22 genes identified from a transcript profiling experiment of the stress regulator DREB1A (Maruyama et al., 2004). A, Part of network 3 with setting of one link per node is boxed to highlight the RD29A associations. Note the large number of genes belonging to different biological pathways that show association with the common stress-responsive gene RD29A. The portions of the network from above and below the RD29A node are connected here with dashed arrows. The full network is presented in Supplemental Figure 5. B, A biological network of gene expressions in relation to RD29A expression is shown here. Color-coded entities in the linked abstracts of the DPBE association network were put into the five bins, represented by the ovals. Their relationships to each other and with RD29A (such as up/down-regulation, coregulated) were then superimposed. The numbers in the boxes refer to the publications that are provided in the “Examples” section at the DPBE Web site. The asterisks mark the genes that were present in the list of 22 input gene names. Note the additional gene associations identified by the DPBE system.
