Nucleic Acids Res. 2014 Jul;42(Web Server issue):W422-9.
doi: 10.1093/nar/gku432. Epub 2014 May 16.

Alkemio: association of chemicals with biomedical topics by text and data mining


José A Gijón-Correas et al. Nucleic Acids Res. 2014 Jul.

Abstract

The PubMed® database of biomedical citations allows the retrieval of scientific articles studying the function of chemicals in biology and medicine. Manually mining millions of available citations for reported associations between chemicals and topics of interest would require substantial human time. We have implemented the Alkemio text mining web tool and SOAP web service to help with this task. The tool uses biomedical articles discussing chemicals (including drugs), predicts their relatedness to the query topic with a naïve Bayesian classifier, and ranks all chemicals by P-values computed from random simulations. Benchmarks on seven human pathways showed good retrieval performance (areas under the receiver operating characteristic curves ranged from 73.6 to 94.5%). A comparison with existing tools for retrieving chemicals associated with eight diseases showed that Alkemio achieved higher precision and recall when considering the top 10 candidate chemicals. Alkemio is a high-performing web tool that ranks chemicals for any biomedical topic, and it is free for non-commercial users.

Availability: http://cbdm.mdc-berlin.de/~medlineranker/cms/alkemio.
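
The abstract describes two steps: a naïve Bayesian classifier decides which citations are relevant to the query topic, and each chemical is then ranked by a P-value obtained from random simulations. The Python sketch below illustrates both steps under stated assumptions and is not Alkemio's code. The word weighting is a simplified presence-only naïve Bayes score (terms for absent words are dropped), and the simulation is assumed to draw random PMID sets of the same size as each chemical's citation set from a background pool. All function and parameter names (`train_word_weights`, `score`, `rank_chemicals`, `chem_to_pmids`) are hypothetical.

```python
import math
import random
from collections import Counter


def train_word_weights(topic_docs, background_docs, alpha=1.0):
    """Log-ratio weight per word, estimated from word presence in topic vs.
    background documents (Laplace smoothing with pseudo-count alpha)."""
    topic_counts = Counter(w for doc in topic_docs for w in set(doc))
    bg_counts = Counter(w for doc in background_docs for w in set(doc))
    n_t, n_b = len(topic_docs), len(background_docs)
    weights = {}
    for w in set(topic_counts) | set(bg_counts):
        p_t = (topic_counts[w] + alpha) / (n_t + 2 * alpha)
        p_b = (bg_counts[w] + alpha) / (n_b + 2 * alpha)
        weights[w] = math.log(p_t / p_b)
    return weights


def score(doc, weights):
    """Sum the weights of the distinct words present (absent words ignored)."""
    return sum(weights.get(w, 0.0) for w in set(doc))


def rank_chemicals(chem_to_pmids, relevant, pool, n_sim=2000, seed=0):
    """Rank chemicals by an empirical P-value of their hit count.

    chem_to_pmids: chemical -> list of PMIDs mentioning it (hypothetical input)
    relevant: set of PMIDs the classifier marked as relevant
    pool: all PMIDs that random samples are drawn from
    """
    rng = random.Random(seed)
    pool_flags = [pmid in relevant for pmid in pool]
    results = []
    for chem, pmids in chem_to_pmids.items():
        hits = sum(p in relevant for p in pmids)
        # Count random PMID sets of the same size with at least as many hits.
        extreme = sum(
            sum(rng.sample(pool_flags, len(pmids))) >= hits
            for _ in range(n_sim)
        )
        # Add-one correction so the empirical P-value is never exactly zero.
        p_value = (extreme + 1) / (n_sim + 1)
        results.append((chem, len(pmids), hits, p_value))
    return sorted(results, key=lambda r: r[3])


if __name__ == "__main__":
    # Toy data: 1000 PMIDs, of which the first 100 are classified as relevant.
    pool = list(range(1000))
    relevant = set(range(100))
    chems = {"chemA": list(range(10)), "chemB": list(range(500, 510))}
    for chem, n_pmids, hits, p in rank_chemicals(chems, relevant, pool):
        print(chem, n_pmids, hits, round(p, 4))
```

Sampling real PMIDs from a pool, rather than assuming a fixed hit probability, keeps the null model tied to the same corpus the classifier scored.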

Figures

Figure 1.
Alkemio's output. As an example, Alkemio was queried on 31 March 2014 to retrieve chemicals related to Alzheimer's disease (AD). The topic was queried as a MeSH term (Alzheimer Disease), citations from the last 3 years were used, the P-value cutoff for abstract selection was 0.01 and the FDR cutoff was 0.001. The main result, shown at the top of the figure, is a table of ranked candidate chemicals listing registry numbers (ID), names as MeSH terms, the number of related PubMed citations (PMIDs), the number of PubMed citations classified as relevant to AD (hits), the FDR and links to the 10 best PubMed citations (top 10 abstracts). PubMed citation links are displayed by level of confidence: highest confidence (manual validation; red hearts), high precision (90% precision in random simulations; red diamonds), good precision (70% precision in random simulations; black spades) and other citations passing the cutoff (black clubs). Clicking a PubMed citation link opens a new window (bottom right corner) with detailed information, including the abstract, MeSH terms, discriminative words (brown color gradient) and the target chemical name (rose highlighting). The output also lists the discriminative words used by the document classifier (left-hand side) and provides a download section for retrieving the data as text tables (bottom left corner).
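
Figure 1 applies two cutoffs: a P-value cutoff (0.01) when selecting abstracts and an FDR cutoff (0.001) when reporting chemicals. The caption does not state how the FDR is computed, so the sketch below swaps in the standard Benjamini-Hochberg procedure as an assumption. The helper names `benjamini_hochberg` and `filter_by_fdr` are hypothetical and not part of Alkemio.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def filter_by_fdr(chem_pvalues, fdr_cutoff=0.001):
    """Keep chemicals whose adjusted P-value is at most the cutoff, best first."""
    names = list(chem_pvalues)
    adjusted = benjamini_hochberg([chem_pvalues[n] for n in names])
    kept = [(n, q) for n, q in zip(names, adjusted) if q <= fdr_cutoff]
    return sorted(kept, key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical per-chemical P-values, e.g. from random simulations.
    p = {"chem_A": 0.0001, "chem_B": 0.0004, "chem_C": 0.2}
    print(filter_by_fdr(p, fdr_cutoff=0.001))
```

Controlling the FDR rather than applying a raw P-value cutoff matters here because thousands of candidate chemicals are tested per query.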
Figure 2.
Molecular pathway benchmark. Alkemio was queried to retrieve chemicals related to seven molecular pathways, selected from the WikiPathways database for having more than 10 associated chemicals. Because these pathways have few known chemicals (12 to 14) while Alkemio returned many candidates (545 to 2431), retrieval performance was evaluated with a random sampling strategy using the QiSampler tool (9). The figure shows receiver operating characteristic (ROC) curves (blue lines) and control curves from random simulations (dashed lines) produced by QiSampler with 1000 repetitions and a 50% sampling rate. Legends give the area under the curve (AUC).
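
The AUC values in Figure 2 summarize how well known pathway chemicals rise to the top of the ranking. The sketch below is not QiSampler's implementation; it assumes that each repetition draws the sampling rate (50%) from the known chemicals and, separately, from the remaining candidates, computes the AUC of the ranking on that subsample, and averages over repetitions. Scores are assumed to be "higher is better" (for instance, the negative rank or -log10 of the P-value). The names `auc` and `sampled_auc` are hypothetical.

```python
import random


def auc(scores, labels):
    """ROC AUC as the probability that a random positive outscores a random
    negative (Mann-Whitney formulation; ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sampled_auc(scores, labels, rate=0.5, repetitions=1000, seed=0):
    """Mean AUC over random subsamples that keep `rate` of the positives
    and `rate` of the negatives in each repetition."""
    rng = random.Random(seed)
    pos_idx = [i for i, y in enumerate(labels) if y]
    neg_idx = [i for i, y in enumerate(labels) if not y]
    total = 0.0
    for _ in range(repetitions):
        chosen = (rng.sample(pos_idx, max(1, int(len(pos_idx) * rate)))
                  + rng.sample(neg_idx, max(1, int(len(neg_idx) * rate))))
        total += auc([scores[i] for i in chosen], [labels[i] for i in chosen])
    return total / repetitions


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy ranking: 14 known chemicals among 600 candidates (hypothetical data).
    labels = [1] * 14 + [0] * 586
    scores = [rng.gauss(1.0 if y else 0.0, 1.0) for y in labels]
    print("sampled AUC:", round(sampled_auc(scores, labels, repetitions=100), 3))
```

Replacing `scores` with random values gives a control that should hover around an AUC of 0.5, analogous to the dashed control curves.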
Figure 3.
Comparison with existing tools. Precision in the top 10 candidate chemicals retrieved by Alkemio, FACTA and PolySearch, evaluated against manually curated chemical-disease associations from the Comparative Toxicogenomics Database (CTD). Because the CTD data are not comprehensive, many true positives are not recognized automatically, so the observed precision underestimates the real precision.
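
Precision and recall in the top 10, as used in Figure 3 and the abstract, reduce to simple set counts. The sketch below assumes a ranked list of chemical names and a reference set standing in for the CTD associations; the function names are hypothetical. As the caption notes, a true association missing from the reference set is counted as a false positive, so the computed precision is a lower bound.

```python
def precision_at_k(ranked_chemicals, reference, k=10):
    """Fraction of the top-k ranked chemicals present in the reference set
    (divides by the number actually available if fewer than k are ranked)."""
    top = ranked_chemicals[:k]
    if not top:
        return 0.0
    return sum(chem in reference for chem in top) / len(top)


def recall_at_k(ranked_chemicals, reference, k=10):
    """Fraction of the reference set recovered within the top-k ranks."""
    if not reference:
        return 0.0
    top = ranked_chemicals[:k]
    return sum(chem in reference for chem in top) / len(reference)


if __name__ == "__main__":
    # Hypothetical ranking and reference associations for one disease.
    ranking = ["chem_A", "chem_B", "chem_C", "chem_D"]
    known = {"chem_A", "chem_C", "chem_X"}
    print(precision_at_k(ranking, known, k=2), recall_at_k(ranking, known, k=2))
```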

References

    1. NCBI Resource Coordinators. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2014;42:D7–D17.
    2. Frijters R., Heupers B., van Beek P., Bouwhuis M., van Schaik R., de Vlieg J., Polman J., Alkema W. CoPub: a literature-based keyword enrichment tool for microarray data analysis. Nucleic Acids Res. 2008;36:W406–W410.
    3. Cheng D., Knox C., Young N., Stothard P., Damaraju S., Wishart D.S. PolySearch: a web-based text mining system for extracting relationships between human diseases, genes, mutations, drugs and metabolites. Nucleic Acids Res. 2008;36:W399–W405.
    4. Rebholz-Schuhmann D., Kirsch H., Arregui M., Gaudan S., Riethoven M., Stoehr P. EBIMed–text crunching to gather facts for proteins from Medline. Bioinformatics. 2007;23:e237–e244.
    5. Fontaine J.F., Priller F., Barbosa-Silva A., Andrade-Navarro M.A. Genie: literature-based gene prioritization at multi genomic scale. Nucleic Acids Res. 2011;39:W455–W461.
