Chem2Bio2RDF: a semantic framework for linking and data mining chemogenomic and systems chemical biology data

Bin Chen¹, Xiao Dong, Dazhi Jiao, Huijun Wang, Qian Zhu, Ying Ding, David J Wild

PMID: 20478034
PMCID: PMC2881087
DOI: 10.1186/1471-2105-11-255

BMC Bioinformatics. 2010 May 17;11:255.

Affiliation

¹ School of Informatics and Computing, Indiana University, Bloomington, IN, USA.


Abstract

Background: Recently there has been an explosion of new data sources about genes, proteins, genetic variations, chemical compounds, diseases and drugs. Integrating these data sources and identifying patterns that span them is of critical interest. Initiatives such as Bio2RDF and LODD have tackled the problem of linking biological data and drug data, respectively, using RDF. Thus far, the inclusion of chemogenomic and systems chemical biology information that crosses the domains of chemistry and biology has been very limited.

Results: We have created a single repository called Chem2Bio2RDF by aggregating data from multiple chemogenomics repositories; the repository is cross-linked into Bio2RDF and LODD. We have also created a linked-path generation tool to facilitate SPARQL query generation, and have created extended SPARQL functions to address specific chemical/biological search needs. We demonstrate the utility of Chem2Bio2RDF in investigating polypharmacology, identifying potential multiple pathway inhibitors, and associating pathways with adverse drug reactions.

Conclusions: We have created a new semantic systems chemical biology resource, and have demonstrated its potential usefulness in specific examples of polypharmacology, multiple pathway inhibition and adverse drug reaction–pathway mapping. We have also demonstrated the usefulness of extending SPARQL with cheminformatics and bioinformatics functionality.

Figures

**Figure 1**
**Chem2Bio2RDF datasets**. Nodes represent data sources. Two nodes are linked if the data of one source is directed to the data of another source. The node is shaped and colored by its type, which is organized into six categories. Some databases map to multiple sources.

**Figure 2**
**Chem2Bio2RDF querying architecture**. Chem2Bio2RDF is linked to Bio2RDF, LODD and other RDF resources. LPG refers to prototype methods used for automatically generating links between two given objects and automated generation of SPARQL queries.

**Figure 3**
**Prototype linked path generation**. A prototype of a tool that allows users to select origin and terminal data sources. The tool will generate all the possible paths between the two data sources, will allow the user to select individual paths, and will then convert these into SPARQL queries.
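The path-generation step described in this caption can be illustrated with a minimal Python sketch. It is not the authors' tool: the link list `LINKS`, the data source names used as graph nodes, and the helper names `build_graph` and `all_simple_paths` are assumptions made for illustration, and the real Chem2Bio2RDF schema may differ. The sketch only enumerates cycle-free paths between an origin and a terminal data source; translating a selected path into a SPARQL query is left out.

```python
from collections import defaultdict

# Hypothetical links between data sources (illustrative only).
LINKS = [
    ("PubChem Compound", "PubChem BioAssay"),
    ("PubChem BioAssay", "UniProt"),
    ("DrugBank", "UniProt"),
    ("UniProt", "KEGG Pathway"),
    ("PubChem Compound", "DrugBank"),
]


def build_graph(links):
    """Build an undirected adjacency map from pairs of linked data sources."""
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def all_simple_paths(graph, origin, terminal, max_len=5):
    """Return every cycle-free path from origin to terminal.

    Paths longer than max_len nodes are discarded. The result is sorted
    by length, then alphabetically, so the output is deterministic.
    """
    paths = []
    stack = [[origin]]
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == terminal:
            paths.append(path)
            continue
        if len(path) >= max_len:
            continue
        for neighbour in sorted(graph.get(node, ())):
            if neighbour not in path:  # never revisit a data source
                stack.append(path + [neighbour])
    return sorted(paths, key=lambda p: (len(p), p))


if __name__ == "__main__":
    graph = build_graph(LINKS)
    for path in all_simple_paths(graph, "PubChem Compound", "KEGG Pathway"):
        print(" -> ".join(path))
```

A user would then pick one of the printed paths; in the tool described above, that selection is what gets converted into a SPARQL query.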

**Figure 4**
**Class links for polypharmacology**. Includes the classes: Bioassay, Drug Target, Pathway, Protein-Protein Interaction, and Disease. Some classes include more than one data source. Two nodes in different classes are linked through two paths. For instance, drug X is linked to compound Y if targets A and B of drug X are linked to assays A and B of compound Y via UNIPROT ID.

**Figure 5**
**Graphical representation of the SPARQL query for Case Study 1**. PubChem compounds (e.g. CID 5754) are identified that are active in bioassays that are associated with protein targets, which are associated with genes (via UNIPROT), which are identified as those with which Dexamethasone interacts (via DrugBank). The resultant compounds are thus those that have a similar activity profile to Dexamethasone.

**Figure 6**
**Illustration of polypharmacology in pathways**. The compound is active against two proteins located in the two branches of a pathway that is associated with one disease. Targeting node C or node D alone cannot block the whole pathway.

**Figure 7**
**Graphical representation of the SPARQL query for Case Study 2**. PubChem compounds (e.g. CID 573747) are identified that are active in bioassays that are associated with protein targets, which are associated with genes (via UNIPROT), which are identified as being part of the MAPK signalling pathway (via KEGG). We thus identify compounds that have multiple paths and therefore interact with multiple targets in this pathway.
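The chain of joins in the Case Study 2 query (compound, bioassay, protein target, pathway) can be mimicked on toy in-memory data. The following Python sketch is only an illustration of the logic: every identifier in `ACTIVE_IN`, `ASSAY_TARGET` and `PATHWAY_MEMBERS` is an invented placeholder, not a record from PubChem, UniProt or KEGG, and the function name `multi_target_compounds` is hypothetical.

```python
from collections import defaultdict

# Invented placeholder records (not real PubChem, UniProt or KEGG data).
ACTIVE_IN = [
    ("cid:A", "assay:1"),
    ("cid:A", "assay:2"),
    ("cid:B", "assay:1"),
    ("cid:C", "assay:3"),
]
ASSAY_TARGET = {"assay:1": "prot:P1", "assay:2": "prot:P2", "assay:3": "prot:P3"}
PATHWAY_MEMBERS = {"pathway:X": {"prot:P1", "prot:P2"}}


def multi_target_compounds(pathway, min_targets=2):
    """Return compounds active against at least min_targets pathway members."""
    members = PATHWAY_MEMBERS[pathway]
    hits = defaultdict(set)
    for compound, assay in ACTIVE_IN:
        target = ASSAY_TARGET.get(assay)
        if target in members:  # keep only targets that belong to the pathway
            hits[compound].add(target)
    return {c: sorted(t) for c, t in hits.items() if len(t) >= min_targets}


if __name__ == "__main__":
    print(multi_target_compounds("pathway:X"))
```

Here only the placeholder compound `cid:A` reaches the pathway through two distinct targets, which corresponds to the "multiple paths" criterion in the caption.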

**Figure 8**
**Associating pathways with hepatotoxic effects**. The drugs that are associated with hepatotoxicity-related side effects are associated with their targets using DrugBank. The targets are associated with pathways using KEGG to establish association chains between pathways and side-effects.
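The association chain in this caption (side effect to drugs, drugs to targets, targets to pathways) can be sketched as a simple count of how many drugs with a given side effect reach each pathway. This is a minimal sketch under stated assumptions: all drug, protein and pathway identifiers below are invented placeholders rather than DrugBank or KEGG records, the function name `pathways_for_side_effect` is hypothetical, and the counting scheme is one plausible way to rank associations, not necessarily the one used in the paper.

```python
from collections import Counter

# Invented placeholder records (not real DrugBank or KEGG data).
DRUG_SIDE_EFFECTS = {
    "drug:1": {"hepatotoxicity"},
    "drug:2": {"hepatotoxicity", "nausea"},
    "drug:3": {"nausea"},
}
DRUG_TARGETS = {
    "drug:1": {"prot:P1"},
    "drug:2": {"prot:P1", "prot:P2"},
    "drug:3": {"prot:P3"},
}
TARGET_PATHWAYS = {
    "prot:P1": {"pathway:X"},
    "prot:P2": {"pathway:X", "pathway:Y"},
    "prot:P3": {"pathway:Z"},
}


def pathways_for_side_effect(side_effect):
    """Count, per pathway, the drugs with side_effect whose targets lie in it."""
    counts = Counter()
    for drug, effects in DRUG_SIDE_EFFECTS.items():
        if side_effect not in effects:
            continue
        pathways = set()
        for target in DRUG_TARGETS.get(drug, ()):
            pathways |= TARGET_PATHWAYS.get(target, set())
        counts.update(pathways)  # each drug contributes once per pathway
    return counts


if __name__ == "__main__":
    for pathway, n_drugs in pathways_for_side_effect("hepatotoxicity").most_common():
        print(pathway, n_drugs)
```

Counting each drug at most once per pathway keeps a drug with several targets in the same pathway from inflating that pathway's score.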

