2010 May 17:11:255.
doi: 10.1186/1471-2105-11-255.

Chem2Bio2RDF: a semantic framework for linking and data mining chemogenomic and systems chemical biology data

Bin Chen et al. BMC Bioinformatics.

Abstract

Background: Recently there has been an explosion of new data sources about genes, proteins, genetic variations, chemical compounds, diseases and drugs. Integrating these data sources and identifying patterns that cut across them is of critical interest. Initiatives such as Bio2RDF and LODD have tackled the problem of linking biological data and drug data, respectively, using RDF. Thus far, the inclusion of chemogenomic and systems chemical biology information that crosses the domains of chemistry and biology has been very limited.

Results: We have created a single repository called Chem2Bio2RDF by aggregating data from multiple chemogenomics repositories that is cross-linked into Bio2RDF and LODD. We have also created a linked-path generation tool to facilitate SPARQL query generation, and have created extended SPARQL functions to address specific chemical/biological search needs. We demonstrate the utility of Chem2Bio2RDF in investigating polypharmacology, identification of potential multiple pathway inhibitors, and the association of pathways with adverse drug reactions.

Conclusions: We have created a new semantic systems chemical biology resource, and have demonstrated its potential usefulness in specific examples of polypharmacology, multiple pathway inhibition and adverse drug reaction–pathway mapping. We have also demonstrated the usefulness of extending SPARQL with cheminformatics and bioinformatics functionality.

Figures

Figure 1
Chem2Bio2RDF datasets. Nodes represent data sources. Two nodes are linked if the data of one source is directed to the data of another source. The node is shaped and colored by its type, which is organized into six categories. Some databases map to multiple sources.
Figure 2
Chem2Bio2RDF querying architecture. Chem2Bio2RDF is linked to Bio2RDF, LODD and other RDF resources. LPG refers to prototype methods used for automatically generating links between two given objects and automated generation of SPARQL queries.
Figure 3
Prototype linked path generation. A prototype of a tool that allows users to select origin and terminal data sources. The tool will generate all the possible paths between the two data sources, will allow the user to select individual paths, and will then convert these into SPARQL queries.
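
The path-generation step described in this caption can be sketched as a depth-first enumeration of cycle-free paths over a graph of data-source links. This is a minimal illustration, not the authors' tool: the link topology in `LINKS`, the `max_hops` limit, and the `ex:linksTo_*` predicates are all assumptions made for the example and do not reflect the actual Chem2Bio2RDF schema.

```python
from typing import Dict, List

# Hypothetical link graph: each key is a data source and each value lists the
# sources it links to. The topology is invented for illustration only.
LINKS: Dict[str, List[str]] = {
    "PubChemBioAssay": ["UniProt", "DrugBank"],
    "DrugBank": ["UniProt"],
    "UniProt": ["KEGG"],
    "KEGG": [],
}


def all_simple_paths(graph: Dict[str, List[str]], origin: str,
                     terminal: str, max_hops: int = 4) -> List[List[str]]:
    """Return every cycle-free path from origin to terminal with at most max_hops edges."""
    paths: List[List[str]] = []

    def walk(node: str, path: List[str]) -> None:
        if node == terminal:
            paths.append(path)
            return
        if len(path) - 1 >= max_hops:  # number of edges used so far
            return
        for nxt in graph.get(node, []):
            if nxt not in path:  # never revisit a source on the same path
                walk(nxt, path + [nxt])

    walk(origin, [origin])
    return paths


def path_to_sparql(path: List[str]) -> str:
    """Render a path as a chain of triple patterns using placeholder predicates."""
    patterns = [
        f"  ?n{i} ex:linksTo_{dst} ?n{i + 1} ."
        for i, dst in enumerate(path[1:])
    ]
    return ("PREFIX ex: <http://example.org/>\n"
            "SELECT * WHERE {\n" + "\n".join(patterns) + "\n}")


if __name__ == "__main__":
    for p in all_simple_paths(LINKS, "PubChemBioAssay", "KEGG"):
        print(" -> ".join(p))
        print(path_to_sparql(p))
```

Each selected path becomes one query in which consecutive variables are joined hop by hop; a real implementation would substitute the predicates that actually connect each pair of datasets.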
Figure 4
Class links for polypharmacology. Includes the classes: Bioassay, Drug Target, Pathway, Protein-Protein Interaction, and Disease. Some classes include more than one data source. Two nodes in different classes are linked through two paths. For instance, drug X is linked to compound Y if targets A and B of drug X are linked to assays A and B of compound Y via UNIPROT ID.
Figure 5
Graphical representation of the SPARQL query for Case Study 1. PubChem compounds (e.g. CID 5754) are identified that are active in bioassays that are associated with protein targets, which are associated with genes (via UNIPROT), which are identified as those with which Dexamethasone interacts (via DrugBank). The resultant compounds are thus those that have a similar activity profile to Dexamethasone.
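
The chain of joins in this caption (drug to targets, targets to bioassays, bioassays to active compounds) can be mimicked in a few lines of Python over in-memory records. This is only a sketch of the join logic, not the SPARQL query from the paper. Every identifier in the toy data (`UP_A`, `AID_1`, `CID_x`, and so on) is made up, and the function name is hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Toy records; every identifier below is invented for illustration.
# drug -> UniProt IDs of its targets (the role DrugBank plays in the caption)
DRUG_TARGETS: Dict[str, Set[str]] = {"Dexamethasone": {"UP_A", "UP_B"}}
# bioassay -> UniProt ID of the protein target it measures
ASSAY_TARGET: Dict[str, str] = {"AID_1": "UP_A", "AID_2": "UP_B", "AID_3": "UP_C"}
# (compound, bioassay) pairs where the compound was active
ACTIVE_IN: List[Tuple[str, str]] = [
    ("CID_x", "AID_1"),
    ("CID_y", "AID_1"),
    ("CID_y", "AID_2"),
    ("CID_z", "AID_3"),
]


def compounds_sharing_targets(drug: str) -> Dict[str, Set[str]]:
    """Map each compound to the targets of `drug` it is also active against."""
    targets = DRUG_TARGETS.get(drug, set())
    shared: Dict[str, Set[str]] = {}
    for compound, assay in ACTIVE_IN:
        target = ASSAY_TARGET.get(assay)
        if target in targets:  # join on the shared UniProt ID
            shared.setdefault(compound, set()).add(target)
    return shared


if __name__ == "__main__":
    print(compounds_sharing_targets("Dexamethasone"))
```

Compounds that share more targets with the query drug (here `CID_y`) have the more similar activity profile in this toy setting.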
Figure 6
Illustration of polypharmacology in pathways. The compound is active against two proteins that are located in the two branches of the pathway that is associated with one disease. Targeting either node C or node D is not able to block the whole pathway.
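
The point of this caption can be checked with a simple reachability test: a pathway counts as blocked only if no route from its entry to its exit survives once the inhibited nodes are removed. The sketch below assumes a hypothetical two-branch topology (A feeds C and D, which both feed E); only the node names C and D come from the caption, and the rest is an assumption.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical pathway with two parallel branches through C and D.
PATHWAY: Dict[str, List[str]] = {"A": ["C", "D"], "C": ["E"], "D": ["E"], "E": []}


def pathway_blocked(graph: Dict[str, List[str]], source: str, sink: str,
                    inhibited: Set[str]) -> bool:
    """Return True if no path from source to sink avoids the inhibited nodes."""
    if source in inhibited or sink in inhibited:
        return True
    seen = {source}
    queue = deque([source])
    while queue:  # breadth-first search over the non-inhibited nodes
        node = queue.popleft()
        if node == sink:
            return False
        for nxt in graph.get(node, []):
            if nxt not in seen and nxt not in inhibited:
                seen.add(nxt)
                queue.append(nxt)
    return True


if __name__ == "__main__":
    print(pathway_blocked(PATHWAY, "A", "E", {"C"}))
    print(pathway_blocked(PATHWAY, "A", "E", {"C", "D"}))
```

Inhibiting C or D alone leaves the other branch open, while a compound active against both closes every route in this toy graph.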
Figure 7
Graphical representation of the SPARQL query for Case Study 2. PubChem compounds (e.g. CID 573747) are identified that are active in bioassays that are associated with protein targets, which are associated with genes (via UNIPROT), which are identified as being part of the MAPK signalling pathway (via KEGG). We thus identify compounds that have multiple paths, and thus interact with multiple targets in this pathway.
Figure 8
Associating pathways with hepatotoxic effects. The drugs that are associated with hepatotoxicity-related side effects are associated with their targets using DrugBank. The targets are associated with pathways using KEGG to establish association chains between pathways and side-effects.
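
The association chain in this caption (side effect to drugs, drugs to targets, targets to pathways) lends itself to a simple count of how many drugs with the side effect reach each pathway. The sketch below is one plausible way to aggregate such chains, not the analysis performed in the paper. All drug, target and pathway identifiers are placeholders, and `pathway_support` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, List

# Placeholder records; none of these identifiers refer to real entities.
SIDE_EFFECT_DRUGS: Dict[str, List[str]] = {
    "hepatotoxicity": ["drug_1", "drug_2", "drug_3"],
}
DRUG_TARGETS: Dict[str, List[str]] = {       # the role DrugBank plays in the caption
    "drug_1": ["UP_A"],
    "drug_2": ["UP_A", "UP_B"],
    "drug_3": ["UP_A"],
}
TARGET_PATHWAYS: Dict[str, List[str]] = {    # the role KEGG plays in the caption
    "UP_A": ["pathway_1"],
    "UP_B": ["pathway_1", "pathway_2"],
}


def pathway_support(side_effect: str) -> Counter:
    """Count, per pathway, the distinct drugs with the side effect that reach it."""
    counts: Counter = Counter()
    for drug in SIDE_EFFECT_DRUGS.get(side_effect, []):
        # Collect pathways as a set so a drug is counted once per pathway.
        reached = {
            pathway
            for target in DRUG_TARGETS.get(drug, [])
            for pathway in TARGET_PATHWAYS.get(target, [])
        }
        counts.update(reached)
    return counts


if __name__ == "__main__":
    print(pathway_support("hepatotoxicity").most_common())
```

Raw counts like these would still need to be compared against how often each pathway is reached by drugs without the side effect before reading anything into them.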
