PLoS One. 2016 May 12;11(5):e0155530.
doi: 10.1371/journal.pone.0155530. eCollection 2016.

Generating Gene Ontology-Disease Inferences to Explore Mechanisms of Human Disease at the Comparative Toxicogenomics Database

Allan Peter Davis et al. PLoS One. 2016.

Abstract

Strategies for discovering common molecular events among disparate diseases hold promise for improving understanding of disease etiology and expanding treatment options. One technique is to leverage curated datasets found in the public domain. The Comparative Toxicogenomics Database (CTD; http://ctdbase.org/) manually curates chemical-gene, chemical-disease, and gene-disease interactions from the scientific literature. The use of official gene symbols in CTD interactions enables this information to be combined with the Gene Ontology (GO) file from NCBI Gene. By integrating these GO-gene annotations with CTD's gene-disease dataset, we produce 753,000 inferences between 15,700 GO terms and 4,200 diseases, providing opportunities to explore presumptive molecular underpinnings of diseases and identify biological similarities. Through a variety of applications, we demonstrate the utility of this novel resource. As a proof-of-concept, we first analyze known repositioned drugs (e.g., raloxifene and sildenafil) and see that their target diseases have a greater degree of similarity when comparing GO terms vs. genes. Next, a computational analysis predicts seemingly non-intuitive diseases (e.g., stomach ulcers and atherosclerosis) as being similar to bipolar disorder, and these are validated in the literature as reported co-diseases. Additionally, we leverage other CTD content to develop testable hypotheses about thalidomide-gene networks to treat seemingly disparate diseases. Finally, we illustrate how CTD tools can rank a series of drugs as potential candidates for repositioning against B-cell chronic lymphocytic leukemia and predict cisplatin and the small molecule inhibitor JQ1 as lead compounds. The CTD dataset is freely available for users to navigate pathologies within the context of extensive biological processes, molecular functions, and cellular components conferred by GO. 
This inference set should aid researchers, bioinformaticists, and pharmaceutical drug makers in finding commonalities in disease mechanisms, which in turn could help identify new therapeutics, new indications for existing pharmaceuticals, potential disease comorbidities, and alerts for side effects.
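
The integration step described in the abstract can be read as a join on gene symbol: a GO term is inferred to a disease whenever at least one gene carries both the GO annotation and a curated disease association. The following is a minimal sketch of that idea, not CTD's actual pipeline. The function name `infer_go_disease`, the tuple layout, and the gene symbols `GENE_A` to `GENE_C` are hypothetical placeholders chosen for illustration.

```python
from collections import defaultdict


def infer_go_disease(go_gene, gene_disease):
    """Join GO-gene annotations with gene-disease associations on gene symbol.

    go_gene:      iterable of (go_term, gene_symbol) pairs
    gene_disease: iterable of (gene_symbol, disease) pairs
    Returns a dict mapping (go_term, disease) to the set of genes that
    form the inference network for that pair.
    """
    # Index diseases by gene so each GO annotation can be looked up directly.
    diseases_by_gene = defaultdict(set)
    for gene, disease in gene_disease:
        diseases_by_gene[gene].add(disease)

    networks = defaultdict(set)
    for go_term, gene in go_gene:
        for disease in diseases_by_gene.get(gene, ()):
            networks[(go_term, disease)].add(gene)
    return dict(networks)


# Toy data with placeholder gene symbols (not real curated content).
go_gene = [
    ("G-protein coupled receptor signaling pathway", "GENE_A"),
    ("G-protein coupled receptor signaling pathway", "GENE_B"),
    ("some other process", "GENE_C"),
]
gene_disease = [
    ("GENE_A", "Obesity"),
    ("GENE_B", "Obesity"),
    ("GENE_C", "Leprosy"),
]

networks = infer_go_disease(go_gene, gene_disease)
for (go_term, disease), genes in sorted(networks.items()):
    print(go_term, "->", disease, "via", sorted(genes))
```

Keeping the set of supporting genes for each pair, rather than a bare yes/no link, mirrors the "inference network" size reported for each GO-disease connection.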

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Fig 1. GO terms inferred to diseases via gene inference networks.
(A) The "Diseases" data-tab on CTD’s webpage for the GO-BP term “G-protein coupled receptor signaling pathway” lists human pathologies inferred to this GO term, including a connection to obesity made by an inference network of 22 genes (red double arrow). (B) A schematic outlines how this GO term is directly annotated to these 22 genes (by external databases) which, in turn, have also been directly associated with obesity independently by CTD biocurators from the literature, allowing the GO term to be inferred (dotted black arrow) to the disease. (C) The files for "GO-Disease-Gene Inference Networks" are freely available from CTD's "Data Downloads" page and can be retrieved in a variety of formats.
Fig 2. Exploring disease mechanisms from a GO perspective.
(A) Using inferred GO-CC data, the number of diseases (red numbers) can be associated with cellular locations, providing an additional level of information for potential, druggable targets. Interactive cell maps can be annotated with these inferences to allow navigation and exploration. The 1,178 diseases mapping to the mitochondrion (boxed arrow) were clustered to MEDIC disease categories (pie chart), and the top four categories are highlighted: nervous system diseases (N), genetic inborn diseases (G), metabolic diseases (M), and cancers (C). (B) The inferred GO-MF terms (blue numbers) for six cancers (red circles) share a subset of 210 molecular functions (blue box), providing core molecular activities informing common mechanisms of cancer.
Fig 3. The potential for shared GO-BP terms vs. shared genes to better inform the repositioning of pharmaceuticals.
Three repositioned therapeutics are shown (blue ovals) with their initial disease target (red) and their subsequent new indication (green), with FDA approval/patent dates listed. The fourth example (orange oval) is purely hypothetical for a presumptive therapeutic that treats both type 2 diabetes and Alzheimer disease, based upon the extensive amount of shared GO-BP terms. Venn diagrams show there is a greater amount of overlap for inferred GO-BP versus directly curated genes for the disease-pairs for each drug, including two therapeutics (thalidomide and sildenafil) for disease-pairs that do not share any genes, but do share inferred GO-BP terms. Venn circles and percentages are color-coded to match targeted diseases in each example; significance of overlaps is defined by p-values.
Fig 4. Discovering comparable diseases via shared inferred GO-BP terms.
There are 2,457 disease-pairs (blue dots) that do not share any genes, but do share inferred GO-BP terms. The percentage of overlap between inferred GO-BP terms for disease A (red, x-axis) is graphed against those of disease B (green, y-axis) to find heterogeneous diseases that are comparable to each other based exclusively on shared biological processes (and no shared genes). A set of 14 disease-pairs with a high amount of shared overlap for both diseases is indicated (orange dotted box). As an example, ulcerative colitis (disease A, red) has no genes in common with coronary artery disease (disease B, green), but the two share 398 inferred GO-BP terms, graphed as 37% for ulcerative colitis and 38% for coronary artery disease. Disease abbreviations: COPD (chronic obstructive pulmonary disease), BCLL (B-cell lymphocytic leukemia), NAFLD (non-alcoholic fatty liver disease). Note: many disease-pairs have the same coordinates (rounded to two digits), and thus appear as only a single dot on the graph.
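
The coordinates plotted in Fig 4 can be sketched as a set computation. The sketch below assumes that each percentage is the number of shared inferred GO-BP terms divided by that disease's own total of inferred GO-BP terms; this is one reading of the caption, not a confirmed formula. The function name `go_overlap` and the toy term labels are hypothetical.

```python
def go_overlap(genes_a, genes_b, terms_a, terms_b):
    """Compare two diseases that share GO terms but no genes.

    Returns None if the diseases share any gene or share no GO terms.
    Otherwise returns (shared_count, percent_of_a, percent_of_b), where
    each percentage is relative to that disease's own term count
    (an assumed denominator).
    """
    if genes_a & genes_b:
        return None  # only gene-disjoint pairs are of interest here
    shared = terms_a & terms_b
    if not shared:
        return None
    return (
        len(shared),
        100.0 * len(shared) / len(terms_a),
        100.0 * len(shared) / len(terms_b),
    )


# Toy inputs with placeholder gene symbols and GO term labels.
result = go_overlap(
    genes_a={"GENE_A"},
    genes_b={"GENE_B"},
    terms_a={"t1", "t2", "t3", "t4"},
    terms_b={"t3", "t4", "t5"},
)
shared_count, pct_a, pct_b = result
print(shared_count, round(pct_a), round(pct_b))
```

Because the two diseases generally have different numbers of inferred terms, the same shared count yields two different percentages, which is why each pair needs both an x and a y coordinate.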
Fig 5. Complementary approaches to discovering comparable diseases.
Bipolar disorder is used as a test case to find comparable diseases (DiseaseComps) via two methods. (A) One of CTD’s current methods uses shared genes to compute a statistical similarity index that ranks comparable diseases, and includes psychotic disorders as the top hit for bipolar disorder (green box). (B) An alternative, complementary approach is to use only shared inferred GO-BP terms to find similar diseases that share biological processes (without sharing genes). Here, substance-induced psychoses (green box) is highly scored and redolent of psychotic disorders found using genes (connecting green arrow). Interestingly, other heterogeneous pathologies (red boxes) predicted to be comparable to bipolar disorder have been verified in the recent literature (see text).
Fig 6. Potential biological processes-of-action for lithium.
The drug lithium is a common therapeutic (T) for bipolar disorder (green arrow), but chronic use in patients has also been reported to cause (M) adverse reactions, such as kidney failure and congenital diaphragmatic hernia (red arrows). Assuming the drug works through modulation of biological processes (gray cloud), we used Venn analysis to compare the number of inferred GO-BP for these three outcomes (colored circles). Currently, there are 231 inferred GO-BP terms (p < 0.001) shared that might represent some of the critical biological processes modulated by lithium treatment; a random selection of some of these shared terms is listed (blue subset).
Fig 7. Leveraging CTD content to build a molecular nexus.
(A) Leprosy and multiple myeloma (MM) are both treated by the drug thalidomide, but the diseases do not currently share any genes in CTD. CTD’s Set Analyzer tool can be used to determine whether the disease-specific gene sets function in a common pathway by: selecting “Genes” (top arrowhead), entering the non-overlapping 43 gene symbols for the two diseases, and then selecting “common gene-gene interactions” (bottom arrowhead). (B) The resulting interaction network can be customized as a graph using the “Pathway View” icon; genes are displayed as circles and their genetic interactions are represented as gray lines. Here, the graph reveals that one leprosy-specific gene (PARK2; red circle) physically interacts with four MM-specific genes (BCL2, BCL2L1, MCL1, and PRAME; green circles with orange borders). Note: for simplicity, only the relevant genes are shown in the interaction network. (C) Leveraging the curated chemical-gene interactions found on CTD’s page for “Thalidomide” (upper right-hand screenshot) reveals that the drug decreases the expression (blue arrows) of three of the genes (BCL2, BCL2L1, and MCL1) that interact with the leprosy-specific PARK2.
Fig 8. Leveraging CTD content to prioritize drugs for repositioning.
B-cell chronic lymphocytic leukemia (BCLL) and neuroblastoma are diseases that currently do not share any known genes in CTD, but do share 320 inferred GO-BP terms, suggesting molecular similarity (see Fig 4). (A) Diseases can be compared using CTD’s VennViewer tool by selecting “Disease” analysis (top arrowhead), inputting the two disease terms, choosing to compare curated chemical associations (middle arrowhead), and adding a filter to retrieve only therapeutic interactions (bottom arrowhead). (B) The resulting Venn diagram identified two chemicals (arsenic trioxide and cyclophosphamide) that each have a curated therapeutic relationship with both diseases, as well as 28 chemicals specific to BCLL (which could potentially be repositioned for neuroblastoma; red box), and 39 chemicals specific to neuroblastoma (which could now be repositioned for BCLL; green box). (C) Arsenic trioxide and cyclophosphamide treat both diseases and both chemicals interact with a set of 277 genes (blue Venn circles), information which can be leveraged to help rank the test drugs. (D) The 39 therapeutic drugs for neuroblastoma with potential repositioning towards BCLL (green names on y-axis) were queried in CTD to see how many of the 277 genes interact with each test drug (x-axis). Four of the 39 test drugs interact with more than 50% of the 277 genes (blue dotted box). (E) Venn diagrams summarize how BCLL and neuroblastoma do not currently share any genes in CTD, but do share 320 inferred GO-BP terms (based upon CTD’s new GO-Disease inference dataset), and that 307 of these 320 GO-BP terms are annotated to the 277-gene set used to rank the test drugs for potential repositioning.
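
The ranking in panel D can be approximated as scoring each test drug by the fraction of a reference gene set that it interacts with, then keeping drugs above a 50% cutoff. This is a rough sketch under that assumption, not the CTD tools themselves. The function name `rank_drugs`, the drug labels, and the gene symbols are hypothetical placeholders.

```python
def rank_drugs(reference_genes, drug_genes):
    """Rank drugs by the fraction of reference genes they interact with.

    reference_genes: set of gene symbols (for example, genes shared by
                     drugs known to treat both diseases)
    drug_genes:      dict mapping drug name to its set of interacting genes
    Returns a list of (drug, fraction) sorted by descending fraction,
    with ties broken alphabetically by drug name.
    """
    if not reference_genes:
        raise ValueError("reference_genes must not be empty")
    scores = [
        (drug, len(genes & reference_genes) / len(reference_genes))
        for drug, genes in drug_genes.items()
    ]
    return sorted(scores, key=lambda item: (-item[1], item[0]))


# Toy data with placeholder names (not real curated interactions).
reference = {"G1", "G2", "G3", "G4"}
candidates = {
    "drug_x": {"G1", "G2", "G3"},
    "drug_y": {"G1", "G9"},
    "drug_z": set(),
}

ranked = rank_drugs(reference, candidates)
leads = [drug for drug, fraction in ranked if fraction > 0.5]
print(ranked)
print(leads)
```

A fraction of the reference set is used rather than a raw count so that the cutoff stays meaningful if the reference gene set changes size as curation grows.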
