PLoS Comput Biol. 2009 Jul;5(7):e1000450.
doi: 10.1371/journal.pcbi.1000450. Epub 2009 Jul 31.

Building disease-specific drug-protein connectivity maps from molecular interaction networks and PubMed abstracts

Jiao Li et al. PLoS Comput Biol. 2009 Jul.

Abstract

The recently proposed concept of molecular connectivity maps enables researchers to integrate experimental measurements of genes, proteins, metabolites, and drug compounds under similar biological conditions. The study of these maps provides opportunities for future toxicogenomics and drug discovery applications. We developed a computational framework to build disease-specific drug-protein connectivity maps. We integrated gene/protein and drug connectivity information based on protein interaction networks and literature mining, without requiring gene expression profile information derived from drug perturbation experiments on disease samples. We described the development and application of this computational framework using Alzheimer's Disease (AD) as a primary example in three steps. First, molecular interaction networks were incorporated to reduce bias and improve relevance of AD seed proteins. Second, PubMed abstracts were used to retrieve enriched drug terms that are indirectly associated with AD through molecular mechanistic studies. Third and lastly, a comprehensive AD connectivity map was created by relating enriched drugs and related proteins in literature. We showed that this molecular connectivity map development approach outperformed both curated drug target databases and conventional information retrieval systems. Our initial explorations of the AD connectivity map yielded a new hypothesis that diltiazem and quinidine may be investigated as candidate drugs for AD treatment. Molecular connectivity maps derived computationally can help study molecular signature differences between different classes of drugs in specific disease contexts. To achieve overall good data coverage and quality, a series of statistical methods have been developed to overcome high levels of data noise in biological networks and literature mining results. 
Further development of computational molecular connectivity maps to cover major disease areas will likely set up a new model for drug development, in which therapeutic/toxicological profiles of candidate drugs can be checked computationally before costly clinical trials begin.

Conflict of interest statement

Jake Chen discloses that he is also the founder of MedeoLinx, LLC, an Indianapolis startup biotech company to provide novel drug discovery products and services based on translational systems biology.

Figures

Figure 1. A conceptual paradigm for the development of disease-specific molecular connectivity maps.
In this paradigm, molecular interaction data and PubMed abstracts are the primary data sources. Network mining is used to generate disease-related proteins from molecular interactions. Text mining is used to extract disease-related drug terms from PubMed abstracts and then to build a drug-protein connectivity map in the disease context.
Figure 2. A computational framework for developing molecular connectivity maps in any given disease context.
The framework consists of three components: network construction, text retrieval and information extraction, and molecular connectivity mapping. The network construction component takes disease-specific seed proteins as input and outputs a disease-related protein interaction network with a ranked list of disease-related proteins. The text retrieval and information extraction component takes synonym-expanded disease-related proteins and outputs a list of drug terms enriched in the retrieved collection of PubMed abstracts. The molecular connectivity mapping component takes two inputs (disease-related proteins from the protein interaction network constructed by the first component, and enriched drug terms from the second component) and outputs a drug-protein connectivity map, to which further knowledge filters and clustering analysis can be applied.
Figure 3. The effect of different disease-related protein seeding situations on the specificity and sensitivity of AD drug identification.
In the text retrieval and information extraction component, the AD-related drugs are identified from the retrieved PubMed abstracts relevant to a list of AD proteins. We have an initial set of 49 AD seed proteins. To evaluate the effect of different seeding situations on AD drug identification, we sub-sampled the initial AD seed set into 8 data sets of varying sizes, i.e., S5, S10, S15, S20, S25, S30, S35, and S40 (the number indicating the size), and also generated a random seed set with 50 proteins. Given the different seed sets, Panel (A) shows the specificity of AD-related drug identification at the top N drugs determined by FDR (false discovery rate), and Panel (B) shows the sensitivity.
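
The sub-sampling design described in this caption can be sketched as follows. This is only an illustration under stated assumptions: the caption does not say whether the subsets were nested or drawn independently, nor which pool the random 50-protein set came from. Here each subset is drawn independently without replacement, and `background`, the placeholder protein identifiers, and the function name `subsample_seed_sets` are all hypothetical.

```python
import random


def subsample_seed_sets(seeds, sizes, rng):
    """Return {"S<n>": sorted sample of n seeds} for each n in sizes.

    Assumption: each subset is an independent draw without replacement.
    """
    return {f"S{n}": sorted(rng.sample(seeds, n)) for n in sizes}


# Placeholder identifiers standing in for the 49 AD seed proteins.
seed_proteins = [f"seed_{i:02d}" for i in range(1, 50)]

# Hypothetical background pool for the random control set (not from the paper).
background = [f"bg_{i:04d}" for i in range(1, 1001)]

rng = random.Random(0)    # fixed seed so the draws are reproducible
sizes = range(5, 45, 5)   # 5, 10, ..., 40
seed_sets = subsample_seed_sets(seed_proteins, sizes, rng)
random_set = rng.sample(background, 50)

for name, members in seed_sets.items():
    print(name, len(members))
```

Each resulting set would then be fed to the drug identification step separately, so that specificity and sensitivity can be compared across seed set sizes.
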
Figure 4. Specificity and sensitivity tradeoffs for AD-related drug identification.
The ROC (receiver operating characteristic) curve shows the sensitivity vs. false positive rate (1-specificity) for AD-related drug identification as the FDR (false discovery rate) threshold varies. Evaluation results are built by querying PubMed abstracts and Entrez Gene function descriptions in search of evidence that may contain any of the drug terms and the term “Alzheimer's Disease” with all their term variants. The sensitivity and specificity are defined in the Methods section.
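
As a rough illustration of how such a curve is traced, the sketch below computes one (false positive rate, sensitivity) point per FDR threshold. It uses the standard definitions of sensitivity and false positive rate, which may differ in detail from those in the paper's Methods section. The toy FDR values, the labels, and the function name `roc_points` are made up for the example.

```python
def roc_points(fdr_values, labels, thresholds):
    """Return a (false_positive_rate, sensitivity) pair for each threshold.

    A drug is called positive when its FDR value is <= the threshold.
    labels[i] is True when drug i has supporting evidence (a true positive
    if called), False otherwise.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for t in thresholds:
        tp = sum(1 for f, y in zip(fdr_values, labels) if f <= t and y)
        fp = sum(1 for f, y in zip(fdr_values, labels) if f <= t and not y)
        points.append((fp / n_neg, tp / n_pos))
    return points


# Toy data: six candidate drugs with made-up FDR values and evidence labels.
fdr = [0.01, 0.02, 0.04, 0.10, 0.20, 0.50]
is_ad_related = [True, True, False, True, False, False]

points = roc_points(fdr, is_ad_related, thresholds=[0.0, 0.05, 0.25, 1.0])
for fpr, sens in points:
    print(f"FPR={fpr:.2f}  sensitivity={sens:.2f}")
```

Loosening the threshold admits more drugs, so both coordinates can only stay the same or increase, moving from (0, 0) toward (1, 1).
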
Figure 5. Performance assessment of comparable systems on the task of identifying AD-related drugs.
Two curated data sources (DrugBank and CTD) and two computational methods (Chi2 and BITOLA) were selected for comparison against our approach on AD drug identification. DrugBank and CTD provide manually curated database content about disease-modifying genes/proteins and drugs. Chi2 is a baseline system that uses the common Chi-square statistical method to identify significant co-occurring drug-disease relationships cited in PubMed abstracts. BITOLA (Biomedical Discovery Support System) is a computational system based on natural language processing that can extract drug-protein relations in a disease context. The histogram shows the sensitivity, specificity, PPV (positive predictive value), F-score, and ACC (accuracy) of each group. These performance measurements are defined in the Methods section.
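
For readers without the Methods section at hand, the five measures can be computed from confusion-matrix counts as sketched below. These are the standard textbook definitions, and the F-score is assumed to be the balanced F1; the paper's exact definitions may differ. The counts in the example are invented and are not results from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute sensitivity, specificity, PPV, F-score (F1), and accuracy.

    tp/fp/tn/fn are confusion-matrix counts. No guard against zero
    denominators is included, to keep the sketch short.
    """
    sensitivity = tp / (tp + fn)          # true positive rate (recall)
    specificity = tn / (tn + fp)          # true negative rate
    ppv = tp / (tp + fp)                  # positive predictive value (precision)
    f_score = 2 * ppv * sensitivity / (ppv + sensitivity)  # assumed balanced F1
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "PPV": ppv,
        "F-score": f_score,
        "ACC": accuracy,
    }


# Invented counts for illustration only.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting all five side by side, as the histogram does, guards against a system that looks strong on one measure only, for example high specificity with very low sensitivity.
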
Figure 6. An AD connectivity map linking AD-related proteins to significant drugs.
After ranking the proteins in the AD-related protein interaction network and selecting the drugs enriched in the AD network-related corpus, 66 highly relevant AD proteins and 166 significant AD candidate drugs were identified to construct an AD connectivity map. Hierarchical clustering of drugs and proteins is performed before the results are shown in the final heatmap format, in which the x-dimension represents drugs and the y-dimension represents proteins. The color intensity of each cell is drawn in proportion to the connectivity score, as shown in the heatmap legend. Panels (A) and (B) show zoomed-in views of boxed regions A and B on the original map. Panel (C) shows the chemical structures of three drugs (Diazepam, Clonazepam, and Flunitrazepam) from a cluster of drugs found in Panel (B), with their common structure (Benzodiazepine) shown in a box. CID refers to the entity identifier in PubChem (http://pubchem.ncbi.nlm.nih.gov/).
