Bioinformatics. 2018 Sep 15;34(18):3151-3159.
doi: 10.1093/bioinformatics/bty325.

EMUDRA: Ensemble of Multiple Drug Repositioning Approaches to improve prediction accuracy

Xianxiao Zhou et al. Bioinformatics. 2018.

Abstract

Motivation: The availability of large-scale genomic, epigenetic and proteomic data in complex diseases makes it possible to objectively and comprehensively identify therapeutic targets that can lead to new therapies. The Connectivity Map (CMap) has been widely used to explore novel indications of existing drugs. However, the prediction accuracy of existing methods, such as the Kolmogorov-Smirnov statistic, remains low. Here we present a novel high-performance drug repositioning approach that improves on state-of-the-art methods.
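For readers unfamiliar with the baseline, the following is a minimal Python sketch of the rank-based Kolmogorov-Smirnov connectivity score as it is commonly described for the original Connectivity Map. It is not code from EMUDRA, the function names are hypothetical, and it omits the per-drug normalization applied in CMap.

```python
from typing import Sequence


def ks_statistic(tag_ranks: Sequence[int], n_genes: int) -> float:
    """KS-style enrichment of a tag set in a ranked gene list.

    tag_ranks are 1-based positions of the tag genes in a list of n_genes
    genes ranked from most up- to most down-regulated by a drug instance.
    """
    t = len(tag_ranks)
    v = sorted(tag_ranks)
    # a: largest excess of the tag CDF over a uniform spread (tags near the top)
    a = max(j / t - v[j - 1] / n_genes for j in range(1, t + 1))
    # b: largest deficit of the tag CDF (tags near the bottom)
    b = max(v[j - 1] / n_genes - (j - 1) / t for j in range(1, t + 1))
    return a if a > b else -b


def connectivity_score(up_ranks: Sequence[int], down_ranks: Sequence[int],
                       n_genes: int) -> float:
    """Raw (unnormalized) connectivity of a query signature to one instance."""
    ks_up = ks_statistic(up_ranks, n_genes)
    ks_down = ks_statistic(down_ranks, n_genes)
    # If both tag sets shift in the same direction, the score is set to zero.
    if ks_up * ks_down > 0:
        return 0.0
    # Negative values mean the instance reverses the query signature.
    return ks_up - ks_down


# Hypothetical example: 10 ranked genes, up tags at the top, down tags at the bottom.
print(connectivity_score([1, 2], [9, 10], 10))  # approximately 1.7
```

Because the statistic depends only on ranks, small but consistent expression changes in uninformative genes can shift it as much as large ones, which is the kind of weakness the weighted approach below targets.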

Results: We first designed an expression weighted cosine (EWCos) method to minimize the influence of uninformative expression changes and then developed an ensemble approach, termed ensemble of multiple drug repositioning approaches (EMUDRA), that integrates EWCos and three existing state-of-the-art methods. EMUDRA significantly outperformed individual drug repositioning methods when applied to simulated and independent evaluation datasets. Using EMUDRA, we predicted, and then experimentally validated, the antibiotic rifabutin as an inhibitor of cell growth in triple-negative breast cancer. EMUDRA can identify drugs that more effectively target disease gene signatures and will thus be a useful tool for identifying novel therapies for complex diseases and predicting new indications for existing drugs.
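The expression-weighting idea behind EWCos can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the EMUDRA R package: it assumes each gene's weight is a logistic function of that gene's expression level, with hypothetical slope `k` and midpoint `x0` fixed by hand (the paper instead optimizes the parameters from replicates), and it scores a query against one instance's weighted fold changes by cosine similarity.

```python
import math
from typing import Sequence


def logistic_weight(expression: float, k: float, x0: float) -> float:
    """ASSUMPTION: weight rises from 0 to 1 as expression passes x0."""
    z = k * (expression - x0)
    # Numerically stable logistic: avoid overflow in exp for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def ewcos(query: Sequence[float], fold_changes: Sequence[float],
          expression: Sequence[float], k: float = 2.0, x0: float = 5.0) -> float:
    """Cosine between a query signature and expression-weighted fold changes.

    k and x0 are hypothetical defaults; a score near -1 means the drug
    instance tends to reverse the query signature.
    """
    weighted = [fc * logistic_weight(e, k, x0)
                for fc, e in zip(fold_changes, expression)]
    return cosine(query, weighted)


# Hypothetical example: gene 2 shows a large change but is barely expressed.
query = [1.0, 0.0]
fold_changes = [1.0, 5.0]
expression = [10.0, 0.0]
print(cosine(query, fold_changes))             # about 0.196, dominated by gene 2
print(ewcos(query, fold_changes, expression))  # close to 1, gene 2 down-weighted
```

The plain cosine is pulled away from the query by a large fold change in a barely expressed gene, while the weighted version largely ignores it.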

Availability and implementation: The EMUDRA R package is available at doi: 10.7303/syn11510888.

Supplementary information: Supplementary data are available at Bioinformatics online.


Figures

Fig. 1.
Workflows of EWCos (A) and EMUDRA (B). (A) To adjust for lowly expressed genes, a logistic function was used to weight drug-induced expression changes. First, weight matrices were calculated for the parameters of the function. Next, for each instance, drug-induced signatures identified from replicates were used to optimize the parameters. Finally, the weighted fold changes were used to calculate EWCos scores for a given query signature. (B) Matching scores from EWCos, Cosine, XCor and XSpe were normalized and combined into an ensemble score used to rank drugs. GO enrichment analysis was performed on the signature gene sets reversed by the top-ranked drugs.
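Panel (B) describes normalizing each method's matching scores and combining them. The caption does not state the exact normalization, so this Python sketch assumes z-score normalization across drugs followed by an unweighted mean, and it assumes every method's scores are already oriented so that lower values mean stronger reversal. The method names come from the caption; the drugs and score values are made up.

```python
import statistics
from typing import Dict, List


def zscore(values: List[float]) -> List[float]:
    """Standardize one method's scores across drugs (ASSUMED normalization)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0.0:
        return [0.0] * len(values)
    return [(v - mean) / sd for v in values]


def ensemble_ranking(scores: Dict[str, List[float]], drugs: List[str]) -> List[str]:
    """Rank drugs by the mean of per-method z-scores, most negative first."""
    normalized = [zscore(method_scores) for method_scores in scores.values()]
    # zip(*normalized) groups the normalized scores of each drug across methods.
    combined = [statistics.fmean(per_drug) for per_drug in zip(*normalized)]
    order = sorted(range(len(drugs)), key=lambda i: combined[i])
    return [drugs[i] for i in order]


# Made-up scores for three hypothetical drugs from the four methods.
scores = {
    "EWCos": [-0.9, 0.1, 0.5],
    "Cosine": [-0.5, 0.0, 0.2],
    "XCor": [-0.2, -0.4, 0.3],
    "XSpe": [-0.7, 0.2, 0.1],
}
print(ensemble_ranking(scores, ["drug_a", "drug_b", "drug_c"]))
# ['drug_a', 'drug_b', 'drug_c']
```

Normalizing first keeps a method with a wider score range from dominating the combined score.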
Fig. 2.
Evaluation of EWCos, EMUDRA and the existing methods based on simulation studies. (A) For each instance, a drug-induced gene signature was identified from the treatment and its corresponding controls, and this signature was used to query the CMap data with each method. Instances treated with the same drug as the query signature were considered positive cases, and all other instances were considered negative. Performance was evaluated by ROC curves and pAUC at a false positive rate (FPR) of 0.01. (B) Performance for simulated data with random noise drawn from a uniform distribution.
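The pAUC criterion can be illustrated with a small Python sketch that integrates the ROC curve from FPR 0 up to 0.01 using the trapezoidal rule. This computes the unstandardized partial area (its maximum is 0.01 for a perfect ranking); the caption does not say whether the reported values are raw or standardized. For comparison, scikit-learn's `roc_auc_score(..., max_fpr=...)` returns a McClish-standardized partial AUC rather than the raw area. The function names here are illustrative.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """(FPR, TPR) points, treating higher scores as more likely positive."""
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        # Process tied scores together so the curve does not depend on input order.
        current = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == current:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def partial_auc(scores: Sequence[float], labels: Sequence[int],
                max_fpr: float = 0.01) -> float:
    """Raw area under the ROC curve for FPR in [0, max_fpr]."""
    pts = roc_points(scores, labels)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Linearly interpolate TPR at the FPR cutoff.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2
    return area


print(partial_auc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))  # 0.01, a perfect ranking
```

Restricting the area to low FPR rewards methods that place true positives at the very top of the ranked list, which is what matters when only the top candidates are followed up experimentally.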
Fig. 3.
Performance of EMUDRA, EWCos and the existing drug repositioning approaches based on positive controls determined by ATC codes and the LINCS dataset. (A) ROC curves and pAUC for predicting the 1864 drug pairs sharing at least one ATC code. These drug pairs were taken as positive cases, and the remaining drug pairs were set as negative cases. ROC curves and pAUC were generated for FPR < 0.01. (B) Performance for predicting the drug pairs sharing at least two ATC codes. (C) Performance for predicting positive control drugs from the LINCS data. Twenty-four cell-line-specific drug signatures identified from the LINCS data were used to query the instances in CMap using nine approaches. Instances in CMap with the same drug and cell line as a given LINCS signature were set as positive cases, and all other instances were taken as negative cases.
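The positive-control labelling in panels (A) and (B) amounts to comparing ATC code sets between every pair of drugs. A minimal Python sketch, assuming each drug maps to a set of ATC code strings compared by exact match at whatever ATC level the caller chooses (the caption does not say which level). The function name, drug names and codes are placeholders, not real assignments.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]


def label_drug_pairs(atc: Dict[str, Set[str]],
                     min_shared: int = 1) -> Tuple[List[Pair], List[Pair]]:
    """Split all unordered drug pairs into positives (sharing at least
    min_shared ATC codes) and negatives (all remaining pairs)."""
    positives: List[Pair] = []
    negatives: List[Pair] = []
    for a, b in combinations(sorted(atc), 2):
        shared = len(atc[a] & atc[b])
        if shared >= min_shared:
            positives.append((a, b))
        else:
            negatives.append((a, b))
    return positives, negatives


# Placeholder drugs and codes, not real ATC assignments.
atc = {
    "drug_a": {"CODE1", "CODE2"},
    "drug_b": {"CODE1", "CODE2"},
    "drug_c": {"CODE1"},
    "drug_d": {"CODE3"},
}
print(label_drug_pairs(atc, min_shared=1)[0])  # pairs sharing at least one code
print(label_drug_pairs(atc, min_shared=2)[0])  # pairs sharing at least two codes
```

Raising `min_shared` from 1 to 2 yields a smaller but stricter positive set, mirroring the difference between panels (A) and (B).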
Fig. 4.
Performance comparison of all possible combinations of the non-ensemble methods. (A) AUCs of the 255 possible combinations of the 8 non-ensemble methods on the simulated data with noise. The numbers in the legend indicate the number of methods combined. (B)–(D) The ensemble rate of individual methods in the simulation, ATC and LINCS datasets. All 247 ensemble and 8 non-ensemble methods were rank-ordered by AUC.
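The counts in this figure follow from basic combinatorics: 8 methods have 2^8 - 1 = 255 non-empty subsets, and removing the 8 single-method subsets leaves 247 ensembles. A short Python check, using placeholder method names because the caption does not list all eight:

```python
from itertools import combinations

# Placeholder names; the caption does not enumerate the eight methods.
methods = [f"method_{i}" for i in range(1, 9)]

# Every non-empty subset of the methods, from single methods up to all eight.
all_subsets = [
    combo
    for size in range(1, len(methods) + 1)
    for combo in combinations(methods, size)
]
# Subsets with two or more members are ensembles.
ensembles = [combo for combo in all_subsets if len(combo) >= 2]

print(len(all_subsets))  # 255
print(len(ensembles))    # 247
```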
Fig. 5.
Rifabutin dose-dependently inhibits growth of TNBC cells in 3D culture. (A) MDA-MB-231 cells were grown in 3D Matrigel™ and treated every 24–48 h with DMSO or with 1, 4.8 or 25 μM rifabutin. Representative fields (5×) are shown. (B) At least four fields from each of at least three independent experiments were used for statistical analysis. Error bars represent standard error. Statistical significance of the difference in the proportion of a field containing cells (field cellularity) between DMSO- and rifabutin-treated cells was tested using a one-tailed Student’s t-test. (C) Viability of rifabutin- and taxol-treated MDA-MB-231 cells grown in 3D Matrigel and treated every 48 h with media containing 0.4% DMSO, rifabutin or taxol. Luminescence was assayed using CellTiter-Glo 3D (Promega). Error bars represent the standard error of the mean (SEM) from three independent experiments. One-sided Student’s t-test comparing treatment to DMSO: *P < 0.05; **P < 0.005; ***P < 0.0005.
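Panels (B) and (C) rely on one-tailed Student's t-tests. A minimal Python sketch of the pooled-variance (equal-variance) two-sample t statistic, assuming two independent groups: with treated minus control as the difference, a sufficiently negative t supports the one-sided hypothesis that rifabutin lowers the readout. Converting t to a one-tailed P value requires the t-distribution CDF with n1 + n2 - 2 degrees of freedom (for example `scipy.stats.t.cdf`), which is omitted here to keep the sketch dependency-free. The function name is hypothetical and the numbers are invented, not data from the paper.

```python
import math
import statistics
from typing import Sequence, Tuple


def student_t(treated: Sequence[float], control: Sequence[float]) -> Tuple[float, int]:
    """Pooled-variance two-sample t statistic (treated minus control) and df."""
    n1, n2 = len(treated), len(control)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    var1 = statistics.variance(treated)  # sample variance (n - 1 denominator)
    var2 = statistics.variance(control)
    pooled = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    t = (statistics.fmean(treated) - statistics.fmean(control)) / se
    return t, n1 + n2 - 2


# Invented field-cellularity proportions, for illustration only.
t, df = student_t([0.20, 0.25, 0.22], [0.40, 0.45, 0.38])
print(round(t, 2), df)
```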
