BMC Bioinformatics. 2019 Dec 11;20(1):653.
doi: 10.1186/s12859-019-3297-0.

Time-resolved evaluation of compound repositioning predictions on a text-mined knowledge network

Michael Mayers et al. BMC Bioinformatics.

Abstract

Background: Computational compound repositioning has the potential to identify new uses for existing drugs, and new algorithms and data-source aggregation strategies yield ever-improving results as measured by in silico metrics. However, even with these advances, the number of compounds successfully repositioned via computational screening remains low. New strategies for algorithm evaluation that more accurately reflect the repositioning potential of a compound could provide a better target for future optimizations.

Results: Using a text-mined database, we applied a previously described network-based computational repositioning algorithm, which yielded strong results in cross-validation, averaging 0.95 AUROC on test-set indications. However, to better approximate a real-world scenario, we built a time-resolved evaluation framework. At various time points, we built networks corresponding to prior knowledge for use as a training set, and then predicted on a test set composed of indications that were described subsequently. This framework showed a marked reduction in performance, with metrics peaking for the 1985 network at an AUROC of 0.797. Examining the performance reductions caused by removing specific types of relationships highlighted the importance of drug-drug and disease-disease similarity metrics. Using data from future time points, we demonstrate that further acquisition of these kinds of data may help improve computational results.

Conclusions: Evaluating a repositioning algorithm on indications unknown to the input network better tunes its ability to find emerging drug indications, rather than indications that have been randomly withheld. Focusing efforts on improving algorithmic performance in a time-resolved paradigm may further improve computational repositioning predictions.
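The time-resolved evaluation described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the `Indication` record, the `time_resolved_split` and `auroc` helpers, the toy data, and the choice to treat approvals in the network year itself as prior knowledge are all hypothetical. AUROC is computed with the rank-based (Mann-Whitney) formulation, in which ties count as half.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Indication:
    drug: str
    disease: str
    approval_year: int  # year the drug was first approved for any indication


def time_resolved_split(indications, network_year):
    """Split indications around the year a network was built.

    Assumption: approvals in the network year itself count as prior
    knowledge (training); only strictly later approvals form the test set.
    """
    train = [i for i in indications if i.approval_year <= network_year]
    test = [i for i in indications if i.approval_year > network_year]
    return train, test


def auroc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative (ties = 0.5)."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


# Toy usage with hypothetical indications and scores.
indications = [
    Indication("drugA", "disease1", 1975),
    Indication("drugB", "disease2", 1990),
    Indication("drugC", "disease3", 1985),
]
train, test = time_resolved_split(indications, network_year=1985)
print([i.drug for i in train], [i.drug for i in test])  # ['drugA', 'drugC'] ['drugB']
print(auroc([0.9, 0.8], [0.7, 0.8]))  # 0.875
```

The quadratic AUROC loop keeps the definition visible; for real prediction sets a library implementation would be the practical choice.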

Keywords: Compound repositioning; Drug central; Heterogeneous network; Machine learning; Semantic Medline database; Semantic network; Unified medical language system.

Conflict of interest statement

The authors declare they have no competing interests.

Figures

Fig. 1
The metagraph SemMedDB hetnet data model. This graph details the 6 node types and 30 edge types present in this network
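Models built on a hetnet like this typically use features defined over metapaths, i.e. sequences of edge types through the metagraph (compare the metapath abbreviations in Fig. 2e). As a hedged illustration only, the sketch below counts walks between two nodes that follow a given metapath in a small directed toy network. The paper's actual metapath features may be weighted differently, and all node names, edge-type names and helpers here are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Index directed (source, edge_type, target) triples by (source, edge_type)."""
    adj = defaultdict(list)
    for src, etype, dst in edges:
        adj[(src, etype)].append(dst)
    return adj


def count_metapath_walks(adj, source, target, metapath):
    """Count walks from source to target whose edge types follow `metapath` in order.

    Walks may revisit nodes; a strict path count would exclude repeats.
    """
    frontier = {source: 1}
    for etype in metapath:
        nxt = defaultdict(int)
        for node, n_walks in frontier.items():
            for dst in adj.get((node, etype), []):
                nxt[dst] += n_walks
        frontier = nxt
    return frontier.get(target, 0)


# Toy network: one drug binds two genes, each associated with the same disease.
edges = [
    ("drugA", "binds", "gene1"),
    ("drugA", "binds", "gene2"),
    ("gene1", "associates", "disease1"),
    ("gene2", "associates", "disease1"),
]
adj = build_adjacency(edges)
print(count_metapath_walks(adj, "drugA", "disease1", ["binds", "associates"]))  # 2
```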
Fig. 2
5-fold cross-validation results for the SemMedDB network using the DrugCentral gold standard. a) Receiver Operating Characteristic (ROC) curve showing the mean result across 5 folds. Ten different seed values for randomly splitting indications into 5 folds are compared, showing very little variation. b) Precision-Recall curve for the mean result across 5 folds, with the ten different split seeds displayed. c) Histogram of the log2-transformed rank of the true-positive disease for a given test-set positive drug, taken from a representative fold and seed of the cross-validation. If a drug treats multiple diseases, the ranks of all diseases treated in the test-set indications are shown. d) Histogram of the log2-transformed rank of the true-positive drug for a given test-set disease, chosen from the same fold and seed as c. If a disease is treated by multiple drugs in the test-set indications, all ranks are included. e) (left) Boxplot of the 10 largest model coefficients among selected features across all folds and seeds. (right) Breakdown of metapath abbreviations. Node abbreviations appear in capital letters, while edge abbreviations appear in lower case
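The cross-validation setup in Fig. 2 (indications randomly split into 5 folds under several seeds, with the rank of each true positive log2-transformed) can be sketched as follows. The helper names, the round-robin fold assignment and the pessimistic tie handling are assumptions for illustration and are not taken from the paper.

```python
import math
import random


def kfold_indications(indications, k=5, seed=0):
    """Shuffle indications with a seeded RNG, then deal them round-robin into k folds."""
    items = list(indications)
    random.Random(seed).shuffle(items)
    return [items[i::k] for i in range(k)]


def log2_rank(scores, true_item):
    """log2 of the 1-based rank of `true_item` among candidates sorted by descending score.

    Ties are broken pessimistically (equal scores are ranked ahead), an assumption.
    """
    target = scores[true_item]
    rank = 1 + sum(1 for item, s in scores.items() if item != true_item and s >= target)
    return math.log2(rank)


# Hypothetical predicted probabilities of one test-set drug treating each disease.
scores = {"disease1": 0.9, "disease2": 0.5, "disease3": 0.7, "disease4": 0.1}
print(log2_rank(scores, "disease3"))  # 1.0 (rank 2)

folds = kfold_indications(range(10), k=5, seed=1)
print([len(f) for f in folds])  # [2, 2, 2, 2, 2]
```

Re-running `kfold_indications` with each of several seeds reproduces the kind of seed comparison shown in panels a and b; a rank of 1 maps to 0 on the log2 scale.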
Fig. 3
Time-resolved network build results. a) Number of nodes of a given type by network year. b) Average node degree for each node type across all network years
Fig. 4
Machine learning results for the time-resolved networks. a) Performance metrics for the test-set (future) indications across the different network years. Only drugs approved after the year of the network are included in the test set, while those approved earlier are used for training. b) Box plots of the model coefficient values across all network years. The top 10 coefficients with the largest mean value across all models are shown. c) Probabilities of treatment for selected indications for each network model containing both the Drug and Disease concepts. Arrows indicate the year the drug was first approved for any indication. For points left of the arrow, the indication was part of the validation set; for points to the right, it was part of the training set. d) AUROC and AUPRC for indications based on their probabilities, split by the number of years between the drug approval date and the network year. Values left of the zero point are indications approved before the network year, and thus part of the training set, while those to the right are part of the test set. Probabilities for all drug-disease pairs were standardized before being combined across models. Points show individual values, while lines represent a 5-year rolling average of the metrics
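Panel d standardizes probabilities before combining models and smooths metrics with a 5-year rolling average. A minimal sketch, assuming z-score standardization within each model and a trailing window (the caption specifies neither), might look like this; the function names and toy values are hypothetical.

```python
import statistics


def standardize(probs):
    """Z-score one model's probabilities so scores are comparable across network years."""
    mean = statistics.fmean(probs)
    sd = statistics.pstdev(probs)
    if sd == 0:
        return [0.0] * len(probs)
    return [(p - mean) / sd for p in probs]


def rolling_mean(values, window=5):
    """Trailing rolling mean; positions with fewer than `window` values yield None."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(statistics.fmean(values[i + 1 - window : i + 1]))
    return out


# Pool standardized scores from two hypothetical models before computing metrics.
model_1985 = standardize([0.2, 0.4, 0.6])
model_1990 = standardize([0.05, 0.10, 0.15])
pooled = model_1985 + model_1990

print(rolling_mean([1, 2, 3, 4, 5, 6]))  # [None, None, None, None, 3.0, 4.0]
```

Standardizing first matters because two models can rank pairs identically while producing probabilities on very different scales, as the toy models above do.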
Fig. 5
Analysis of edge-type importance to the overall model. a) Edge dropout analysis showing the reduction in AUROC when edges are dropped out at rates of 25, 50, 75, and 100%. Error bars indicate the 95% confidence interval over 5 replicates with different dropout seeds. The 9 edge types with the greatest reduction from 0 to 100% dropout are displayed. b) Edge replacement analysis showing changes in AUROC when edges are replaced with those of the same type from another year's network. The 9 edge types that showed the greatest loss in performance between 0 and 100% dropout in the dropout analysis are displayed
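The dropout and replacement experiments in Fig. 5 both perturb a single edge type while leaving the rest of the network intact. The sketch below assumes edges are stored as (source, edge type, target) triples; `drop_edges`, `replace_edges` and the toy data are hypothetical, and the step of rebuilding features and retraining is omitted.

```python
import random


def drop_edges(edges, edge_type, rate, seed=0):
    """Randomly remove a fraction `rate` of edges of one type, leaving other types intact.

    `edges` is a list of (source, edge_type, target) triples (an assumed representation).
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    of_type = [e for e in edges if e[1] == edge_type]
    others = [e for e in edges if e[1] != edge_type]
    n_keep = round(len(of_type) * (1.0 - rate))
    return others + random.Random(seed).sample(of_type, n_keep)


def replace_edges(edges, other_network_edges, edge_type):
    """Swap all edges of one type for the same-type edges of another year's network."""
    others = [e for e in edges if e[1] != edge_type]
    return others + [e for e in other_network_edges if e[1] == edge_type]


# Hypothetical toy network: 4 drug-drug similarity edges ("sim") and 2 "treats" edges.
edges = [(f"drug{i}", "sim", f"drug{i + 1}") for i in range(4)]
edges += [("drug0", "treats", "disease0"), ("drug1", "treats", "disease1")]

for rate in (0.25, 0.5, 0.75, 1.0):
    for seed in range(5):  # 5 dropout replicates per rate, as in panel a
        reduced = drop_edges(edges, "sim", rate, seed=seed)
        # ...rebuild features and retrain on `reduced`, then record AUROC (not shown)
```

Seeding each replicate separately is what allows a confidence interval over replicates, as in the error bars of panel a.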
