Systems Chemical Genetics-Based Drug Discovery: Prioritizing Agents Targeting Multiple/Reliable Disease-Associated Genes as Drug Candidates

Yuan Quan¹, Zhi-Hui Luo², Qing-Yong Yang¹, Jiang Li¹, Qiang Zhu¹, Ye-Mao Liu¹, Bo-Min Lv¹, Ze-Jia Cui¹, Xuan Qin¹, Yan-Hua Xu³, Li-Da Zhu¹, Hong-Yu Zhang¹

Affiliations

¹ Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, China.
² College of Life Sciences and Technology, Huazhong Agricultural University, Wuhan, China.
³ Sci-meds Biopharmaceutical Co., Ltd., Wuhan, China.

PMID: 31191604
PMCID: PMC6549477
DOI: 10.3389/fgene.2019.00474

Yuan Quan et al. Front Genet. 2019 May 29;10:474. doi: 10.3389/fgene.2019.00474. eCollection 2019.

Abstract

Genetic disease genes are considered a promising source of drug targets. Because most diseases are caused by more than one pathogenic factor, it is reasonable to expect that chemical agents targeting multiple disease genes are more likely to have the desired activities. This expectation is supported by a comprehensive analysis of the relationships between agent activity/druggability and the genetic characteristics of targets. The therapeutic potential of agents increases steadily with the number of targeted disease genes and is further enhanced by stronger genetic links between targets and diseases. Using multi-label classification models for genetics-based drug activity prediction, we provide universal tools for prioritizing drug candidates. All documented data and the machine-learning prediction service are available at SCG-Drug (http://zhanglab.hzau.edu.cn/scgdrug).

Keywords: disease associated genes; drug discovery; drug targets; machine learning; systems chemical genetics.

Figures

**Figure 1**
Pipeline for data processing. Disease-associated genes were derived from eight databases. Agent activities were obtained from TTD, DrugBank, and ClinicalTrials. The disease terms of genes and the indication annotations of agents were mapped to UMLS concepts using MetaMap. Using the disease classes provided by Pharmaprojects (similarity threshold: 0.75), 703 disease types were identified for 19,233 genes, yielding 914,190 gene-disease pairs. Searching DGIdb, TTD, and DrugBank showed that 3,346 genes were targeted by 14,558 agents. These 3,346 targets were associated with 703 diseases (359,101 gene-disease pairs), and 5,759 agents were indicated for 667 diseases (74,902 agent-disease pairs).
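
One way to read this pipeline is as a join: an agent becomes genetics-implicated for a disease when one of its targets is associated with that disease. The minimal Python sketch below assumes that reading. It also assumes a definition not stated in this caption: that the clinically active ratio is the fraction of implicated agent-disease pairs matching documented indications. All function names and the toy agent, gene, and disease identifiers are hypothetical.

```python
from collections import defaultdict

def build_gene_index(gene_disease_pairs):
    """Map each gene to the set of diseases it is associated with."""
    index = defaultdict(set)
    for gene, disease in gene_disease_pairs:
        index[gene].add(disease)
    return index

def implied_agent_disease_pairs(agent_target_pairs, gene_index):
    """Hypothetical helper: agent-disease pairs implied through the agent's targets."""
    return {
        (agent, disease)
        for agent, gene in agent_target_pairs
        for disease in gene_index.get(gene, ())
    }

def clinically_active_ratio(implied, documented):
    """Assumed definition: share of implied pairs that are documented indications."""
    return len(implied & documented) / len(implied) if implied else 0.0

# Toy data (hypothetical identifiers, not from the paper)
agent_targets = [("agent1", "G1"), ("agent1", "G2"), ("agent2", "G3")]
gene_disease = [("G1", "D1"), ("G2", "D1"), ("G2", "D2"), ("G3", "D3")]
documented = {("agent1", "D1"), ("agent2", "D4")}

implied = implied_agent_disease_pairs(agent_targets, build_gene_index(gene_disease))
ratio = clinically_active_ratio(implied, documented)  # 1 of 3 implied pairs is documented
```

Representing every relation as a set of pairs keeps the join explicit and makes duplicate pairs collapse automatically.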

**Figure 2**
Validation of gene-disease, agent-disease, and agent-target pairs. **(A)** Correlations between disease similarity and the distance between disease gene sets or drug sets. Disease similarity was measured using UMLS::similarity, and gene set or drug set distance was characterized by the Tanimoto coefficient. **(B)** Clinically active ratios of genetics-implicated agent activities. The red, brown, and green vertical dashed lines indicate the clinically active ratios derived from real agent-target pairs in TTD, DGIdb, and DrugBank, respectively. The curves show the frequency distributions of clinically active ratios over 10,000 random permutations of agent-target pairs.
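
Both panels can be sketched in a few lines of Python. The Tanimoto coefficient of two sets is their intersection size divided by their union size. Using one minus this coefficient as the distance is an assumption about how the coefficient was turned into a distance. The permutation scheme below shuffles which gene each agent-target pair points to. That scheme, and the "documented pairs over implicated pairs" reading of the clinically active ratio, are also assumptions, and every name is hypothetical.

```python
import random

def tanimoto(a, b):
    """Tanimoto coefficient of two sets: |A & B| / |A | B| (1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def tanimoto_distance(a, b):
    """Assumed distance form: 1 - Tanimoto coefficient."""
    return 1.0 - tanimoto(a, b)

def active_ratio(agent_target_pairs, gene_index, documented):
    """Share of genetics-implicated agent-disease pairs that are documented (assumed)."""
    implied = {(ag, d) for ag, g in agent_target_pairs for d in gene_index.get(g, ())}
    return len(implied & documented) / len(implied) if implied else 0.0

def permutation_null(agent_target_pairs, gene_index, documented, n_perm=10_000, seed=0):
    """Shuffle the gene column of the agent-target pairs and recompute the ratio.

    Each agent keeps its number of target slots and each gene keeps its number
    of hits; only the pairing between agents and genes is randomized.
    """
    rng = random.Random(seed)
    agents = [ag for ag, _ in agent_target_pairs]
    genes = [g for _, g in agent_target_pairs]
    null = []
    for _ in range(n_perm):
        rng.shuffle(genes)
        null.append(active_ratio(zip(agents, genes), gene_index, documented))
    return null

def empirical_p(observed, null):
    """One-sided permutation P-value with the usual +1 correction."""
    return (1 + sum(r >= observed for r in null)) / (1 + len(null))
```

The observed ratio from the real agent-target pairs corresponds to a dashed line in panel B. `permutation_null` produces the curve it is compared against.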

**Figure 3**
Dependence of agent activity/druggability on target quantity. The therapeutic potential of agents increases with the number of targeted disease genes.
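
The trend in this figure amounts to stratifying agents by how many disease genes they hit and computing, per stratum, the fraction that are clinically active. The Python sketch below makes two assumptions not taken from the caption: each agent carries a single active/inactive flag, and high target counts are pooled into one top bin (`max_bin`). The toy data are hypothetical.

```python
from collections import defaultdict

def active_ratio_by_target_count(agents, max_bin=5):
    """Fraction of clinically active agents per number of targeted disease genes.

    agents: iterable of (n_disease_genes_targeted, is_clinically_active) tuples.
    Agents hitting max_bin or more disease genes are pooled into the last bin.
    """
    totals = defaultdict(int)
    actives = defaultdict(int)
    for n_targets, is_active in agents:
        b = min(n_targets, max_bin)
        totals[b] += 1
        actives[b] += int(is_active)
    return {b: actives[b] / totals[b] for b in sorted(totals)}

# Hypothetical toy agents: (number of disease genes targeted, clinically active?)
toy_agents = [(1, False), (1, True), (1, False), (1, False),
              (2, True), (2, False), (3, True), (7, True)]
print(active_ratio_by_target_count(toy_agents, max_bin=3))  # {1: 0.25, 2: 0.5, 3: 1.0}
```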

**Figure 4**
Sequence similarity and GO distances of gene pairs targeted by multi-target agents. **(A)** Target pairs hit by the agents have more similar sequences than pairs randomly selected from the target set (P = 2.20 × 10⁻¹⁶, Wilcoxon rank-sum test). **(B)** The GO-based Czekanowski–Dice distances of gene pairs targeted by the agents are markedly smaller than those of randomly selected target pairs (P = 2.20 × 10⁻¹⁶, Wilcoxon rank-sum test).
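
Below is a minimal Python sketch of the two computations named in panel B. The Czekanowski–Dice distance between two genes is taken here in its common set form, |A Δ B| / (|A| + |B|) over their GO annotation sets; this equals |A Δ B| / (|A ∪ B| + |A ∩ B|). Which GO branches and annotation sets were used is not stated here, so treat that part as an assumption. The rank-sum test uses the normal approximation without tie correction. For real data, a library routine such as `scipy.stats.ranksums` or R's `wilcox.test` would normally be used instead.

```python
import math

def czekanowski_dice(go_a, go_b):
    """|A ^ B| / (|A| + |B|): 0 for identical GO term sets, 1 for disjoint ones."""
    total = len(go_a) + len(go_b)
    return len(go_a ^ go_b) / total if total else 0.0

def average_ranks(values):
    """1-based ranks; tied values share the mean of their sorted positions."""
    first, count = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
        count[v] = count.get(v, 0) + 1
    return [first[v] + (count[v] - 1) / 2 for v in values]

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test via the normal approximation (no tie correction)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                      # rank sum of the first sample
    mean = n1 * (n1 + n2 + 1) / 2            # expected rank sum under H0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    return z, math.erfc(abs(z) / math.sqrt(2))
```

In the setting of panel B, `x` would hold distances for agent-targeted gene pairs and `y` distances for randomly drawn pairs. A negative `z` means the first group tends to have smaller distances.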

**Figure 5**
Relationships between druggability and target number of agents derived from cMap. **(A)** As the number of targets increases, the agents cover more gene modules (ANOVA: P = 1.94 × 10⁻⁹). **(B)** As the number of targets increases, the drug approval ratio increases slightly. **(C)** When only disease-associated genes are considered, the drug approval ratio rises markedly with the number of targeted genes.
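
Panel A compares the number of gene modules covered per agent across groups of agents with different target counts, using a one-way ANOVA. The Python sketch below computes only the F statistic and its degrees of freedom. It assumes agents are grouped by target count; how the gene modules themselves were defined is not described in this caption, and the toy numbers are hypothetical.

```python
def one_way_anova_f(groups):
    """One-way ANOVA F statistic for a list of observation groups.

    Returns (F, df_between, df_within). Turning F into a P-value requires the
    F distribution, e.g. scipy.stats.f.sf(F, df_between, df_within).
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group and within-group sums of squares
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical: gene modules covered per agent, one list per target-count group
modules_by_target_count = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(one_way_anova_f(modules_by_target_count))  # (27.0, 2, 6)
```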

**Figure 6**
Effects of top genes on the clinically active/approval ratio of agents. The top genes were derived from AlzGene, SzGene, PDGene, and MSGene. From DGIdb, TTD, and DrugBank, we retrieved 3,692 agents targeting the genes contained in the four databases, of which 726 targeted at least one top gene. For agents covering top genes, genetics-implicated activities are more likely to be supported by clinical trials and to be clinically approved (P-values were calculated using the hypergeometric test).

**Figure 7**
Effects of disease-associated ohnolog genes on the clinically active/approval ratio of agents. A total of 7,294 ohnolog genes were obtained from the work of Makino and McLysaght [31], of which 5,265 were disease-associated. Searching DGIdb, TTD, and DrugBank revealed that 4,058 agents targeted 1,164 of these 5,265 ohnolog genes. For agents covering disease-associated ohnolog genes, genetics-derived activities are more likely to be supported by clinical evidence and to be clinically approved (P-values were calculated using the hypergeometric test).
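
Figures 6 and 7 both report hypergeometric P-values. The Python sketch below computes the upper-tail probability P(X ≥ k) with exact integer arithmetic. One plausible setup is as follows: N is all agents considered (for example, the 3,692 agents in Figure 6), and n is the agents covering top genes (726). K would be the approved agents overall, and k the approved agents among those n. K and k are not given in these captions, so the example uses toy numbers only. The same quantity is available as `scipy.stats.hypergeom.sf(k - 1, N, K, n)`.

```python
from math import comb

def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N items, K of them 'successes', n drawn).

    math.comb returns 0 when asked to choose more items than exist, so
    impossible terms drop out of the sum automatically.
    """
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)

# Toy example (hypothetical numbers): 10 agents, 4 approved, 3 cover a top gene,
# and all 3 of those are approved.
p_value = hypergeom_upper_tail(3, N=10, K=4, n=3)  # 4 / 120
```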

**Figure 8**
Clinically active ratio of genetics-implicated agent indications derived from different disease gene databases.

**Figure 9**
Dependence of agent activity/druggability on target quality. As the druggability scores of target genes increase, the therapeutic potential of the corresponding agents also increases.

**Figure 10**
Comparison of druggability scores between top genes derived from AlzGene, SzGene, PDGene, and MSGene and ordinary genes with the same pathogenic annotations. The top genes have markedly higher scores than other genes (P = 2.51 × 10⁻⁵², Wilcoxon rank-sum test).

**Figure 11**
Efficiency of the druggability scoring system in characterizing gene-disease links. Diseases with similar gene profiles, as measured by Spearman's rank correlation, display similar symptoms, as measured by UMLS::similarity. The number of disease pairs is shown in each box. Color indicates the enrichment of disease-pair counts within each row, with red representing strong enrichment and blue weak enrichment.
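
Spearman's rank correlation is the Pearson correlation computed on ranks. Tied values receive the average of their positions. The Python sketch below assumes that a disease's "gene profile" is a vector of per-gene scores (for example, druggability scores) over a shared gene list, with genes absent from a disease scored 0. That encoding is an assumption, and the gene and disease data are hypothetical.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of their sorted positions."""
    first, count = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
        count[v] = count.get(v, 0) + 1
    return [first[v] + (count[v] - 1) / 2 for v in values]

def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks (handles ties)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one profile is constant
    return cov / math.sqrt(var_x * var_y)

def profile_vector(scores, genes):
    """Hypothetical encoding: per-gene score, 0.0 for genes not linked to the disease."""
    return [scores.get(g, 0.0) for g in genes]

genes = ["G1", "G2", "G3", "G4"]
disease_a = profile_vector({"G1": 0.9, "G2": 0.4, "G3": 0.1}, genes)
disease_b = profile_vector({"G1": 0.8, "G2": 0.5, "G4": 0.05}, genes)
print(spearman(disease_a, disease_b))  # 0.8
```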

**Figure 12**
Agent activity prediction with machine-learning models. **(A)** Workflow for building the machine-learning models. **(B)** Schematic of the rationale for agent activity prediction. **(C)** Overall performance of the ensemble classifier.
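
The caption does not spell out which classifiers make up the ensemble. The sketch below is a stand-in that illustrates the multi-label idea, where one agent can receive several disease labels at once. It is a k-nearest-neighbour vote over targeted-gene sets. This is a swapped-in technique, not the authors' model. The Jaccard similarity, `k`, `threshold`, and all names and data are hypothetical choices.

```python
def jaccard(a, b):
    """Jaccard similarity of two target sets (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def predict_indications(query_targets, training, k=3, threshold=0.5):
    """Multi-label k-nearest-neighbour vote over target sets.

    training: list of (target_set, indication_set) pairs for reference agents.
    Returns every disease label carried by at least `threshold` of the k
    training agents whose target sets are most similar to the query.
    """
    # sorted() is stable, so ties keep their original training order
    ranked = sorted(training, key=lambda rec: jaccard(query_targets, rec[0]), reverse=True)
    neighbours = ranked[:k]
    if not neighbours:
        return set()
    votes = {}
    for _, labels in neighbours:
        for disease in labels:
            votes[disease] = votes.get(disease, 0) + 1
    return {d for d, v in votes.items() if v / len(neighbours) >= threshold}

# Hypothetical reference agents: (targeted genes, documented indications)
training = [
    ({"G1", "G2"}, {"D1"}),
    ({"G1", "G3"}, {"D1", "D2"}),
    ({"G2", "G3"}, {"D2"}),
    ({"G8", "G9"}, {"D9"}),
]
print(sorted(predict_indications({"G1", "G2", "G3"}, training)))  # ['D1', 'D2']
```

Because each disease label is voted on independently, the prediction is naturally a set of labels rather than a single class.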

**Figure 13**
Cytotoxicity of 14 predicted anti-leukemia agents. K562 cells were treated with **(A)** Amuvatinib, **(B)** Aspirin, **(C)** Brivanib, **(D)** Crenolanib, **(E)** Gossypol acetic acid, **(F)** Masitinib, **(G)** Motesanib, **(H)** Niraparib, **(I)** RGB-286638, **(J)** Saracatinib, **(K)** Tandutinib, **(L)** Trametinib, **(M)** Veliparib, **(N)** Vemurafenib. Ten of the 14 agents (71.43%; Amuvatinib, Brivanib, Crenolanib, Masitinib, Motesanib, Niraparib, Saracatinib, Tandutinib, Veliparib, and Vemurafenib) efficiently inhibited the growth of K562 cells.

References

1. Allen N. C., Bagade S., McQueen M. B., Ioannidis J. P., Kavvoura F. K., Khoury M. J., et al. (2008). Systematic meta-analyses and field synopsis of genetic association studies in schizophrenia: the SzGene database. Nat. Genet. 40, 827–834. doi: 10.1038/ng.171
2. Aronson A. R. (2001). Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program. Proc. AMIA Symp. 2001, 17–21.
3. Becker K. G., Barnes K. C., Bright T. J., Wang S. A. (2004). The genetic association database. Nat. Genet. 36, 431–432. doi: 10.1038/ng0504-431
4. Bertram L., McQueen M. B., Mullin K., Blacker D., Tanzi R. E. (2007). Systematic meta-analyses of Alzheimer disease genetic association studies: the AlzGene database. Nat. Genet. 39, 17–23. doi: 10.1038/ng1934
5. Brinkman R. R., Dubé M. P., Rouleau G. A., Orr A. C., Samuels M. E. (2006). Human monogenic disorders-a source of novel drug targets. Nat. Rev. Genet. 7, 249–260. doi: 10.1038/nrg1828
