PNAS Nexus. 2023 Apr 26;2(5):pgad147.
doi: 10.1093/pnasnexus/pgad147. eCollection 2023 May.

Evolution-strengthened knowledge graph enables predicting the targetability and druggability of genes

Yuan Quan et al. PNAS Nexus.

Abstract

Identifying promising targets is a critical step in modern drug discovery, and the causative genes of diseases are an important source of successful targets. Previous studies have found that the pathogenesis of many diseases is closely related to evolutionary events in organisms. Accordingly, evolutionary knowledge can facilitate the prediction of causative genes and thereby accelerate target identification. With the development of modern biotechnology, massive amounts of biomedical data have accumulated, and knowledge graphs (KGs) have emerged as a powerful approach for integrating and using these data. In this study, we constructed an evolution-strengthened knowledge graph (ESKG) and validated its application to the identification of causative genes. More importantly, we developed an ESKG-based machine learning model named GraphEvo, which can effectively predict the targetability and druggability of genes. We further investigated the explainability of the ESKG in druggability prediction by dissecting the evolutionary hallmarks of successful targets. Our study highlights the importance of evolutionary knowledge in biomedical research and demonstrates the potential of the ESKG for identifying promising targets. The ESKG data set and the GraphEvo code can be downloaded from https://github.com/Zhankun-Xiong/GraphEvo.

Keywords: druggability; evolution; knowledge graph; prediction model construction; targetability.

Figures

Fig. 1.
Construction and validation of the ESKG. A) The types of entities and relations contained in the ESKG. The ESKG not only contains various types of common biological data (such as genes, diseases, biological processes, and drugs) but also integrates the evolutionary data (Ohnologs and evolutionary stages) of genes. B–E) TransE-learned embedding visualization of the entities and relations in the ESKG based on the t-SNE algorithm. Embeddings of the same type of entities or relations in the ESKG have relatively good colocalization in a 2D space, which validates the effectiveness of the ESKG. F) Performance comparison of the ESKG and initial KG in the prediction of causative genes for complex diseases. The results demonstrate that compared with the DRKG-derived initial KG, the ESKG showed superior power in the task of causative gene prediction for 17 of 19 kinds of diseases.
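The caption names TransE as the embedding method. As a minimal sketch of the general TransE idea (not the authors' code), a triple (head, relation, tail) is scored by how closely head + relation lands on tail. The three-dimensional vectors and the entity and relation names below are made up for illustration, and using the score to rank candidate gene–disease links is an assumption about how such embeddings are typically applied.

```python
import math

def transe_score(head, relation, tail):
    """Negative L2 distance ||head + relation - tail||; higher means more plausible."""
    return -math.sqrt(
        sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail))
    )

# Hypothetical 3-dimensional embeddings, invented for illustration only.
gene = [0.2, 0.1, -0.4]
associated_with = [0.3, -0.1, 0.5]
disease_a = [0.5, 0.0, 0.1]    # sits almost exactly at gene + associated_with
disease_b = [-0.9, 0.8, 0.7]   # sits far from gene + associated_with

score_a = transe_score(gene, associated_with, disease_a)
score_b = transe_score(gene, associated_with, disease_b)

# The triple whose tail is closer to head + relation receives the higher score.
print(score_a > score_b)
```

Under this scoring, a well-trained embedding places entities of the same type near one another, which is consistent with the colocalization the t-SNE panels are described as showing.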
Fig. 2.
Construction of the gene targetability and druggability prediction model (GraphEvo). In this study, we used ESKG-derived embeddings (learned with TransE) as the input features of genes and adopted boosting, an ensemble learning algorithm, to develop the targetability prediction model. In the modeling process, we took the target–disease pairs that were approved for marketing by the FDA before the year 2000 as positive samples and randomly generated a considerable number of gene–disease pairs without clinical trial records as negative samples. For a candidate highly druggable target, we concatenated the ESKG-derived features and TDG-derived features as the final features to predict the potential druggability of drug targets, and the druggability (number of approved drugs) of the target was used as the label of the training sample. We then used two machine learning models, support vector regression and decision tree regression, to construct the identification model for highly druggable targets. To obtain robust predictions, the final predicted score is the average of the scores from these two regression methods.
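The druggability step described in this caption (concatenate two feature blocks, fit two regressors, average their predictions) can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the GraphEvo implementation: the feature dimensions, the random data, the default hyperparameters, and the train/test split are all hypothetical, and the real feature definitions are not given in this excerpt.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Hypothetical stand-ins: 60 targets, 16-dim ESKG embeddings, 4-dim TDG features.
n_targets = 60
eskg_features = rng.normal(size=(n_targets, 16))
tdg_features = rng.normal(size=(n_targets, 4))
# Synthetic labels standing in for the number of approved drugs per target.
approved_drug_counts = rng.integers(0, 30, size=n_targets).astype(float)

# Concatenate both feature blocks into one final feature vector per target.
features = np.concatenate([eskg_features, tdg_features], axis=1)

# Hypothetical split: first 50 targets for training, last 10 held out.
train, held_out = slice(0, 50), slice(50, None)

svr = SVR().fit(features[train], approved_drug_counts[train])
tree = DecisionTreeRegressor(random_state=0).fit(
    features[train], approved_drug_counts[train]
)

# The final score is the plain average of the two regressors' predictions.
svr_pred = svr.predict(features[held_out])
tree_pred = tree.predict(features[held_out])
final_scores = (svr_pred + tree_pred) / 2.0
print(final_scores.shape)
```

Averaging a smooth kernel regressor with a piecewise-constant tree is a simple way to dampen the idiosyncrasies of either model, which matches the caption's stated aim of robust predictions.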
Fig. 3.
Evolutionary hallmarks of highly druggable targets. A, B) Distribution of the number of approved drugs corresponding to successful drug targets. Based on King et al.'s (52) and Quan et al.'s (3) data, we obtained 1,536 approved target–disease pairs, covering 468 successful targets. A total of 114 of 468 (24.36%) targets had at least 10 approved drugs, and these were defined as highly druggable targets in this study. C) Comparison of the ratios of Ohnolog genes between highly druggable targets and nonhighly druggable targets. The Ohnolog ratio of highly druggable targets was significantly higher than that of nonhighly druggable targets (P = 7.40 × 10⁻⁸, χ² test). D) Comparison of the ratios of evolutionary stages between highly druggable targets and other protein-coding genes. The majority of highly druggable targets (49.50%) originated from the Eumetazoa stage (P = 1.86 × 10⁻⁷, hypergeometric distribution test). E) The druggability of different disease category–associated drug targets. The results show that more than 50% of psychiatry and psychology category–associated targets are highly druggable targets.
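The proportion quoted in this caption can be checked directly, and the kind of enrichment test it mentions can be sketched with the standard library. The function below computes a one-sided upper-tail hypergeometric probability; the sidedness and the example counts (100 genes, 20 in the stage of interest, 10 sampled, 6 observed) are assumptions made for illustration and are not taken from the paper.

```python
from math import comb

def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) when drawing `draws` items without replacement."""
    total = comb(population, draws)
    upper = min(successes, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favourable / total

# Share of successful targets with at least 10 approved drugs (from the caption).
highly_druggable_pct = round(114 / 468 * 100, 2)
print(highly_druggable_pct)  # 24.36

# Hypothetical counts, invented to show the test: 6 of 10 sampled genes fall in
# a stage that holds 20 of 100 genes overall.
p_value = hypergeom_upper_tail(k=6, population=100, successes=20, draws=10)
print(p_value)
```

A small p-value here means the sampled set contains more genes from the stage than random sampling would plausibly produce.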

References

    1. Emmerich C-H, et al. 2021. Improving target assessment in biomedical research: the GOT-IT recommendations. Nat Rev Drug Discov. 20:64–81.
    2. Gates A-J, Gysi D-M, Kellis M, Barabási A-L. 2021. A wealth of discovery built on the human genome project—by the numbers. Nature 590:212–215.
    3. Quan Y, et al. 2019. Systems chemical genetics-based drug discovery: prioritizing agents targeting multiple/reliable disease-associated genes as drug candidates. Front Genet. 10:474.
    4. Dahlin J-L, Inglese J, Walters MA. 2015. Mitigating risk in academic preclinical drug discovery. Nat Rev Drug Discov. 14:279–294.
    5. Benton M-L, et al. 2021. The influence of evolutionary history on human health and disease. Nat Rev Genet. 22:269–283.