BMC Bioinformatics. 2022 Jun 16;23(1):232.
doi: 10.1186/s12859-022-04753-4.

Empowering the discovery of novel target-disease associations via machine learning approaches in the open targets platform

Yingnan Han et al. BMC Bioinformatics. 2022.

Abstract

Background: The Open Targets (OT) Platform integrates a wide range of data sources on target-disease associations to facilitate the identification of potential therapeutic drug targets for human diseases. However, because targets are usually functionally pleiotropic and can be efficacious for multiple indications, identifying novel target-indication associations remains challenging. In particular, there is a persistent need for computational methods that integrate new target-disease association evidence with biological knowledge bases. Such methods promise to increase the power to identify the most promising target-disease pairs for therapeutic development. Here we introduce an approach that integrates additional target-disease features with machine learning models to uncover further druggable target-disease indications.

Results: We derived novel target-disease association features to supplement the OT Platform-based associations, using three data sources: (1) target tissue specificity from GTEx expression profiles; (2) target semantic similarity based on Gene Ontology; and (3) functional interactions among targets, captured by embedding targets from protein-protein interaction (PPI) networks. Machine learning models were applied to evaluate feature importance and to benchmark performance in predicting targets with known drug indications. The newly integrated features show higher importance than the existing OT features. They also outperform the association benchmarks and may support the discovery of novel therapeutic indications for highly pursued targets.
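As a rough illustration of how a target-target similarity can be turned into a target-disease feature, the sketch below scores each target-disease pair by the similarity-weighted mean of the evidence held by the other targets, in the spirit of neighbourhood-based collaborative filtering on a user-item matrix. This is an assumption made for illustration, not the authors' published procedure: the function name `propagate_evidence`, the weighting scheme, and the toy matrices are all hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def propagate_evidence(similarity: Matrix, evidence: Matrix) -> Matrix:
    """Hypothetical helper: similarity-weighted mean of neighbours' evidence.

    similarity[i][j] is a target-target similarity (for example from Gene
    Ontology or PPI embeddings); evidence[j][d] is the known association of
    target j with disease d. A target's own evidence is excluded so that the
    feature reflects only what its neighbours contribute.
    """
    n_targets = len(similarity)
    n_diseases = len(evidence[0])
    scores: Matrix = []
    for i in range(n_targets):
        # Zero out the self-similarity so a target cannot vote for itself.
        weights = [similarity[i][j] if j != i else 0.0 for j in range(n_targets)]
        total = sum(weights)
        row = []
        for d in range(n_diseases):
            if total == 0.0:
                row.append(0.0)  # no similar neighbours, no propagated signal
            else:
                weighted = sum(weights[j] * evidence[j][d] for j in range(n_targets))
                row.append(weighted / total)
        scores.append(row)
    return scores


# Toy data (made up): 3 targets, 2 diseases.
similarity = [
    [1.0, 0.8, 0.2],
    [0.8, 1.0, 0.0],
    [0.2, 0.0, 1.0],
]
evidence = [
    [1.0, 0.0],  # target 0 has a known drug for disease 0
    [0.0, 0.0],  # target 1 has no known indication
    [0.0, 1.0],  # target 2 has a known drug for disease 1
]
scores = propagate_evidence(similarity, evidence)
```

In this toy case, target 1 has no known indication of its own but inherits a high score for disease 0 because it is most similar to target 0. Scores of this kind could then be supplied to a classifier as additional features alongside the platform's own association scores.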

Conclusion: The newly generated features represent additional biological relatedness among targets and diseases, and they improve the prediction of novel indications for drug targets by machine learning models. The proposed methodology enables systematic evaluation of drug targets for novel indications.

Keywords: Data Integration; Drug discovery; Drug repurposing; Feature engineering; Machine learning; Open targets; Target indication expansion; XGBoost.

Conflict of interest statement

YH, KK, DKR, CZ, and ET are employees of Sanofi and may hold shares and/or stock options in the company.

Figures

Fig. 1
Overview of Open Targets data and generation of newly computed features. Open Targets association evidence network edge weights are annotated for evidence from multiple sources (a). Novel target-disease association features generated from target-target similarity and target-disease matrices compared with factors used in calculation of a user-item matrix (b). Target-disease arrays are generated for each information source and association evidence for known drug status (c)
Fig. 2
Workflow schematic for feature generation and therapeutic status prediction evaluation
Fig. 3
Known drug prediction performance on the testing set for XGBoost, Random Forest, and Logistic Regression: precision-recall curve (a), receiver operating characteristic curve (b), F1 score (c), sensitivity (d), precision (e), and specificity (f)
Fig. 4
Newly computed features improve prediction accuracy. Prediction scores correspond to the clinical trial stage of target-disease pairs in the testing set (a). Feature importance scores indicate that the feature types we generated strongly predict known drug therapeutic status (b). Target-disease arrays computed using target-target similarity reveal druggable target-disease pairs (c). Predicted indications overlap significantly with literature findings obtained by text mining (d)
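The threshold-based metrics named in the Fig. 3 caption (F1 score, sensitivity, precision, and specificity) all follow from the confusion matrix of predicted versus known drug status. The sketch below is a minimal, library-free illustration under assumed inputs: the function name `classification_metrics`, the 0.5 threshold, and the toy labels and scores are hypothetical and are not taken from the paper.

```python
from typing import Dict, Sequence


def classification_metrics(
    y_true: Sequence[int],
    y_score: Sequence[float],
    threshold: float = 0.5,  # assumed cut-off, chosen only for illustration
) -> Dict[str, float]:
    """Compute precision, sensitivity, specificity and F1 at one threshold."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, y_score):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1

    def ratio(numerator: int, denominator: int) -> float:
        # Return 0.0 instead of raising when a class is absent.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)  # also called recall
    specificity = ratio(tn, tn + fp)
    f1_denominator = precision + sensitivity
    f1 = 2 * precision * sensitivity / f1_denominator if f1_denominator else 0.0
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Toy data (made up): 1 = pair has a known drug, 0 = it does not.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_score = [0.9, 0.8, 0.4, 0.7, 0.2, 0.1, 0.3, 0.6]
metrics = classification_metrics(y_true, y_score)
```

With these toy inputs there are three true positives, one false positive, three true negatives, and one false negative, so all four metrics equal 0.75. The precision-recall and ROC curves in panels (a) and (b) would be obtained by sweeping the threshold rather than fixing it.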


