HGG Adv. 2026 Jan 15;7(1):100556.
doi: 10.1016/j.xhgg.2025.100556. Epub 2025 Dec 9.

Leveraging large-scale biobanks for therapeutic target discovery

Brian R Ferolito et al. HGG Adv.

Abstract

Large biobanks, including the Million Veteran Program (MVP), the UK Biobank, and FinnGen, provide genetic association results for more than 1 million individuals for hundreds of phenotypes. To select targets for pharmaceutical development, as well as to improve the understanding of existing targets, we harmonized these studies and performed two-sample Mendelian randomization (MR) on 2,003 phenotypes using genetic variants associated with gene expression (derived from GTEx and eQTLGen) and plasma protein levels (derived from ARIC, Fenland, and deCODE) as proxies of target modulation. We found 69,669 gene-trait pairs with evidence (p ≤ 1.6 × 10⁻⁹) for causal effects. From the selected gene-trait pairs, we observed 6,447 genes with strong causal evidence for at least one of 2,003 investigated traits. As expected, being identified as a gene-trait pair in our approach was significantly associated with higher odds of being an approved drug target and indication. We were able to rediscover 9% of approved drug targets in ChEMBL 34. Moreover, identified gene-traits were significantly associated with higher odds of being previously described as a gene-trait pair in OMIM, ClinVar, mouse knockout data, and rare variant burden studies. To enhance the translational potential of the resource, we developed a predictive ranking model trained using approved drug targets described in ChEMBL 34 as well as several different biological annotations. This model was able to accurately predict the odds of a particular significant MR result being developed into an approved drug and its clinical indication (precision-recall area under the receiver operating characteristic curve 0.79). We make our results publicly available in CIPHER.
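
To make the two-sample MR step concrete, the following is a minimal sketch of a single-instrument Wald ratio estimate checked against the significance threshold quoted above. The abstract does not state which MR estimator the authors used, so the Wald ratio, the first-order standard error, the function name `wald_ratio`, and the input numbers are all illustrative assumptions and not the authors' pipeline.

```python
import math
from dataclasses import dataclass

# Significance threshold quoted in the abstract.
P_THRESHOLD = 1.6e-9


@dataclass
class WaldResult:
    beta: float      # estimated causal effect of exposure on outcome
    se: float        # first-order (delta method) standard error
    p_value: float   # two-sided p-value from a normal approximation


def wald_ratio(beta_exposure: float, beta_outcome: float, se_outcome: float) -> WaldResult:
    """Hypothetical helper: Wald ratio for one genetic instrument.

    beta_exposure: variant effect on expression or protein level (eQTL/pQTL study)
    beta_outcome:  variant effect on the trait (biobank GWAS)
    se_outcome:    standard error of beta_outcome
    """
    if beta_exposure == 0:
        raise ValueError("instrument has no effect on the exposure")
    beta = beta_outcome / beta_exposure
    # First-order approximation: ignores uncertainty in beta_exposure.
    se = se_outcome / abs(beta_exposure)
    z = beta / se
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return WaldResult(beta=beta, se=se, p_value=p_value)


# Made-up summary statistics for one gene-trait pair.
strong = wald_ratio(beta_exposure=0.5, beta_outcome=0.1, se_outcome=0.01)
weak = wald_ratio(beta_exposure=0.5, beta_outcome=0.02, se_outcome=0.01)

print(strong.beta, strong.p_value <= P_THRESHOLD)
print(weak.beta, weak.p_value <= P_THRESHOLD)
```

In this toy example the first pair clears the threshold and the second does not. A full analysis would typically combine several instruments and add sensitivity checks, which this sketch leaves out.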

Keywords: Mendelian randomization; genomics; target identification.

Conflict of interest statement

Declaration of interests: A.S.B. reports grants outside of this work from AstraZeneca, Bayer, Biogen, BioMarin, and Sanofi. M.G. is a full-time employee of Regeneron Genetics Center; her main contributions occurred while she was an employee of Open Targets. J.P.C. is a full-time employee at Novartis Institutes for Biomedical Research; his main contributions to the project occurred while employed at the VA Boston Healthcare System. M.A.K. is a full-time employee at Variant Bio; his main contributions occurred while he was an employee of Open Targets.

Figures

Figure 1
A flowchart demonstrating the main findings from the pipeline and the resulting counts of both rediscovery and repurposing opportunities
Figure 2
Plots demonstrating the overlap and odds ratios of orthogonal sources. (A) UpSet plot representing the different intersection numbers of gene-traits among the used sources of orthogonal biological information. (B) Forest plot of the association between the different biological features as predictors of being a significant MR gene-trait. Different distance metrics used for capturing a trait match were exact, same EFO term for both MVP and biological database; distant, closest 3% terms in the ranked list between MVP and biological database using a semantic distance metric; parent, same parent term in both MVP and biological database. Error bars represent 95% confidence intervals.
Figure 3
Results from genetic rediscoveries of approved drugs. (A) Stacked bar plot of the distribution of parent terms for all currently approved drug indications (left bar), and distribution of parent terms among gene-trait pairs considered rediscoveries (right bar). (B) Forest plot representing the over- or underrepresentation of specific parent terms among rediscoveries. The error bars represent 95% confidence intervals. (C) Mean number of available genetic phenotypes per parent term indication. (D) Mean number of cases used to calculate the association between genetic variants and the outcome per parent term.
Figure 4
Circle plot representing the flow of disease categories for approved drug indications to the disease categories of potential repurposing opportunities based upon our findings
Figure 5
Precision-recall estimate of our classifier
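
For readers unfamiliar with precision-recall summaries of a ranking model, the sketch below computes average precision, one common single-number summary of a precision-recall curve. The record does not say how the authors computed their reported figure, so this is only an illustration: the function name `average_precision`, the labels, and the scores are hypothetical, and tied scores are not treated specially.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Hypothetical helper: mean of the precision values at each true positive,
    after ranking items from highest to lowest score.

    labels: 1 if the gene-trait pair is an approved target-indication, else 0
    scores: model scores, higher meaning more likely to be approved
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("at least one positive label is required")

    # Rank by descending score (ties keep their input order).
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank  # precision at this recall step
    return precision_sum / total_positives


# Made-up example: four gene-trait pairs, two of them approved.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
toy_ap = average_precision(toy_labels, toy_scores)
print(round(toy_ap, 4))
```

A perfect ranking, with every approved pair scored above every non-approved pair, would give a value of 1.0.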
