PLoS Genet. 2025 Jul 15;21(7):e1011776.

doi: 10.1371/journal.pgen.1011776. eCollection 2025 Jul.

Bayesian network imputation methods applied to multi-omics data identify putative causal relationships in a type 2 diabetes dataset containing incomplete data: An IMI DIRECT Study

Richard Howey¹,², Jonathan Adam³, Jerzy Adamski⁴,⁵,⁶, Natalie N Atabaki⁷,⁸,⁹, Søren Brunak¹⁰,¹¹, Piotr Jaroslaw Chmura¹⁰, Federico De Masi¹², Emmanouil T Dermitzakis¹³, Juan J Fernandez-Tajes¹⁴, Ian M Forgie¹⁵, Paul W Franks⁹,¹⁶, Giuseppe N Giordano⁹, Mark Haid¹⁷, Torben Hansen⁷, Tue H Hansen¹⁸,¹⁹, Peter P Harms²⁰, Andrew T Hattersley²¹,²², Mun-Gwan Hong²³, Ulrik Plesner Jacobsen¹⁰, Angus G Jones²¹,²², Robert W Koivula⁸, Tarja Kokkola²⁴, Anubha Mahajan¹⁴, Andrea Mari²⁵, Mark I McCarthy¹⁴, Timothy J McDonald²¹,²⁶, Petra B Musholt²⁷, Imre Pavo²⁸, Ewan R Pearson¹⁵, Oluf Pedersen⁷,²⁹, Hartmut Ruetten³⁰, Femke Rutters³¹, Jochen M Schwenk²³, Sapna Sharma³, Leen M 't Hart³¹,³², Henrik Vestergaard⁷,³³, Mark Walker³⁴; IMI DIRECT Consortium; Ana Viñuela³⁵,³⁶, Heather J Cordell²

Affiliations

¹ Research Software Engineering, Newcastle University, Newcastle upon Tyne, United Kingdom.
² Population Health Sciences Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, United Kingdom.
³ Research Unit of Molecular Epidemiology, Institute of Epidemiology, German Research Center for Environmental Health, Helmholtz Zentrum München, München, Germany.
⁴ Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore.
⁵ Institute of Experimental Genetics, German Research Center for Environmental Health, Helmholtz Zentrum München, München, Germany.
⁶ Institute of Biochemistry, Faculty of Medicine, University of Ljubljana, Ljubljana, Slovenia.
⁷ Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.
⁸ Oxford Centre for Diabetes, Endocrinology and Metabolism, University of Oxford, Oxford, United Kingdom.
⁹ Department of Clinical Science, Genetic and Molecular Epidemiology, Lund University Diabetes Centre, Lund, Sweden.
¹⁰ Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.
¹¹ Department of Public Health, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.
¹² Department of Health Technology, Technical University of Denmark, Lyngby, Denmark.
¹³ Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva, Switzerland.
¹⁴ Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom.
¹⁵ Diabetes Endocrinology and Reproductive Biology, Ninewells Hospital and Medical School, University of Dundee, Dundee, United Kingdom.
¹⁶ Precision Healthcare University Research Institute, Queen Mary University of London, London, United Kingdom.
¹⁷ Metabolomics and Proteomics Core, German Research Center for Environmental Health, Helmholtz Zentrum München, München, Germany.
¹⁸ Department of Clinical Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.
¹⁹ Medical Department, Zealand University Hospital, Køge, Denmark.
²⁰ Department of General Practice Medicine, Amsterdam UMC, Amsterdam, The Netherlands.
²¹ Department of Clinical and Biomedical Sciences, University of Exeter College of Medicine & Health, Exeter, United Kingdom.
²² Macleod Diabetes and Endocrine Centre, Royal Devon University Healthcare NHS Foundation Trust, Exeter, United Kingdom.
²³ SciLifeLab, Department of Protein Science, KTH - Royal Institute of Technology, Stockholm, Sweden.
²⁴ Internal Medicine, Institute of Clinical Medicine, University of Eastern Finland, Kuopio, Finland.
²⁵ Institute of Neuroscience, National Research Council, Rome, Italy.
²⁶ Academic Department of Clinical Biochemistry, Royal Devon University Healthcare NHS Foundation Trust, Exeter, United Kingdom.
²⁷ Global Development, Sanofi-Aventis Deutschland GmbH, Frankfurt am Main, Germany.
²⁸ Eli Lilly Regional Operations GmbH, Wien, Austria.
²⁹ Center for Clinical Metabolic Research, Herlev and Gentofte University Hospital, Copenhagen, Denmark.
³⁰ Sanofi Partnering, Sanofi-Aventis Deutschland GmbH, Frankfurt am Main, Germany.
³¹ Department of Epidemiology and Data Science, Amsterdam UMC, Amsterdam, The Netherlands.
³² Department of Cell and Chemical Biology, Leiden UMC, Leiden, The Netherlands.
³³ Steno Diabetes Center Copenhagen, Copenhagen, Denmark.
³⁴ Translational and Clinical Research Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, United Kingdom.
³⁵ Biosciences Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, United Kingdom.
³⁶ Population Health and Genomics, Ninewells Hospital and Medical School, University of Dundee, Dundee, United Kingdom.

PMID: 40663565
PMCID: PMC12279144
DOI: 10.1371/journal.pgen.1011776

Richard Howey et al. PLoS Genet. 2025.

Abstract

Here we report the results of an exploratory analysis, using a Bayesian network approach, of data originally derived from a large North European study of type 2 diabetes (T2D) conducted by the IMI DIRECT consortium. A total of 3029 individuals (795 with T2D and 2234 without) from 7 different study centres provided data comprising genotypes, proteins, metabolites, gene expression measurements and many different clinical variables. The main aim of the current study was to demonstrate the utility of our previously developed method for fitting Bayesian networks by performing exploratory analysis of this dataset to identify possible causal relationships between these variables. The data were analysed using the BayesNetty software package, which can handle mixed discrete/continuous data with missing values. The original dataset consisted of over 16,000 variables, which were filtered down to 260 variables for analysis. Even with this reduction, no individual had complete data for all variables, making the dataset impossible to analyse using standard Bayesian network methodology. However, using the recently proposed imputation method implemented in BayesNetty, we computed a large average Bayesian network from which we could infer possible associations and causal relationships between variables of interest. Our results confirmed many previous findings in connection with T2D, including possible mediating proteins and genes, some of which have not been widely reported. We also confirmed potential causal relationships with liver fat that were identified in an earlier study that used the IMI DIRECT dataset but was limited to a smaller subset of individuals and variables (namely individuals with complete data at pre-defined variables of interest). In addition to providing valuable confirmation, our analyses thus provide a proof-of-principle demonstration of the utility of the method implemented within BayesNetty. The full final average Bayesian network generated from our analysis is freely available and can be easily interrogated further to address specific focussed scientific questions of interest.

Copyright: © 2025 Howey et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Conflict of interest statement

I have read the journal’s policy and the authors of this manuscript have the following competing interests: HJC and AV serve on the editorial board of PLOS Genetics. MMcC and AM are currently employees of Genentech and holders of Roche stock.

Figures

**Fig 1. Average BN constructed using imputed data of all variables with strength threshold 0.5.**
Edges are labelled with the probability that they exist (strength), and, in brackets, the probability that they exist in the shown direction, given that they exist (direction). The thickness of the edges is proportional to the edge strength. The nodes are coloured as follows: red are metabolites; blue (with gene name) are proteins; purple (with gene name) are gene expression measurements; amber are clinical variables; green (prefixed with AS) are allele scores.
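The two edge labels described in this caption can be illustrated with a minimal sketch. It assumes the average network is summarised from a collection of fitted directed networks (for example, bootstrap replicates); that assumption, the function name `edge_strength_and_direction`, the edge-list representation and the toy node names are all hypothetical and are not taken from BayesNetty.

```python
from collections import Counter


def edge_strength_and_direction(networks):
    """Summarise a list of directed networks into per-pair edge labels.

    networks: list of networks, each an iterable of (parent, child) tuples.
    Returns {(a, b): (strength, direction)} with a < b alphabetically, where
    strength is the fraction of networks containing an edge between a and b
    in either orientation, and direction is the fraction of those networks
    in which the edge points a -> b.
    """
    present = Counter()  # networks containing the edge, either orientation
    forward = Counter()  # networks containing the edge oriented a -> b
    for edges in networks:
        for parent, child in set(edges):
            pair = tuple(sorted((parent, child)))
            present[pair] += 1
            if (parent, child) == pair:
                forward[pair] += 1
    n = len(networks)
    return {
        pair: (present[pair] / n, forward[pair] / present[pair])
        for pair in present
    }


# Toy input (made-up networks, not results from the study).
toy = [
    [("BMI", "T2D"), ("AS_BMI", "BMI")],
    [("BMI", "T2D"), ("AS_BMI", "BMI")],
    [("T2D", "BMI"), ("AS_BMI", "BMI")],
    [("AS_BMI", "BMI")],
]
summary = edge_strength_and_direction(toy)
```

In this toy input the BMI/T2D edge appears in three of four networks (strength 0.75) and points from BMI to T2D in two of those three (direction 2/3), which is how an edge label such as "0.75 (0.67)" would be read.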

**Fig 2. Markov Blanket of type 2 diabetes diagnosis taken from the average BN constructed using imputed data of all variables with strength threshold 0.5.**
Edges are labelled with the probability that they exist (strength), and, in brackets, the probability that they exist in the shown direction, given that they exist (direction). The thickness of the edges is proportional to the edge strength. The nodes are coloured as follows: red are metabolites; blue (with gene name) are proteins; purple (with gene name) are gene expression measurements; amber are clinical variables; green (prefixed with AS) are allele scores.
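For readers unfamiliar with the term, a Markov blanket of a node in a directed network is conventionally its parents, its children, and the other parents of its children. The sketch below extracts one after applying a strength threshold. It assumes each retained edge has a single orientation and that an edge is kept when its strength is at least the threshold; the function `markov_blanket`, the dictionary layout and the toy values are hypothetical and not taken from the paper.

```python
def markov_blanket(edges, target, threshold=0.5):
    """Return the Markov blanket of `target` among edges passing `threshold`.

    edges: dict mapping (parent, child) -> edge strength in [0, 1].
    """
    # Keep only edges whose strength meets the threshold (assumed rule: >=).
    kept = [pair for pair, strength in edges.items() if strength >= threshold]
    parents = {p for p, c in kept if c == target}
    children = {c for p, c in kept if p == target}
    # Other parents of the target's children.
    co_parents = {p for p, c in kept if c in children and p != target}
    return parents | children | co_parents


# Toy network (made-up strengths, not results from the study).
toy_edges = {
    ("A", "T"): 0.95,
    ("T", "C"): 0.90,
    ("B", "C"): 0.60,
    ("C", "D"): 0.99,
    ("E", "A"): 0.80,
}
blanket_loose = markov_blanket(toy_edges, "T", threshold=0.5)
blanket_strict = markov_blanket(toy_edges, "T", threshold=0.85)
```

Raising the threshold from 0.5 to 0.85 drops the weaker edge from B to C, so B leaves the blanket of T. This mirrors the faded versus non-faded nodes in the figures that compare two thresholds.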

**Fig 3. Markov Blanket of BMI.**
All edges and nodes show a Markov Blanket of BMI taken from the average BN constructed using imputed data of all variables with strength threshold 0.5. Edges and nodes that are not faded show a Markov Blanket of BMI from the average BN with a strength threshold of 0.85 applied instead of 0.5. The thickness of the edges is proportional to the edge strength. Non-faded edges are highlighted in black and labelled in red with the probability that they exist (strength), and, in brackets, the probability that they exist in the shown direction, given that they exist (direction). Nodes are coloured as follows: red are metabolites; blue are proteins; purple are gene expression measurements; amber are clinical variables; green are allele scores.

**Fig 4. Sub-network taken from the average BN constructed using imputed data of all variables consisting of variables of interest with respect to T2D and BMI.**
Edges are labelled with the probability that they exist (strength), and, in brackets, the probability that they exist in the shown direction, given that they exist (direction). The thickness of the edges is proportional to the edge strength. The nodes are coloured as follows: amber are clinical variables and green are allele scores.

**Fig 5. Markov Blanket of centre.**
All edges and nodes show a Markov Blanket of centre taken from the average BN constructed using imputed data of all variables with strength threshold 0.5. Edges and nodes that are not faded show a Markov Blanket of centre from the average BN with a strength threshold of 0.9 applied instead of 0.5. The thickness of the edges is proportional to the edge strength. Non-faded edges are highlighted in black and labelled in red with the probability that they exist (strength), and, in brackets, the probability that they exist in the shown direction, given that they exist (direction); their connected nodes are also labelled and highlighted. Nodes are coloured as follows: red are metabolites; blue (with gene name) are proteins; purple (with gene name) are gene expression measurements; amber are clinical variables; green (prefixed with AS) are allele scores.

Grants and funding

Wellcome Trust, United Kingdom
