BMC Genomics. 2024 Dec 23;25(1):1236.

doi: 10.1186/s12864-024-11026-2.

Evaluation of Bayesian Linear Regression derived gene set test methods

Zhonghao Bai¹, Tahereh Gholipourshahraki², Merina Shrestha², Astrid Hjelholt³,⁴,⁵, Sile Hu⁶, Mads Kjolby³,⁴,⁵, Palle Duun Rohde⁷, Peter Sørensen⁸

Affiliations

¹ Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus, Denmark. zhonghao.bai@qgg.au.dk.
² Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus, Denmark.
³ Department of Biomedicine, Aarhus University, Aarhus, Denmark.
⁴ Department of Clinical Pharmacology, Aarhus University Hospital, Aarhus, Denmark.
⁵ Steno Diabetes Center Aarhus, Aarhus University Hospital, Aarhus, Denmark.
⁶ Human Genetics Centre of Excellence, Novo Nordisk Research Centre Oxford, Oxford, UK.
⁷ Genomic Medicine, Department of Health Science and Technology, Aalborg University, Aalborg, Denmark.
⁸ Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus, Denmark. pso@qgg.au.dk.

PMID: 39716056
PMCID: PMC11667926
DOI: 10.1186/s12864-024-11026-2

Zhonghao Bai et al. BMC Genomics. 2024.

Abstract

Background: Gene set tests can pinpoint genes and biological pathways that exert small to moderate effects on complex diseases like Type 2 Diabetes (T2D). By aggregating genetic markers based on biological information, these tests can enhance the statistical power needed to detect genetic associations.

Results: Our goal was to develop a gene set test utilizing Bayesian Linear Regression (BLR) models, which account for both linkage disequilibrium (LD) and the complex genetic architectures intrinsic to diseases, thereby increasing the detection power of genetic associations. Through a series of simulation studies, we demonstrated how the efficacy of BLR derived gene set tests is influenced by several factors, including the proportion of causal markers, the size of gene sets, the percentage of genetic variance explained by the gene set, and the genetic architecture of the traits. By using KEGG pathways, eQTLs, and regulatory elements as different kinds of gene sets with T2D results, we also assessed the performance of gene set tests in explaining more about real phenotypes.

Conclusions: Compared with other approaches, such as the gold standard MAGMA (Multi-marker Analysis of Genomic Annotation), our BLR gene set test showed superior performance. Taken together, its performance on simulated and real phenotypes suggests that our BLR-based approach could more accurately identify genes and biological pathways underlying complex diseases.

Keywords: BLR; Complex disease; Gene set test; Type 2 diabetes.

Conflict of interest statement

Declarations. Ethics approval and consent to participate: Human studies in the UK Biobank project have received approval from the Ethics and Governance Framework (EGF), which ensures data and sample usage adheres to scientific and ethical standards. The consent to participation will apply throughout the lifetime of the UK Biobank, unless participants withdraw, and involves the collection and storage of biological samples (blood, saliva, urine) and electronic health records (GP, hospitals, dental, prescriptions). Individual data is anonymized, with each research project receiving its own anonymized dataset. The ethics committee waived the need for written informed consent. Competing interests: The authors declare no competing interests.

Figures

**Fig. 1**
Workflow of the marker set test project. (1) Obtain GWAS summary statistics from genotypes and phenotypes. (2) Calculate gene-level statistics using different methods. (3) Create gene sets based on biological gene sets. (4) Build a design matrix linking genes to gene sets. (5) Combine gene sets and adjusted marker effects for genes to perform gene set test analysis using linear regression models.
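Steps (4) and (5) of the workflow can be illustrated with a minimal sketch. It is not the authors' BLR-based test: it assumes a plain ordinary least squares fit of one gene-level statistic on one gene set's membership column at a time. The gene names, gene sets, and statistic values are made up for illustration, and `membership_column` and `ols_slope` are hypothetical helper names.

```python
def membership_column(genes, members):
    """One column of the design matrix: 1.0 if the gene is in the set, else 0.0."""
    members = set(members)
    return [1.0 if g in members else 0.0 for g in genes]

def ols_slope(x, y):
    """Slope of a simple least squares regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

# Hypothetical gene-level statistics (step 2) and gene sets (step 3).
genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
gene_stats = [3.0, 3.0, 1.0, 1.0, 1.0, 1.0]
gene_sets = {"setA": ["g1", "g2"], "setB": ["g5", "g6"]}

# Step 4: design matrix, stored as one membership column per gene set.
design = {name: membership_column(genes, members)
          for name, members in gene_sets.items()}

# Step 5: regress the gene-level statistics on each membership column.
slopes = {name: ols_slope(column, gene_stats) for name, column in design.items()}
```

With a 0/1 membership column, the slope equals the mean statistic of genes inside the set minus the mean of genes outside it, so a clearly positive slope marks a set whose genes carry larger statistics than the background.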

**Fig. 2**
F1 score averaged across scenarios for 4 methods and all configurations. The y-axis shows the simulated gene sets, where the first number is the size of the gene set and the second is the number of causal genes in the set. The x-axis shows the F1 score averaged across eight scenarios, with the F1 score for each scenario averaged across 10 replicates. Each dot represents one gene set configuration, and dot shapes distinguish the 4 methods compared in this figure: T_BayesC−d, T_BayesR−d, T_CT−z, T_SVD
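The F1 score reported in these figures is the harmonic mean of precision and recall. The caption does not say how a true positive is defined in the paper, so the sketch below assumes the inputs are two collections of labels: the truly causal gene sets and the gene sets a method flags as associated. The function name and the labels are illustrative only.

```python
def f1_score(truly_causal, detected):
    """Harmonic mean of precision and recall over two collections of labels."""
    truly_causal, detected = set(truly_causal), set(detected)
    true_positives = len(truly_causal & detected)
    if true_positives == 0:
        return 0.0  # also covers an empty 'detected' or 'truly_causal' set
    precision = true_positives / len(detected)
    recall = true_positives / len(truly_causal)
    return 2 * precision * recall / (precision + recall)

# Hypothetical replicate: 4 causal gene sets, 3 detected, 2 of them correct.
score = f1_score({"set1", "set2", "set3", "set4"}, {"set1", "set2", "set5"})
```

Here precision is 2/3 and recall is 1/2, giving an F1 score of 4/7. Averaging such scores over replicates and then over scenarios would yield the kind of summary value plotted on the x-axis.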

**Fig. 3**
F1 score of gene sets averaged across scenarios for BayesC. The y-axis shows the simulated gene sets, where the first number is the size of the gene set and the second is the number of causal genes in the set. The x-axis shows the F1 score averaged across eight scenarios, with the F1 score for each scenario averaged across 10 replicates. Dots represent the F1 score averaged across scenarios for T_BayesC−d

**Fig. 4**
Comparison of F1 scores of gene sets in the BayesC model by heritability, GA, and $π$ in quantitative scenarios. The y-axis shows the simulated gene sets, where the first number is the size of the gene set and the second is the number of causal genes in the set. The x-axis shows the F1 score averaged across eight scenarios, with the F1 score for each scenario averaged across 10 replicates. Dots are colored consistently to distinguish the different properties of the scenarios, and each dot represents the F1 score of one gene set averaged across 10 replicates for T_BayesC−d

**Fig. 5**
Comparison of F1 scores of gene sets in the BayesC model by heritability, GA, $π$, and prevalence in binary scenarios. The y-axis shows the simulated gene sets, where the first number is the size of the gene set and the second is the number of causal genes in the set. The x-axis shows the F1 score averaged across eight scenarios, with the F1 score for each scenario averaged across 10 replicates. Dots are colored consistently to distinguish the different properties of the scenarios, and each dot represents the F1 score of one gene set averaged across 10 replicates for T_BayesC−d

**Fig. 6**
R² cumulative curve of gene sets for quantitative phenotypes in BayesC. 6A. R² cumulative curve across gene sets in quantitative scenario 1 for quantitative phenotypes in BayesC. Each line represents one gene set size. 6B. R² cumulative curve of 200-gene sets averaged across scenarios for quantitative phenotypes in BayesC. The y-axis shows the accumulated R² for gene sets with 200 genes, and the x-axis shows the number of gene sets whose R² has been accumulated (gene sets are sorted by R² from largest to smallest within each scenario). Each line represents one scenario
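The cumulative curve described in this caption can be reproduced in outline by sorting per-gene-set R² values from largest to smallest and taking a running sum. The sketch below assumes the R² values are already available as a plain list. The numbers are invented, and `cumulative_r2` is a hypothetical helper name.

```python
from itertools import accumulate

def cumulative_r2(r2_values):
    """Running sum of R² after sorting gene sets from largest to smallest R²."""
    ordered = sorted(r2_values, reverse=True)
    return list(accumulate(ordered))

# Hypothetical R² values for three gene sets in one scenario.
curve = cumulative_r2([0.01, 0.05, 0.02])
```

The k-th entry of `curve` is the R² accumulated over the k gene sets with the largest R², which corresponds to the y-value at x = k in a plot of this kind.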

