BMC Genomics. 2024 Dec 23;25(1):1236.
doi: 10.1186/s12864-024-11026-2.

Evaluation of Bayesian Linear Regression derived gene set test methods

Zhonghao Bai et al. BMC Genomics.

Abstract

Background: Gene set tests can pinpoint genes and biological pathways that exert small to moderate effects on complex diseases like Type 2 Diabetes (T2D). By aggregating genetic markers based on biological information, these tests can enhance the statistical power needed to detect genetic associations.

Results: Our goal was to develop a gene set test based on Bayesian Linear Regression (BLR) models, which account for both linkage disequilibrium (LD) and the complex genetic architectures of diseases and thereby increase the power to detect genetic associations. In a series of simulation studies, we showed that the performance of BLR-derived gene set tests depends on several factors, including the proportion of causal markers, the size of the gene sets, the percentage of genetic variance explained by the gene set, and the genetic architecture of the trait. Using KEGG pathways, eQTLs, and regulatory elements as different kinds of gene sets, we also assessed how well the gene set tests perform on real T2D results.

Conclusions: Our BLR gene set test outperformed other approaches, including the gold-standard MAGMA (Multi-marker Analysis of Genomic Annotation) approach. Taken together, its performance on simulated and real phenotypes suggests that our BLR-based approach could more accurately identify genes and biological pathways underlying complex diseases.

Keywords: BLR; Complex disease; Gene set test; Type 2 diabetes.

Conflict of interest statement

Declarations. Ethics approval and consent to participate: Human studies in the UK Biobank project have received approval from the Ethics and Governance Framework (EGF), which ensures data and sample usage adheres to scientific and ethical standards. The consent to participation will apply throughout the lifetime of the UK Biobank, unless participants withdraw, and involves the collection and storage of biological samples (blood, saliva, urine) and electronic health records (GP, hospitals, dental, prescriptions). Individual data is anonymized, with each research project receiving its own anonymized dataset. The ethics committee waived the need for written informed consent. Competing interests: The authors declare no competing interests.

Figures

Fig. 1
Workflow of the marker set test project. (1) Obtain GWAS summary statistics from genotypes and phenotypes. (2) Calculate gene-level statistics using different methods. (3) Create gene sets based on biological gene sets. (4) Construct a design matrix linking genes to gene sets. (5) Combine the gene sets and the adjusted marker effects for genes to perform the gene set test analysis using linear regression models.
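Steps (4) and (5) of the workflow can be illustrated with a minimal sketch. It assumes a binary gene-by-set design matrix and an ordinary least squares regression of a gene-level statistic on one set indicator at a time. The gene names, statistics, and helper functions are hypothetical, and this is not the authors' BLR implementation.

```python
from math import sqrt

def design_matrix(genes, gene_sets):
    """Binary gene-by-set matrix: entry is 1 if the gene belongs to the set."""
    names = sorted(gene_sets)
    rows = [[1 if g in gene_sets[s] else 0 for s in names] for g in genes]
    return names, rows

def set_regression(y, x):
    """OLS of y on one indicator column x, with intercept.
    Returns (slope, t statistic of the slope)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = sqrt(rss / (n - 2) / sxx)
    return slope, slope / se

# Hypothetical gene-level statistics (e.g. adjusted marker effects per gene).
gene_stats = {"G1": 2.1, "G2": 1.8, "G3": 0.2, "G4": -0.1, "G5": 0.3, "G6": 0.0}
# Hypothetical gene sets.
gene_sets = {"setA": {"G1", "G2"}, "setB": {"G3", "G4", "G5"}}

genes = sorted(gene_stats)
names, X = design_matrix(genes, gene_sets)
y = [gene_stats[g] for g in genes]

# Test each gene set separately: does membership predict the gene statistic?
results = {s: set_regression(y, [row[j] for row in X]) for j, s in enumerate(names)}
```

A positive slope with a large t statistic indicates that genes in the set carry larger statistics than genes outside it, which is the logic of a competitive gene set test.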
Fig. 2
F1 score averaged across scenarios for the 4 methods, for all configurations. The y-axis shows the simulated gene sets: the first number is the size of the gene set and the second is the number of causal genes in the set. The x-axis shows the F1 score averaged across eight scenarios, where the F1 score for each scenario is averaged across 10 replicates. Each dot represents one gene set configuration, and dot shapes distinguish the 4 methods compared in this figure: TBayesC−d, TBayesR−d, TCT−z, and TSVD.
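The averaging described in this caption can be sketched as follows, assuming F1 is the usual harmonic mean of precision and recall over which gene sets a method declares associated. The set labels, the calls per replicate, and the second scenario value are hypothetical.

```python
def f1_score(truth, called):
    """F1 = harmonic mean of precision and recall for one replicate."""
    true_positives = len(truth & called)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(called)
    recall = true_positives / len(truth)
    return 2 * precision * recall / (precision + recall)

def mean(values):
    return sum(values) / len(values)

# Hypothetical truly associated gene sets in one scenario.
truth = {"set1", "set2"}
# Hypothetical sets declared significant in three replicates.
replicate_calls = [{"set1", "set2"}, {"set1", "set3"}, set()]

# Average over replicates first, then over scenarios.
replicate_f1 = [f1_score(truth, calls) for calls in replicate_calls]
scenario_f1 = mean(replicate_f1)
configuration_f1 = mean([scenario_f1, 0.75])  # 0.75: hypothetical second scenario
```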
Fig. 3
F1 score of gene sets averaged across scenarios for BayesC. The y-axis shows the simulated gene sets: the first number is the size of the gene set and the second is the number of causal genes in the set. The x-axis shows the F1 score averaged across eight scenarios, where the F1 score for each scenario is averaged across 10 replicates. Dots represent the F1 score for TBayesC−d.
Fig. 4
Comparisons of the F1 score of gene sets in the BayesC model by heritability, GA, and π in quantitative scenarios. The y-axis shows the simulated gene sets: the first number is the size of the gene set and the second is the number of causal genes in the set. The x-axis shows the F1 score averaged across eight scenarios, where the F1 score for each scenario is averaged across 10 replicates. Dots are colored consistently to distinguish the properties of the scenarios, and each dot represents the F1 score of one gene set averaged across 10 replicates for TBayesC−d.
Fig. 5
Comparisons of the F1 score of gene sets in the BayesC model by heritability, GA, π, and prevalence in binary scenarios. The y-axis shows the simulated gene sets: the first number is the size of the gene set and the second is the number of causal genes in the set. The x-axis shows the F1 score averaged across eight scenarios, where the F1 score for each scenario is averaged across 10 replicates. Dots are colored consistently to distinguish the properties of the scenarios, and each dot represents the F1 score of one gene set averaged across 10 replicates for TBayesC−d.
Fig. 6
R² cumulative curves of gene sets for quantitative phenotypes in BayesC. 6A. R² cumulative curves across gene sets in quantitative scenario 1 for quantitative phenotypes in BayesC. Each line represents one gene set size. 6B. R² cumulative curves of 200-gene sets averaged across scenarios for quantitative phenotypes in BayesC. The y-axis shows the accumulated R² for gene sets with 200 genes, and the x-axis shows the number of gene sets whose R² has been accumulated (gene sets are sorted by R² from largest to smallest within each scenario). Each line represents one scenario.
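The construction of a cumulative R² curve, as described in this caption, can be sketched as a sort followed by a running sum. The R² values below are hypothetical and chosen only to keep the arithmetic exact.

```python
from itertools import accumulate

def cumulative_r2(r2_values):
    """Sort per-gene-set R² from largest to smallest, then take the running sum."""
    return list(accumulate(sorted(r2_values, reverse=True)))

# Hypothetical R² values for three gene sets in one scenario.
curve = cumulative_r2([0.125, 0.5, 0.25])
# curve[k - 1] is the R² accumulated over the top k gene sets.
```

Because the values are sorted in descending order, the curve rises steeply at first and flattens as gene sets with small R² are added.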
