An integrative framework for Bayesian variable selection with informative priors for identifying genes and pathways
- PMID: 23844055
- PMCID: PMC3700986
- DOI: 10.1371/journal.pone.0067672
Abstract
The discovery of genetic or genomic markers plays a central role in the development of personalized medicine. A notable challenge is the high dimensionality of the data sets, as thousands of genes or millions of genetic variants are collected on a relatively small number of subjects. Traditional gene-wise selection methods based on univariate analyses have difficulty incorporating correlational, structural, or functional relationships among the molecular measures. For microarray gene expression data, we first summarize solutions for dealing with 'large p, small n' problems, and then propose an integrative Bayesian variable selection (iBVS) framework for simultaneously identifying causal or marker genes and regulatory pathways. A novel partial least squares (PLS) g-prior for iBVS is developed to allow the incorporation of prior knowledge on gene-gene interactions or functional relationships. From a systems biology point of view, iBVS enables users to directly target the joint effects of multiple genes and pathways in a hierarchical modeling diagram to predict disease status or phenotype. The estimated posterior selection probabilities offer probabilistic and biological interpretations. Both simulated data and a set of microarray data for predicting stroke status are used to validate the performance of iBVS in a Probit model with binary outcomes. iBVS offers a general framework for effective discovery of various molecular biomarkers by combining data-based statistics and knowledge-based priors. Guidelines on making posterior inferences, determining Bayesian significance levels, and improving computational efficiency are also discussed.
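As a rough illustration of how posterior selection probabilities are commonly estimated in Bayesian variable selection, the sketch below averages binary inclusion indicators over sampler draws and then keeps genes above a cutoff. This is a generic example, not the iBVS implementation: the function names, the toy draws, and the 0.5 cutoff are all assumptions made for illustration, and the abstract does not state which Bayesian significance level the authors recommend.

```python
from typing import List, Sequence


def posterior_selection_probabilities(draws: Sequence[Sequence[int]]) -> List[float]:
    """Estimate per-gene selection probabilities from indicator draws.

    Each draw is a 0/1 vector with one entry per gene (1 = gene included
    in the model at that iteration). The estimate is the per-gene mean.
    """
    if not draws:
        raise ValueError("need at least one draw")
    n_genes = len(draws[0])
    totals = [0] * n_genes
    for draw in draws:
        if len(draw) != n_genes:
            raise ValueError("all draws must have the same length")
        for j, included in enumerate(draw):
            totals[j] += included
    return [t / len(draws) for t in totals]


def select_genes(probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Return indices of genes whose selection probability meets the cutoff.

    The default of 0.5 is an illustrative assumption, not a value taken
    from the paper.
    """
    return [j for j, p in enumerate(probs) if p >= threshold]


# Hypothetical toy input: 4 sampler draws over 4 genes.
draws = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 1, 0],
]
probs = posterior_selection_probabilities(draws)
selected = select_genes(probs)
print(probs)     # [1.0, 0.25, 0.75, 0.0]
print(selected)  # [0, 2]
```

In practice the draws would come from a sampler for the hierarchical Probit model, and the cutoff would be chosen according to the Bayesian significance guidelines the paper discusses.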