PLoS One. 2013 Jul 3;8(7):e67672.
doi: 10.1371/journal.pone.0067672. Print 2013.

An integrative framework for Bayesian variable selection with informative priors for identifying genes and pathways

Bin Peng et al. PLoS One.

Abstract

The discovery of genetic or genomic markers plays a central role in the development of personalized medicine. A notable challenge arises from the high dimensionality of the data sets, as thousands of genes or millions of genetic variants are measured on a relatively small number of subjects. Traditional gene-wise selection methods based on univariate analyses have difficulty incorporating correlational, structural, or functional relationships among the molecular measures. For microarray gene expression data, we first summarize solutions for dealing with 'large p, small n' problems, and then propose an integrative Bayesian variable selection (iBVS) framework for simultaneously identifying causal or marker genes and regulatory pathways. A novel partial least squares (PLS) g-prior for iBVS is developed to allow the incorporation of prior knowledge on gene-gene interactions or functional relationships. From the point of view of systems biology, iBVS enables users to directly target the joint effects of multiple genes and pathways in a hierarchical modeling diagram to predict disease status or phenotype. The estimated posterior selection probabilities offer probabilistic and biological interpretations. Both simulated data and a set of microarray data for predicting stroke status are used to validate the performance of iBVS in a Probit model with binary outcomes. iBVS offers a general framework for the effective discovery of various molecular biomarkers by combining data-based statistics and knowledge-based priors. Guidelines on making posterior inferences, determining Bayesian significance levels, and improving computational efficiency are also discussed.
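
As a rough illustration of how posterior selection probabilities are commonly read off a sampler's output, the sketch below averages binary inclusion indicators over MCMC draws, once for genes and once for pathways. It is not the authors' implementation: the function name `selection_probabilities`, the toy indicator draws, the gene and pathway labels, and the 0.5 cutoff are all assumptions made for the example.

```python
from typing import Dict, Sequence


def selection_probabilities(draws: Sequence[Sequence[int]],
                            names: Sequence[str]) -> Dict[str, float]:
    """Average 0/1 inclusion indicators over MCMC draws, one column per name."""
    if not draws:
        raise ValueError("need at least one draw")
    totals = [0] * len(names)
    for draw in draws:
        if len(draw) != len(names):
            raise ValueError("each draw needs one indicator per name")
        for j, indicator in enumerate(draw):
            totals[j] += indicator
    return {name: total / len(draws) for name, total in zip(names, totals)}


# Hypothetical sampler output: each row is one draw of inclusion indicators.
gene_names = ["g1", "g2", "g3", "g4"]
gene_draws = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 1, 0],
]
pathway_names = ["pw_A", "pw_B"]
pathway_draws = [[1, 0], [1, 0], [1, 1], [1, 0]]

gene_probs = selection_probabilities(gene_draws, gene_names)
pathway_probs = selection_probabilities(pathway_draws, pathway_names)

# Assumed cutoff for illustration; the abstract does not state one.
CUTOFF = 0.5
selected_genes = [g for g, p in gene_probs.items() if p > CUTOFF]
```

With these toy draws, `g1` and `g3` exceed the assumed cutoff. In practice the cutoff would follow whatever significance guideline the analysis adopts.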

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. An Example of KEGG Pathway.
Figure 2. Gene and Pathway selection results in Scenario 1.
The top figure corresponds to the posterior distribution of gene selection with effect size formula image, and the second figure to formula image. The two smaller figures on the bottom show the posterior pathway selection probabilities, with the left one corresponding to formula image and the right one to formula image. The labeled red lines indicate causal genes or causal pathways (those containing causal genes). These distributions were obtained by averaging over the 100 simulated data sets.
Figure 3. Gene and Pathway selection results in Scenario 2.
The top figure corresponds to the posterior probabilities of gene selection with effect size formula image, and the second figure to formula image. The two smaller figures on the bottom show the posterior probabilities of pathway selection, with the left one corresponding to formula image and the right one to formula image. The red lines indicate causal genes or causal pathways (those containing causal genes). These distributions were obtained by averaging over the 100 simulated data sets.
Figure 4. Posterior Gene Selection Probabilities when P = 2000.
The top figure shows the result for Scenario 3, and the bottom one Scenario 4.
Figure 5. Mean Square Error for Gene Selections.
Averaged over 100 simulated data sets in Scenario 1 for two sets of gene effect sizes formula image. The top one is for formula image and the bottom one for formula image.
Figure 6. ROC Curves for iBVS and YS-BVS (Yang & Song's BVS).
Figure 7. Gene and Pathway Selection Results for Stroke Data.
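
For readers who want to reproduce an ROC comparison of the kind named in the Figure 6 caption on their own simulations, the following is a minimal sketch that turns selection probabilities and known causal labels into ROC points and an area under the curve. It is a generic construction, not the procedure used in the paper; the function names `roc_points` and `auc` and the toy scores are assumptions.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float],
               is_causal: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) pairs obtained by lowering the selection cutoff."""
    n_pos = sum(is_causal)
    n_neg = len(is_causal) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one causal and one non-causal gene")
    # Visit genes from the highest to the lowest selection probability.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for rank, i in enumerate(order):
        if is_causal[i]:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last gene in a group of tied scores.
        is_last = rank + 1 == len(order)
        if is_last or scores[order[rank + 1]] != scores[i]:
            points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Trapezoidal area under a list of (FPR, TPR) points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical simulation: selection probabilities and true causal status.
scores = [0.95, 0.80, 0.40, 0.30, 0.10]
truth = [True, False, True, False, False]

points = roc_points(scores, truth)
area = auc(points)
```

Running the same function on the selection probabilities of two methods, with the same truth labels, gives curves that can be compared directly.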
