BMC Bioinformatics. 2015;16 Suppl 13(Suppl 13):S13.
doi: 10.1186/1471-2105-16-S13-S13. Epub 2015 Sep 25.

A Bayesian approach for inducing sparsity in generalized linear models with multi-category response

Behrouz Madahian et al. BMC Bioinformatics. 2015.

Abstract

Background: The dimension and complexity of high-throughput gene expression data create many challenges for downstream analysis. Several approaches exist to reduce the number of variables when sample sizes are small. In this study, we used the Generalized Double Pareto (GDP) prior to induce sparsity in a Bayesian Generalized Linear Model (GLM) setting. The approach was evaluated using a publicly available microarray dataset containing 99 samples corresponding to four different prostate cancer subtypes.

Results: A hierarchical Sparse Bayesian GLM using the GDP prior (SBGG) was developed to take into account the progressive nature of the response variable. We obtained an average overall classification accuracy between 82.5% and 94%, which was higher than that of a Support Vector Machine, a Random Forest, or a Sparse Bayesian GLM using double exponential priors. Additionally, SBGG outperformed the other three methods in correctly identifying pre-metastatic stages of cancer progression, which can prove extremely valuable for therapeutic and diagnostic purposes. Importantly, using the Geneset Cohesion Analysis Tool, we found that the top 100 genes produced by SBGG had an average functional cohesion p-value of 2.0E-4, compared with 0.007 to 0.131 for the other methods.

Conclusions: Using the GDP prior in a Bayesian GLM applied to cancer progression data results in better subclass prediction. In particular, the method identifies pre-metastatic stages of prostate cancer with substantially better accuracy and produces more functionally relevant gene sets.
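
To illustrate why a GDP prior can induce sparsity while leaving large effects relatively unshrunk, the sketch below compares the GDP density with a double exponential (Laplace) density and draws GDP samples through a normal scale mixture. It assumes the commonly used parameterisation f(θ) = (1/(2ξ)) (1 + |θ|/(αξ))^(-(α+1)); the names xi, alpha, gdp_density, laplace_density and sample_gdp, and the values ξ = α = 1, are illustrative assumptions, not the hyperparameters or code used in the study.

```python
import math
import random


def gdp_density(theta, xi=1.0, alpha=1.0):
    """Generalized double Pareto density with scale xi and shape alpha (assumed form)."""
    return (1.0 / (2.0 * xi)) * (1.0 + abs(theta) / (alpha * xi)) ** (-(alpha + 1.0))


def laplace_density(theta, b=1.0):
    """Double exponential (Laplace) density with scale b, for comparison."""
    return math.exp(-abs(theta) / b) / (2.0 * b)


def sample_gdp(rng, xi=1.0, alpha=1.0):
    """Draw one GDP variate through a normal scale mixture.

    lambda ~ Gamma(shape=alpha, rate=alpha * xi)
    tau    ~ Exponential(rate=lambda^2 / 2)
    theta  ~ Normal(0, variance=tau)
    """
    eta = alpha * xi
    lam = rng.gammavariate(alpha, 1.0 / eta)  # second argument is the scale, 1/rate
    tau = rng.expovariate(lam * lam / 2.0)    # argument is the rate
    return rng.gauss(0.0, math.sqrt(tau))


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [sample_gdp(rng) for _ in range(20000)]
    inside = sum(1 for d in draws if abs(d) <= 1.0) / len(draws)
    print("share of draws with |theta| <= 1:", round(inside, 3))
    print("GDP density at 10:    ", gdp_density(10.0))
    print("Laplace density at 10:", laplace_density(10.0))
```

Both densities peak at zero, but the GDP tail decays polynomially rather than exponentially, so large coefficients are penalised less heavily than under a double exponential prior. The mixture form is the kind of hierarchy that makes Gibbs sampling convenient, although the exact conditionals used by SBGG are not given in this abstract.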

Figures

Figure 1
Flow chart of the Gibbs sampling procedure for SBGG. Here j = 1, 2, ..., p; r = 1, 2, ..., n; and s = 2, 3, ..., k, where n is the number of samples, p is the number of covariates in the model, and k is the number of categories of the response variable.
Figure 2
Posterior means of θ associated with genes 1 to 398. The x-axis represents the list of 398 differentially expressed genes obtained after Benjamini and Hochberg FDR correction of the results of single-gene analysis using classical multi-category logistic regression. The y-axis represents the posterior mean of θ associated with each gene. While some signals are shrunk toward zero, others stand out and turn out to be biologically more relevant to prostate cancer progression subtypes.
Figure 3
Accuracy plot of the four models using different numbers of genes for classification of prostate cancer subtypes. The accuracy values are the average classification accuracy across 50 runs, and the vertical lines show the associated standard deviations.
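
The Figure 2 caption refers to Benjamini and Hochberg FDR correction as the gene pre-filtering step. A minimal sketch of that step-up procedure follows; the function name benjamini_hochberg, the 0.05 FDR level, and the toy p-values are illustrative assumptions, since the threshold used in the study is not stated here.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return the indices of hypotheses rejected by the Benjamini-Hochberg procedure.

    The p-values are ranked in ascending order; the largest rank i with
    p_(i) <= (i / m) * fdr is found, and every hypothesis ranked at or
    below i is rejected.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank  # keep the largest rank that passes
    return sorted(order[:cutoff_rank])


if __name__ == "__main__":
    # Toy p-values, one per gene (hypothetical data).
    toy = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(toy))
```

Because the procedure is step-up, a p-value that misses its own threshold can still be rejected when a larger p-value further down the ranking passes its threshold.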