Bayesian Distance Weighted Discrimination
- PMID: 36465095
- PMCID: PMC9717576
- DOI: 10.1080/10618600.2022.2069778
Abstract
Distance weighted discrimination (DWD) is a linear discrimination method that is particularly well-suited for classification tasks with high-dimensional data. The DWD coefficients minimize an intuitive objective function, which can be solved efficiently using state-of-the-art optimization techniques. However, DWD has not yet been cast into a model-based framework for statistical inference. In this article, we show that DWD identifies the mode of a proper Bayesian posterior distribution that results from a particular link function for the class probabilities and a shrinkage-inducing proper prior distribution on the coefficients. We describe a relatively efficient Markov chain Monte Carlo (MCMC) algorithm to simulate from the true posterior under this Bayesian framework. We show that the posterior is asymptotically normal and derive the mean and covariance matrix of its limiting distribution. Through several simulation studies and an application to breast cancer genomics, we demonstrate how the Bayesian approach to DWD can be used to (1) compute well-calibrated posterior class probabilities, (2) assess uncertainty in the DWD coefficients and resulting sample scores, (3) improve power via semi-supervised analysis when not all class labels are available, and (4) automatically determine a penalty tuning parameter within the model-based framework. R code to perform Bayesian DWD is available at https://github.com/lockEF/BayesianDWD.
Keywords: Cancer genomics; distance weighted discrimination; high-dimensional data; probabilistic classification.
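To make the idea of DWD coefficients minimizing an objective function concrete, the following is a minimal sketch of a penalized DWD-style fit in plain Python. It is not the authors' formulation and not their R code. It assumes the generalized DWD loss with exponent q = 1 (linear for margins up to 1/2, reciprocal beyond), a ridge penalty on the coefficients, and fixed-step gradient descent. The function names, the toy data, and the values of lam, step, and n_iter are all hypothetical choices made for illustration. The link function, prior, and MCMC sampler described in the abstract are not reproduced here.

```python
def dwd_loss(u):
    """Generalized DWD loss with exponent q = 1 (an assumed variant)."""
    # Linear for margins up to 1/2, reciprocal beyond; the two pieces meet at u = 1/2.
    return 1.0 - u if u <= 0.5 else 1.0 / (4.0 * u)


def dwd_loss_grad(u):
    """Derivative of dwd_loss with respect to the margin u."""
    return -1.0 if u <= 0.5 else -1.0 / (4.0 * u * u)


def score(w, b, x):
    """Linear score w . x + b for one sample."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def objective(w, b, X, y, lam):
    """Mean DWD loss over the samples plus a ridge penalty on w (not on b)."""
    n = len(X)
    data_term = sum(dwd_loss(yi * score(w, b, xi)) for xi, yi in zip(X, y)) / n
    return data_term + lam * sum(wj * wj for wj in w)


def fit_dwd(X, y, lam=0.1, step=0.02, n_iter=5000):
    """Minimize the penalized objective by plain gradient descent."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        # Derivative of each sample's loss with respect to its score.
        g = [dwd_loss_grad(yi * score(w, b, xi)) * yi for xi, yi in zip(X, y)]
        grad_w = [
            sum(gi * xi[j] for gi, xi in zip(g, X)) / n + 2.0 * lam * w[j]
            for j in range(p)
        ]
        grad_b = sum(g) / n
        w = [wj - step * gj for wj, gj in zip(w, grad_w)]
        b -= step * grad_b
    return w, b


# Hypothetical toy data: two well-separated classes labeled +1 and -1.
X = [[2.0, 1.0], [1.0, 2.0], [2.0, 2.0], [-2.0, -1.0], [-1.0, -2.0], [-2.0, -2.0]]
y = [1, 1, 1, -1, -1, -1]
w, b = fit_dwd(X, y)
print("coefficients:", w, "intercept:", b)
print("objective at fit:", objective(w, b, X, y, 0.1))
```

The sign of score(w, b, x) gives the predicted class for a new sample x. A point estimate of this kind carries no uncertainty statement, which is the gap that the model-based framework in the abstract is meant to address.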