Statistical challenges of high-dimensional data
- PMID: 19805443
- PMCID: PMC2865881
- DOI: 10.1098/rsta.2009.0159
Abstract
Modern applications of statistical theory and methods can involve extremely large datasets, often with huge numbers of measurements on each of a comparatively small number of experimental units. New methodology and accompanying theory have emerged in response: the goal of this Theme Issue is to illustrate a number of these recent developments. This overview article introduces the difficulties that arise with high-dimensional data in the context of the very familiar linear statistical model: we give a taste of what can nevertheless be achieved when the parameter vector of interest is sparse, that is, contains many zero elements. We describe other ways of identifying low-dimensional subspaces of the data space that contain all useful information. The topic of classification is then reviewed along with the problem of identifying, from within a very large set, the variables that help to classify observations. Brief mention is made of the visualization of high-dimensional data, and ways to handle computational problems in Bayesian analysis are described. At appropriate points, reference is made to the other papers in the issue.
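A toy case shows why the linear model runs into trouble when there are more variables than observations, and why sparsity helps. This is a minimal sketch only, not the authors' method: the design matrix `X`, the coefficient vectors and the helper names (`matvec`, `nonzeros`, `best_single_variable`) are hypothetical, the responses are assumed noiseless, and exhaustive search over one-variable models is a simple stand-in for the sparse estimation methods the overview discusses. With n = 2 observations and p = 3 variables, two different coefficient vectors fit the data exactly, but only one of them is sparse, and a search restricted to one-variable models recovers that one.

```python
# Hypothetical toy design: n = 2 observations, p = 3 variables (p > n).
X = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]


def matvec(X, beta):
    """Fitted values X @ beta for a matrix stored as a list of rows."""
    return [sum(x * b for x, b in zip(row, beta)) for row in X]


def nonzeros(beta, tol=1e-12):
    """Number of non-zero coefficients, i.e. how sparse beta is."""
    return sum(abs(b) > tol for b in beta)


beta_sparse = [2.0, 0.0, 0.0]   # assumed "true" sparse parameter vector
null_vec = [1.0, 1.0, -1.0]     # X @ null_vec == [0, 0], so it is invisible in the data
beta_dense = [b + v for b, v in zip(beta_sparse, null_vec)]  # [3.0, 1.0, -1.0]

y = matvec(X, beta_sparse)      # noiseless responses: [2.0, 0.0]


def best_single_variable(X, y):
    """Exhaustive search over one-variable models: regress y on each column
    by least squares (no intercept) and return (rss, column index, coefficient)
    for the model with the smallest residual sum of squares."""
    best = None
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        coef = sum(c * yi for c, yi in zip(col, y)) / sum(c * c for c in col)
        rss = sum((yi - coef * c) ** 2 for c, yi in zip(col, y))
        if best is None or rss < best[0]:
            best = (rss, j, coef)
    return best


# Both vectors reproduce y exactly; the data alone cannot separate them.
print(matvec(X, beta_dense) == y)                    # True
print(nonzeros(beta_sparse), nonzeros(beta_dense))   # 1 3
print(best_single_variable(X, y))                    # (0.0, 0, 2.0)
```

Exhaustive search over all subsets of variables grows exponentially with p, so it is feasible only at toy scale like this one.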