Informational rescaling of PCA maps with application to genetic distance
- PMID: 39802212
- PMCID: PMC11719279
- DOI: 10.1016/j.csbj.2024.11.042
Abstract
Principal Component Analysis (PCA) is a powerful multivariate tool for projecting data into low-dimensional representations. Nevertheless, distances between datapoints on these low-dimensional projections are challenging to interpret. Here, we propose a computationally simple heuristic to transform a map based on standard PCA (when the variables are asymptotically Gaussian) into an entropy-based map where distances are based on mutual information (MI). Moreover, we show that in certain instances our proposed scaled PCA can improve cluster identification. Rescaling principal component-based distances using MI yields a representation of relative statistical associations when, as in genetics, it is applied to bit measurements of the mutual information between individuals' genomes. This entropy-rescaled PCA preserves order relationships along a dimension while expressing relative distances in information units, such as "bits". We illustrate the effect of this rescaling using genomics data derived from world populations and describe how it changes the interpretation of results.
Keywords: Entropy; Genetic distance; Genetic maps; Information theory; Mutual information.
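As a rough illustration of why a correlation-based scale and an information-based scale differ, the sketch below uses the standard closed-form mutual information between two jointly Gaussian variables with correlation rho, MI = -0.5 * log2(1 - rho^2) bits. This is a textbook identity, not the rescaling heuristic of the paper, whose details are not given in the abstract; the function name `gaussian_mi_bits` is hypothetical.

```python
import math


def gaussian_mi_bits(rho: float) -> float:
    """Mutual information, in bits, of two jointly Gaussian variables.

    Assumption: the pair is bivariate Gaussian with correlation rho.
    This is the textbook formula, not the paper's rescaling procedure.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return -0.5 * math.log2(1.0 - rho * rho)


# Compare a few correlation values with their information content.
for rho in (0.1, 0.5, 0.9):
    print(f"rho={rho:.1f}  MI={gaussian_mi_bits(rho):.3f} bits")
```

The mapping is monotone in |rho| but strongly nonlinear, so equal steps in correlation do not correspond to equal steps in bits. That is the kind of distortion an information-based rescaling is meant to make explicit.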
© 2024 The Authors. Published by Elsevier B.V. on behalf of Research Network of Computational and Structural Biotechnology.
Conflict of interest statement
The authors declare no conflict of interest.