Compositional Data Analysis using Kernels in mass cytometry data
- PMID: 35224501
- PMCID: PMC8867823
- DOI: 10.1093/bioadv/vbac003
Abstract
Motivation: Cell-type abundance data arising from mass cytometry experiments are compositional in nature. Classical association tests do not apply to compositional data because of their non-Euclidean nature. Existing methods for analyzing cell-type abundance data have several limitations for high-dimensional mass cytometry data, especially when the sample size is small.
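To make the "non-Euclidean" point concrete: compositions carry only relative information, so one standard remedy from Aitchison's log-ratio framework is to map each sample through the centered log-ratio (CLR) transform and measure Euclidean distance there (the Aitchison distance). The abstract does not say that CODAK uses this exact transform. The following is a minimal Python sketch under that assumption. The function names and the pseudocount of 0.5, which avoids log(0) for cell types absent from a sample, are hypothetical choices that do not come from the source.

```python
import math


def closure(counts, pseudocount=0.5):
    """Turn cell-type counts into proportions.

    The pseudocount is an assumption (not from the source); it avoids
    log(0) for cell types that were not observed in a sample.
    """
    shifted = [c + pseudocount for c in counts]
    total = sum(shifted)
    return [x / total for x in shifted]


def clr(props):
    """Centered log-ratio: log of each part minus the mean log over all parts."""
    logs = [math.log(p) for p in props]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


def aitchison_distance(p, q):
    """Aitchison distance: Euclidean distance between CLR-transformed compositions."""
    return math.dist(clr(p), clr(q))


# Two samples with the same relative abundances but different total cell counts
# are (numerically) zero apart: only ratios between cell types matter.
same = aitchison_distance(closure([10, 20, 30], 0), closure([1, 2, 3], 0))
```

Because the CLR vectors always sum to zero, the transformed data still satisfy one linear constraint. Distances computed this way are nonetheless invariant to the total number of cells recorded per sample, which is the property that plain Euclidean distances on raw counts lack.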
Results: We propose a new multivariate statistical learning methodology, Compositional Data Analysis using Kernels (CODAK), based on the kernel distance covariance (KDC) framework, to test the association of cell-type compositions with important predictors (categorical or continuous) such as disease status. CODAK scales well to high-dimensional data and performs satisfactorily for small sample sizes (n < 25). We conducted simulation studies to compare its performance with that of existing methods for analyzing cell-type abundance data from mass cytometry studies. We also applied the method to a high-dimensional dataset containing different subgroups of populations, including Systemic Lupus Erythematosus (SLE) patients and healthy control subjects.
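The abstract does not give CODAK's test statistic or kernel choices, so the following is only a hedged sketch of the general idea behind a distance/kernel covariance association test. It is not the CODAK implementation, which is written in R and available at the repository listed below. The sketch assumes the following:

- Aitchison (CLR-Euclidean) distances between samples' cell-type compositions.
- Absolute differences between predictor values, such as a 0/1 disease status or a continuous covariate.
- The sample distance covariance of Székely and colleagues, which corresponds to an HSIC-type kernel statistic with distance-induced kernels.
- A permutation p-value.

All function names are hypothetical.

```python
import math
import random


def clr(props):
    """Centered log-ratio transform of one composition (all parts must be > 0)."""
    logs = [math.log(p) for p in props]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


def double_center(dist):
    """Subtract row and column means, add back the grand mean (dist is symmetric)."""
    n = len(dist)
    row_means = [sum(row) / n for row in dist]
    grand_mean = sum(row_means) / n
    return [[dist[i][j] - row_means[i] - row_means[j] + grand_mean
             for j in range(n)] for i in range(n)]


def dcov_sq(a, b):
    """Squared sample distance covariance from two double-centered matrices."""
    n = len(a)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n ** 2


def dcov_permutation_test(compositions, predictor, n_perm=999, seed=1):
    """Hypothetical helper: association between compositions and a numeric predictor.

    Returns (observed statistic, permutation p-value).
    """
    z = [clr(c) for c in compositions]
    n = len(z)
    # Aitchison distance = Euclidean distance between CLR vectors.
    a = double_center([[math.dist(z[i], z[j]) for j in range(n)] for i in range(n)])
    # Absolute difference works for a continuous predictor or a 0/1 group label.
    b = double_center([[abs(predictor[i] - predictor[j]) for j in range(n)]
                       for i in range(n)])
    observed = dcov_sq(a, b)

    rng = random.Random(seed)
    idx = list(range(n))
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(idx)
        # Permuting rows and columns of the centered matrix equals permuting the
        # predictor and re-centering, so the distances need not be recomputed.
        b_perm = [[b[idx[i]][idx[j]] for j in range(n)] for i in range(n)]
        if dcov_sq(a, b_perm) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)
```

For example, `dcov_permutation_test(comps, [0, 0, 0, 1, 1, 1])` returns the observed statistic and a p-value for six samples split into two groups. Because the null distribution is built by reshuffling the observed samples rather than from a large-sample approximation, permutation tests of this kind are one reasonable option when n is small. A categorical predictor with more than two levels would need a different distance, such as 0 for the same level and 1 otherwise.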
Availability and implementation: CODAK is implemented in R. The code and the data used in this manuscript are available on the web at http://github.com/GhoshLab/CODAK/.
Contact: prudra@okstate.edu.
Supplementary information: Supplementary data are available at Bioinformatics Advances online.
© The Author(s) 2022. Published by Oxford University Press.
Similar articles
- Gating mass cytometry data by deep learning. Bioinformatics. 2017 Nov 1;33(21):3423-3430. doi: 10.1093/bioinformatics/btx448. PMID: 29036374. Free PMC article.
- Compositional analysis of microbiome data using the linear decomposition model (LDM). bioRxiv [Preprint]. 2023 May 29:2023.05.26.542540. doi: 10.1101/2023.05.26.542540. PMID: 37398068. Free PMC article.
- SCANCell reveals diverse inter-cluster interaction patterns in systemic lupus erythematosus across the disease spectrum. Bioinformatics. 2022 Feb 7;38(5):1361-1368. doi: 10.1093/bioinformatics/btab713. PMID: 34664638.
- Transformation and differential abundance analysis of microbiome data incorporating phylogeny. Bioinformatics. 2021 Dec 11;37(24):4652-4660. doi: 10.1093/bioinformatics/btab543. PMID: 34302462.
- Understanding sequencing data as compositions: an outlook and review. Bioinformatics. 2018 Aug 15;34(16):2870-2878. doi: 10.1093/bioinformatics/bty175. PMID: 29608657. Free PMC article. Review.
Cited by
- Expansion of extrafollicular B and T cell subsets in childhood-onset systemic lupus erythematosus. Front Immunol. 2023 Oct 27;14:1208282. doi: 10.3389/fimmu.2023.1208282. eCollection 2023. PMID: 37965329. Free PMC article.