Commun Stat Appl Methods. 2022 Jul;29(4):453-469.
doi: 10.29220/csam.2022.29.4.453. Epub 2022 Jul 31.

A guideline for the statistical analysis of compositional data in immunology

Jinkyung Yoo et al. Commun Stat Appl Methods. 2022 Jul.

Abstract

The study of immune cellular composition has been of great scientific interest in immunology because of the generation of multiple large-scale data. From the statistical point of view, such immune cellular data should be treated as compositional. In compositional data, each element is positive, and all the elements sum to a constant, which can be set to one in general. Standard statistical methods are not directly applicable for the analysis of compositional data because they do not appropriately handle correlations between the compositional elements. In this paper, we review statistical methods for compositional data analysis and illustrate them in the context of immunology. Specifically, we focus on regression analyses using log-ratio transformations and the alternative approach using Dirichlet regression analysis, discuss their theoretical foundations, and illustrate their applications with immune cellular fraction data generated from colorectal cancer patients.
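As a rough illustration of the log-ratio transformations mentioned in the abstract, the following minimal sketch closes a vector of positive parts so that it sums to one, then applies the centered log-ratio (clr) and additive log-ratio (alr) transformations. The function names and the four example fractions are hypothetical and chosen only for illustration; they are not taken from the paper's data or code.

```python
import math


def closure(parts):
    """Rescale strictly positive parts so that they sum to one."""
    if any(p <= 0 for p in parts):
        raise ValueError("all parts must be strictly positive")
    total = sum(parts)
    return [p / total for p in parts]


def clr(composition):
    """Centered log-ratio: log of each part relative to the geometric mean."""
    logs = [math.log(p) for p in composition]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def alr(composition, ref=-1):
    """Additive log-ratio: log of each part relative to a reference part."""
    ref_index = ref % len(composition)
    ref_log = math.log(composition[ref_index])
    return [
        math.log(p) - ref_log
        for i, p in enumerate(composition)
        if i != ref_index
    ]


# Hypothetical cell-type amounts (illustrative only, not from the paper).
raw_amounts = [30.0, 50.0, 15.0, 5.0]
fractions = closure(raw_amounts)
clr_values = clr(fractions)   # same length as the input, sums to zero
alr_values = alr(fractions)   # one fewer element than the input
```

Because log-ratios depend only on ratios between parts, applying `alr` to the raw amounts or to the closed fractions gives the same values up to floating-point error, which is one reason these transformations suit data that carry only relative information.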

Keywords: Dirichlet regression; compositional data; compositional regression; immuno-oncology; immunology; log-ratio transformation.

Figures

Figure 1:
Immune Landscape of Cancer data and immune cell type composition within the tumor microenvironment of each cancer patient.
Figure 2:
(Left) Weighted log-ratio analysis of the immune cell compositional dataset, showing the contribution biplot. The African American samples are indicated by a cross, and the two ellipses are 95% confidence ellipses for the group means, with the African American group on the right. (Right) The discriminant version where the first dimension coincides with the group difference.
Figure 3:
95% confidence plots of the log-ratio means of the pairwise log-ratio on the left and the summated log-ratio on the right. The dot indicates the mean, the box indicates the 50% confidence interval and the whiskers extend to the 95% confidence interval.
Figure 4:
Estimates of log-contrast coefficients (exponentiated) for each immune cell type, along with 95% bootstrap confidence intervals and p-values. Race is coded as a dummy variable for African American.
Figure 5:
Estimates of log-contrast coefficients (exponentiated) for the four-part subcompositional response modelled on the discrete predictor race, with a dummy variable for the category African American, along with 95% bootstrap confidence intervals and p-values.
Figure 6:
Componentwise plots of the local influence measures against compositional values based on the Dirichlet model with the race variable.
Figure 7:
Composite residual plot for Dirichlet regression model with race.
Figure 8:
Componentwise plots of the overdispersion statistic against compositional values based on the Dirichlet model fitted with the race variable. The red marked points indicate the observation with the largest overdispersion statistic value in each cell type.

