Three approaches to supervised learning for compositional data with pairwise logratios

Germà Coenders¹, Michael Greenacre²

Affiliations

¹ Department of Economics, Universitat de Girona, Girona, Spain.
² Department of Economics and Business and Barcelona School of Management, Universitat Pompeu Fabra, Barcelona, Spain.

PMID: 37969895
PMCID: PMC10637191
DOI: 10.1080/02664763.2022.2108007

Germà Coenders et al. J Appl Stat. 2022 Aug 6;50(16):3272-3293.

doi: 10.1080/02664763.2022.2108007. eCollection 2023.

Abstract

Logratios between pairs of compositional parts (pairwise logratios) are the easiest to interpret in compositional data analysis, and include the well-known additive logratios as particular cases. When the number of parts is large (sometimes even larger than the number of cases), some form of logratio selection is needed. In this article, we present three alternative stepwise supervised learning methods to select the pairwise logratios that best explain a dependent variable in a generalized linear model, each suited to a specific problem. The first method features unrestricted search, where any pairwise logratio can be selected. This method is harder to interpret when selected logratios overlap (that is, share parts), but it leads to the most accurate predictions. The second method restricts each part to appear in at most one logratio, which makes the corresponding logratios intuitively interpretable. The third method uses additive logratios, so that K−1 selected logratios involve a K-part subcomposition. Our approach allows logratios or non-compositional covariates to be forced into the models based on theoretical knowledge, and various stopping criteria are available, based either on information measures or on statistical significance with the Bonferroni correction. We present an application to a dataset from a study predicting Crohn's disease.
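The three search strategies differ mainly in which pairwise logratios are eligible at each step. The following is a minimal sketch under stated assumptions, not the authors' implementation. The mode labels `unrestricted`, `non_overlap` and `alr` and the helpers `candidate_pairs` and `bonferroni_threshold` are hypothetical names. The ALR rule shown (after the first step, each new ratio joins one already selected part to one new part) is an assumed rule, chosen because it is consistent with K−1 selected logratios involving a K-part subcomposition. The Bonferroni threshold assumes that α is divided by the number of candidate logratios tested at that step. Model fitting (for example, a logistic GLM scoring each candidate) is left out.

```python
from itertools import combinations


def candidate_pairs(parts, selected, mode):
    """Pairwise logratios (a, b), read as log(a/b), still eligible at the next step.

    parts    -- list of part names
    selected -- list of (numerator, denominator) tuples already in the model
    mode     -- "unrestricted", "non_overlap" or "alr" (hypothetical labels)
    """
    used = {p for pair in selected for p in pair}
    chosen = {frozenset(pair) for pair in selected}
    eligible = []
    for a, b in combinations(parts, 2):
        # log(a/b) and log(b/a) are the same predictor up to sign, so skip both
        if frozenset((a, b)) in chosen:
            continue
        if mode == "unrestricted":
            # any remaining pair; parts may overlap. This sketch does not screen
            # out ratios that are linear combinations of ratios already selected.
            ok = True
        elif mode == "non_overlap":
            ok = a not in used and b not in used  # each part used at most once
        elif mode == "alr":
            # first step: any pair; afterwards exactly one part must be new,
            # so K-1 ratios always span K parts (assumed rule, see lead-in)
            ok = not selected or ((a in used) != (b in used))
        else:
            raise ValueError(f"unknown mode: {mode!r}")
        if ok:
            eligible.append((a, b))
    return eligible


def bonferroni_threshold(n_candidates, alpha=0.05):
    """Per-test significance level when n_candidates logratios are compared."""
    return alpha / n_candidates


parts = ["A", "B", "C", "D"]
step1 = [("A", "B")]
for mode in ("unrestricted", "non_overlap", "alr"):
    pool = candidate_pairs(parts, step1, mode)
    print(mode, pool, bonferroni_threshold(len(pool)))
```

With log(A/B) already selected from four hypothetical parts, the example leaves 5, 1 and 4 eligible ratios under the three modes, respectively. A stepwise loop would score each eligible ratio, keep the best one if it passes the chosen stopping criterion, and repeat.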

Keywords: Compositional data; generalized linear modelling; logratios; stepwise regression; variable selection.

Conflict of interest statement

No potential conflict of interest was reported by the author(s).

Figures

**Figure 1.**
Scree-type plots showing, for each of the three algorithms, the incremental (black bars) and cumulative (gray bars) amounts of deviance accounted for at each step. The values are percentages of the maximum achievable deviance, that is, the deviance that can be accounted for by a complete set of J−1 = 47 LRs in the logistic regression.
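The percentages in Figure 1 can be computed from the sequence of fitted deviances. This is a minimal sketch. It assumes that the "maximum achievable deviance" is the reduction from the intercept-only (null) deviance to the deviance of the model with a complete set of J−1 logratios; this is our reading of the caption, not something it states. Any complete set of J−1 linearly independent logratios spans the same space and therefore gives the same fitted model, so the full-model deviance does not depend on which set is used. The function name `deviance_percentages` and the example deviances are hypothetical.

```python
def deviance_percentages(null_dev, step_devs, full_dev):
    """Incremental and cumulative shares (%) of the maximum deviance reduction.

    null_dev  -- deviance of the intercept-only model
    step_devs -- deviance after each selection step, in order
    full_dev  -- deviance with a complete set of J-1 logratios
    """
    max_gain = null_dev - full_dev
    if max_gain <= 0:
        raise ValueError("the full model must have a lower deviance than the null model")
    cumulative = [100.0 * (null_dev - d) / max_gain for d in step_devs]
    previous = [0.0] + cumulative[:-1]
    incremental = [c - p for c, p in zip(cumulative, previous)]
    return incremental, cumulative


inc, cum = deviance_percentages(200.0, [160.0, 140.0, 130.0], 100.0)
print(inc, cum)  # [40.0, 20.0, 10.0] [40.0, 60.0, 70.0]
```

In the figure's terms, `incremental` corresponds to the black bars and `cumulative` to the gray bars.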

**Figure 2.**
Directed acyclic graphs (DAGs) visualizing the ratios selected in the three stepwise approaches (according to the Bonferroni penalty; right-hand panels in Tables 1–3). Arrows point from the denominator to the numerator in every case. In each graph, the LR at the top (Stre/Rose) is the first one selected, and the ratios introduced in subsequent steps follow in clockwise order. (a) Unrestricted search, showing an overlap of Stre; 15 parts included. (b) Restricted to non-overlap; 16 parts included. (c) ALR selection; 10 parts included, which define a subcomposition; this is the only one of the three graphs that is connected.
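The graph properties named in the Figure 2 caption (overlapping parts, number of parts included, connectivity) can be checked directly from the list of selected ratios. The following is a minimal sketch that uses hypothetical part names A to D rather than the study's taxa; the function `ratio_graph_summary` is illustrative. The sketch also flags the case in which K−1 ratios form a connected graph on K parts (a tree). This is the structure expected when K−1 logratios span a K-part subcomposition.

```python
from collections import defaultdict


def ratio_graph_summary(ratios):
    """Summarise selected logratios given as (numerator, denominator) pairs.

    Each ratio is an edge drawn from the denominator to the numerator, as in
    the figure; connectivity is checked ignoring edge direction.
    """
    edges = [(den, num) for num, den in ratios]
    nodes = {p for edge in edges for p in edge}
    neighbours = defaultdict(set)
    degree = defaultdict(int)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
        degree[u] += 1
        degree[v] += 1

    # depth-first search from an arbitrary part
    seen = set()
    stack = [next(iter(nodes))] if nodes else []
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(neighbours[node] - seen)

    connected = bool(nodes) and seen == nodes
    return {
        "n_parts": len(nodes),
        "n_ratios": len(edges),
        # parts appearing in more than one ratio (overlaps)
        "overlapping_parts": sorted(p for p, d in degree.items() if d > 1),
        "connected": connected,
        # K-1 ratios joining K parts without a cycle
        "is_tree": connected and len(edges) == len(nodes) - 1,
    }


print(ratio_graph_summary([("B", "A"), ("C", "A"), ("D", "C")]))
print(ratio_graph_summary([("A", "B"), ("C", "D")]))
```

The first call describes a connected tree in which A and C each appear in two ratios. The second describes two disjoint, non-overlapping ratios, which is the pattern produced by the non-overlap restriction.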

**Figure 3.**
Estimated log-contrast coefficients (left) and their conversion to multiplicative effects and 95 % bootstrap confidence intervals (right).
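A model in pairwise logratios can be rewritten as a log-contrast, with one coefficient per part and coefficients summing to zero, because b·log(x_j/x_k) = b·log x_j − b·log x_k. In a logistic model, multiplying part j by a factor c (other parts unchanged) then multiplies the odds by c^{a_j}, where a_j is that part's log-contrast coefficient. Re-closing the composition afterwards does not change this, because zero-sum coefficients make the linear predictor invariant to rescaling. The following sketch assumes this convention and a percentile bootstrap. The factor used for the multiplicative effects in Figure 3 is not stated here, and the function names and numbers are hypothetical.

```python
import math
from collections import defaultdict


def log_contrast_coefficients(ratio_coefs):
    """Turn {(num, den): b} for terms b*log(num/den) into per-part coefficients."""
    coef = defaultdict(float)
    for (num, den), b in ratio_coefs.items():
        coef[num] += b  # +b * log(num)
        coef[den] -= b  # -b * log(den)
    return dict(coef)  # values sum to zero


def odds_multiplier(coef, factor):
    """Odds multiplier (logistic link) when one part is multiplied by `factor`."""
    return factor ** coef


def percentile(values, q):
    """Percentile by linear interpolation between order statistics, q in [0, 1]."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def bootstrap_multiplier_ci(boot_coefs, factor, level=0.95):
    """Percentile bootstrap interval for a coefficient, mapped to the odds scale.

    The mapping is monotone, so transforming the two endpoints is enough.
    """
    tail = (1 - level) / 2
    ends = (percentile(boot_coefs, tail), percentile(boot_coefs, 1 - tail))
    return tuple(sorted(odds_multiplier(e, factor) for e in ends))


coefs = log_contrast_coefficients({("A", "B"): 0.5, ("A", "C"): -0.2})
print(coefs, {p: odds_multiplier(a, 2.0) for p, a in coefs.items()})
```

In this example, doubling part A multiplies the odds by about 2^0.3. In practice, `boot_coefs` would hold one coefficient re-estimated on each bootstrap resample of the cases.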


