J Appl Stat. 2022 Aug 6;50(16):3272-3293.
doi: 10.1080/02664763.2022.2108007. eCollection 2023.

Three approaches to supervised learning for compositional data with pairwise logratios


Germà Coenders et al. J Appl Stat.

Abstract

Logratios between pairs of compositional parts (pairwise logratios) are the easiest to interpret in compositional data analysis, and include the well-known additive logratios as particular cases. When the number of parts is large (sometimes even larger than the number of cases), some form of logratio selection is needed. In this article, we present three alternative stepwise supervised learning methods to select the pairwise logratios that best explain a dependent variable in a generalized linear model, each geared for a specific problem. The first method features unrestricted search, where any pairwise logratio can be selected. This method has a complex interpretation if some pairs of parts in the logratios overlap, but it leads to the most accurate predictions. The second method restricts parts to occur only once, which makes the corresponding logratios intuitively interpretable. The third method uses additive logratios, so that K-1 selected logratios involve a K-part subcomposition. Our approach allows logratios or non-compositional covariates to be forced into the models based on theoretical knowledge, and various stopping criteria are available based on information measures or statistical significance with the Bonferroni correction. We present an application on a dataset from a study predicting Crohn's disease.
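The three search restrictions described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `pairwise_logratios` and `admissible` and the `mode` labels are hypothetical, the model-fitting and stopping steps are left out entirely, and the additive logratio rule is written under the assumption that all selected pairs must share one common reference part.

```python
import math
from itertools import combinations


def pairwise_logratios(composition):
    """Map each unordered pair of parts (a, b) to log(x_a / x_b).

    `composition` is a dict of strictly positive parts; zeros would need
    separate handling, which this sketch does not attempt.
    """
    return {
        (a, b): math.log(composition[a] / composition[b])
        for a, b in combinations(composition, 2)
    }


def admissible(candidate, selected, mode):
    """Return True if `candidate` may enter the model after `selected`.

    Pairs are treated as unordered, since swapping numerator and
    denominator only flips the sign of the logratio.
    """
    if any(set(candidate) == set(pair) for pair in selected):
        return False  # this pair is already in the model
    if mode == "unrestricted":
        # Method 1: any pairwise logratio may be selected.
        return True
    if mode == "non_overlap":
        # Method 2: each part may occur in at most one selected logratio.
        used = {part for pair in selected for part in pair}
        return not (set(candidate) & used)
    if mode == "alr":
        # Method 3 (assumption): all selected pairs share one reference part,
        # so K-1 selected logratios involve exactly K parts.
        common = set(candidate)
        for pair in selected:
            common &= set(pair)
        return bool(common)
    raise ValueError(f"unknown mode: {mode!r}")
```

For example, with `("A", "B")` already selected, the candidate `("A", "C")` is admissible under `"unrestricted"` and `"alr"` but not under `"non_overlap"`, while `("C", "D")` is admissible under `"non_overlap"` but not under `"alr"`. A stepwise procedure would filter the candidates with `admissible` at each step before scoring them in the generalized linear model.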

Keywords: Compositional data; generalized linear modelling; logratios; stepwise regression; variable selection.

Conflict of interest statement

No potential conflict of interest was reported by the author(s).

Figures

Figure 1.
Scree-type plots showing incremental amounts (black bars) and cumulative amounts (gray bars) at each step of the three respective algorithms. The values are percentages of the maximum achievable deviance that can be accounted for by using a complete set of J−1 = 47 logratios (LRs) in the logistic regression.
Figure 2.
Directed acyclic graphs (DAGs) visualizing the ratios selected in the three stepwise approaches (according to the Bonferroni penalty, the right-hand panels in Tables 1–3). Arrows point from the denominator to the numerator in every case. In each graph, the LR at the top (Stre/Rose) is the first one selected and the ratios introduced in the following steps are shown in a clockwise direction. (a) Unrestricted search, showing an overlap of Stre; 15 parts included. (b) Restricted to non-overlap; 16 parts included. (c) ALR selection; 10 parts included, which define a subcomposition, and the only graph out of the three that is connected.
Figure 3.
Estimated log-contrast coefficients (left) and their conversion to multiplicative effects and 95% bootstrap confidence intervals (right).
