Review
Curr Opin Plant Biol. 2023 Feb;71:102326.
doi: 10.1016/j.pbi.2022.102326. Epub 2022 Dec 18.

Compositionality, sparsity, spurious heterogeneity, and other data-driven challenges for machine learning algorithms within plant microbiome studies

Sebastiano Busato et al. Curr Opin Plant Biol. 2023 Feb.

Abstract

The plant-associated microbiome is a key component of plant systems, contributing to their health, growth, and productivity. Applying machine learning (ML) in this field promises to help untangle the relationships involved. However, measurements of microbial communities by high-throughput sequencing pose challenges for ML. Noise arising from small sample sizes, soil heterogeneity, and technical factors can degrade ML performance. Additionally, the compositional and sparse nature of these datasets can reduce the predictive accuracy of ML models. We review recent literature from plant studies to illustrate that these properties often go unmentioned. We extend our analysis to other fields to quantify the degree to which mitigation approaches improve ML performance, and we describe the mathematical basis for this. With the advent of accessible analytical packages for microbiome data that include learning models, researchers must be familiar with the nature of their datasets.
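The compositional constraint mentioned in the abstract can be made concrete with a standard result from compositional data analysis, offered here as background rather than as the derivation given in this review. Suppose a sample contains D taxa with absolute abundances a_1, ..., a_D; sequencing reports only the closed proportions x_i, and because their sum is fixed, the covariances of any one taxon with the others are forced to sum to a negative value:

```latex
x_i = \frac{a_i}{\sum_{k=1}^{D} a_k}, \qquad \sum_{i=1}^{D} x_i = 1

0 = \operatorname{Cov}\Big(x_i, \sum_{j=1}^{D} x_j\Big)
  = \operatorname{Var}(x_i) + \sum_{j \neq i} \operatorname{Cov}(x_i, x_j)
\quad\Longrightarrow\quad
\sum_{j \neq i} \operatorname{Cov}(x_i, x_j) = -\operatorname{Var}(x_i)
```

Any taxon whose proportion varies at all must therefore covary negatively with at least one other taxon, whether or not the underlying absolute abundances interact. Features built from relative abundances can thus carry associations that an ML model may learn even though they are artifacts of closure.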

Keywords: Compositional data analysis; Deep learning; Machine learning; Plant-associated microbiome.


Conflict of interest statement

Declaration of competing interest: The authors declare the following financial interests/personal relationships that may be considered potential competing interests. Cranos Williams reports financial support from Novo Nordisk Inc. Max Gordon reports financial support from the National Science Foundation. Sebastiano Busato reports financial support from the National Institutes of Health. Stig Andersen reports financial support from Novo Nordisk Inc. Meenal Chaudhari reports financial support from Novo Nordisk Inc. Ib Jensen reports financial support from Novo Nordisk Inc. Turgut Akyol reports financial support from Novo Nordisk Inc.

Figures

Figure 1. The emergence of compositionality and sparsity in high-throughput sequencing-based microbiome studies.
We present a hypothetical comparison of two microbial populations with distinct absolute abundances (all values are in arbitrary units). Investigators collect and sequence one sample per population, which causes some low-abundance species (Yellow Pentagon in Population 2) to be excluded from the collected sample, yielding a value of zero for that species. Unequal sequencing depths cause other species (Green Hexagon in Population 2) to go unquantified solely because they receive no sequencing counts. Finally, because the experimental setup and the resulting dataset are compositional, observed changes reflect differences in relative abundance only, which biases differential abundance estimates relative to changes in absolute abundance.
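The mechanism described in the Figure 1 caption can be reproduced with a small deterministic sketch. The species labels, abundances, and sequencing depths below are hypothetical values chosen for illustration (they are not the values shown in the figure), and reads are allocated by their expected count rounded down rather than by random sampling.

```python
import math

# Hypothetical absolute abundances (arbitrary units) for two populations.
# Species A is unchanged in absolute terms; B increases; C becomes rare; D decreases.
population_1 = {"A": 100, "B": 100, "C": 50, "D": 50}
population_2 = {"A": 100, "B": 300, "C": 1, "D": 20}

def relative_abundance(abundances):
    """Close abundances (or counts) to proportions that sum to 1."""
    total = sum(abundances.values())
    return {taxon: value / total for taxon, value in abundances.items()}

def expected_counts(abundances, depth):
    """Allocate `depth` reads in proportion to abundance, rounding down.

    A deterministic stand-in for sequencing: any taxon whose expected
    share is below one read receives a count of zero.
    """
    proportions = relative_abundance(abundances)
    return {taxon: math.floor(p * depth) for taxon, p in proportions.items()}

counts_1 = expected_counts(population_1, depth=1000)
counts_2 = expected_counts(population_2, depth=100)  # shallower sequencing

rel_1 = relative_abundance(counts_1)
rel_2 = relative_abundance(counts_2)

for taxon in population_1:
    print(f"{taxon}: absolute {population_1[taxon]} -> {population_2[taxon]}, "
          f"counts {counts_1[taxon]} -> {counts_2[taxon]}, "
          f"relative {rel_1[taxon]:.3f} -> {rel_2[taxon]:.3f}")
```

In the printed output, species A has the same absolute abundance in both populations, yet its relative abundance falls from about 0.334 to 0.235 because B grew. Species C is still present in population 2 but receives zero reads at the shallower depth, mirroring the sparsity described in the caption.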
Figure 2. Typical steps in preparing microbiome sequencing data for ML.
We show a common sequence of steps used to prepare microbiome sequencing data for machine learning models and highlight the stages of this preparatory process at which the effects of compositionality need to be accounted for and may be mitigated through the selection of appropriate techniques.
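One commonly used option at the transformation stage of such a pipeline is to add a pseudocount and apply the centered log-ratio (CLR) transform from compositional data analysis. The sketch below is an illustration under stated assumptions, not the specific pipeline shown in Figure 2: the pseudocount of 0.5, the function names `clr` and `closure`, and the count table are all hypothetical, and other zero-handling strategies exist.

```python
import math

def closure(values):
    """Rescale non-negative values to proportions that sum to 1."""
    total = sum(values)
    return [v / total for v in values]

def clr(values, pseudocount=0.5):
    """Centered log-ratio (CLR) transform of one sample.

    The pseudocount (a hypothetical default of 0.5) is added to every entry
    so that zero counts, common in sparse microbiome data, have a finite log.
    """
    logs = [math.log(v + pseudocount) for v in values]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [value - mean_log for value in logs]

# Hypothetical count table: rows are samples, columns are taxa.
count_table = [
    [120, 30, 0, 850],
    [10, 400, 5, 85],
]

# CLR features that could be passed to an ML model in place of raw counts.
features = [clr(row) for row in count_table]
for row in features:
    print([round(v, 3) for v in row])

# For zero-free data, CLR is unchanged by closure, so sequencing depth drops out.
sample = [10, 400, 5, 85]
from_counts = clr(sample, pseudocount=0.0)
from_proportions = clr(closure(sample), pseudocount=0.0)
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(from_counts, from_proportions)))
```

The final line prints True, and each row of CLR features sums to zero. One design caveat: with a fixed pseudocount, entries that were zero are no longer scale invariant, so two samples with the same composition but different depths can receive slightly different CLR values. This is one reason zero handling deserves explicit attention at this step.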
