Review
Front Microbiol. 2023 Nov 22:14:1250806.
doi: 10.3389/fmicb.2023.1250806. eCollection 2023.

A toolbox of machine learning software to support microbiome analysis

Laura Judith Marcos-Zambrano et al. Front Microbiol.

Abstract

The human microbiome has become an area of intense research due to its potential impact on human health. However, analyzing and interpreting microbiome data has proven challenging because of its complexity and high dimensionality. Machine learning (ML) algorithms can process vast amounts of data to uncover informative patterns and relationships, even with limited prior knowledge. Consequently, software specifically designed to analyze and interpret microbiome data using ML techniques has grown rapidly. These tools incorporate a wide range of ML algorithms for clustering, classification, regression, or feature selection to identify microbial patterns and relationships within the data and to generate predictive models. This rapid development, together with the constant need to add and integrate new features, requires efforts to compile, catalog, and classify these tools in order to create infrastructures and services with easy, transparent, and trustworthy standards. Here we review the state of the art for ML tools applied in human microbiome studies, performed as part of the COST Action ML4Microbiome activities. This scoping review focuses on ML-based software and framework resources currently available for the analysis of microbiome data in humans. The aim is to help microbiologists and biomedical scientists explore specialized resources that integrate ML techniques, and to facilitate future benchmarking to create standards for the analysis of microbiome data. The software resources are organized by the type of analysis they were developed for and the ML techniques they implement. Each software tool is described with examples of usage, along with comments on the pitfalls and gaps in applying ML-based software to microbiome data that developers and users need to consider. This review represents an extensive compilation to date, offering valuable insights and guidance for researchers interested in leveraging ML approaches for microbiome analysis.

Keywords: data integration; feature analysis; feature generation; machine learning; microbial gene prediction; microbial metabolic modeling; microbiome; software.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
The relationship between the year of publication (left), programming language (centre), and ML task (right) is depicted for the most commonly used software in microbiome analysis. The thickness of each line represents the number of software projects associated with a particular relationship (a project may have multiple relationships of a given kind, e.g., a software tool may be written in both C and Python).
Figure 2
Comprehensive overview of the most commonly employed ML-based software applications in microbiome data analysis. These software tools are categorized by their primary application into feature generation, feature analysis, and data integration. Note that many of these tools are applicable to both 16S rRNA gene sequencing data and shotgun metagenomics. Detailed descriptions of these software tools can be found in subsequent sections of the manuscript.
