A comparative study of supervised and unsupervised machine learning algorithms applied to human microbiome
- PMID: 38767067
- DOI: 10.7417/CT.2024.5051
Abstract
Background: The human microbiome, consisting of diverse bacterial, fungal, protozoan and viral species, exerts a profound influence on various physiological processes and on disease susceptibility. However, the complexity of microbiome data poses significant challenges for the analysis and interpretation of these intricate datasets, which has led to the development of specialized software that uses machine learning algorithms for this purpose.
Methods: In this paper, we analyze raw 16S rRNA gene sequencing data from three studies, comprising stool samples from healthy controls, patients with adenoma, and patients with colorectal cancer. First, we use network-based methods to reduce the dimensionality of the dataset and retain only the most important features. We then employ supervised machine learning algorithms to make predictions.
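The network-based reduction step can be illustrated with a small, standard-library Python sketch. The abstract does not specify how the graph was built, so every choice here is an assumption rather than the authors' method: taxa are nodes, an edge joins two taxa whose Pearson correlation across samples is at least 0.6 in absolute value (a hypothetical threshold), features are ranked by degree centrality (one of many possible centrality measures), and modularity is computed for a community partition supplied by hand instead of one found by a community-detection algorithm. The taxon names and abundances are toy data.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def correlation_graph(abundance, threshold=0.6):
    """Undirected taxon graph with an edge wherever |r| >= threshold.

    abundance maps taxon name -> list of per-sample abundances.
    Returns an adjacency dict: taxon -> set of neighbouring taxa.
    """
    adj = {taxon: set() for taxon in abundance}
    for a, b in combinations(abundance, 2):
        if abs(pearson(abundance[a], abundance[b])) >= threshold:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def degree_centrality(adj):
    """Degree divided by (n - 1), the usual normalised definition."""
    n = len(adj)
    if n < 2:
        return {v: 0.0 for v in adj}
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def modularity(adj, communities):
    """Newman modularity Q = sum over communities c of [L_c / m - (d_c / 2m)^2]."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2  # number of edges
    if m == 0:
        return 0.0
    q = 0.0
    for community in communities:
        members = set(community)
        # Each internal edge is seen from both endpoints, hence the division by 2.
        l_c = sum(1 for v in members for u in adj[v] if u in members) / 2
        d_c = sum(len(adj[v]) for v in members)  # total degree of the community
        q += l_c / m - (d_c / (2 * m)) ** 2
    return q


def top_k_features(adj, k):
    """Keep the k taxa with the highest degree centrality (ties broken by name)."""
    centrality = degree_centrality(adj)
    ranked = sorted(centrality, key=lambda v: (-centrality[v], v))
    return ranked[:k]


toy_abundance = {
    # A, B and C share one abundance pattern; D and E share another.
    "A": [1, 2, 3, 1, 2, 3],
    "B": [2, 4, 6, 2, 4, 6],
    "C": [2, 3, 4, 2, 3, 4],
    "D": [1, 1, 1, 3, 3, 3],
    "E": [3, 3, 3, 9, 9, 9],
    # F is uncorrelated with every other taxon.
    "F": [3, 0, 3, 3, 0, 3],
}

graph = correlation_graph(toy_abundance, threshold=0.6)
print(top_k_features(graph, 3))  # ['A', 'B', 'C']
print(modularity(graph, [{"A", "B", "C"}, {"D", "E"}, {"F"}]))  # 0.375
```

On the toy data the three mutually correlated taxa form the highest-centrality group, and the hand-picked partition has modularity 0.375; the 0.73 reported in the study refers to its own data and pipeline, not to this sketch.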
Results: Graph-based techniques reduce the dimensionality from 255 to 78 features, with a modularity score of 0.73, based on different centrality measures. Projection methods (non-negative matrix factorization and principal component analysis), in turn, reduce the dimensionality to 7 features. Applying supervised machine learning algorithms to the most important features obtained from centrality measures and to those obtained from projection methods, we find that the evaluation metrics have approximately the same scores on the entire dataset, on the 78 features, and on the 7 features.
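Non-negative matrix factorization, one of the two projection methods named above, can likewise be sketched in plain Python. This version assumes the classic Lee-Seung multiplicative updates for the squared Frobenius loss, a seeded random initialization and a fixed iteration count; the abstract does not say which NMF variant, solver or settings were used. A samples × taxa matrix V is approximated as W H, and the k columns of W serve as the reduced features (k = 7 in the study; the toy matrix below uses k = 2). Principal component analysis, the other projection method, is not shown.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm ||V - W H||^2."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rwh in zip(v, wh) for x, y in zip(rv, rwh))


def nmf(v, k, iterations=200, seed=0, eps=1e-12):
    """Lee-Seung multiplicative updates for V ~= W H with W, H >= 0.

    v: n x p non-negative matrix (samples x taxa).
    Returns (W, H): W is n x k (the k reduced features per sample) and
    H is k x p (how each component loads on the original taxa).
    """
    rng = random.Random(seed)
    n, p = len(v), len(v[0])
    # Strictly positive start so the multiplicative updates can move every entry.
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(p)] for _ in range(k)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), elementwise; eps avoids division by zero.
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(p)] for i in range(k)]
        # W <- W * (V H^T) / (W H H^T), elementwise.
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return w, h


# Toy samples x taxa matrix with two additive "communities" of taxa.
toy_v = [
    [1, 2, 0, 0],
    [2, 4, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 6, 2],
]

w, h = nmf(toy_v, k=2)
reduced_features = w  # one row of k non-negative features per sample
print(frobenius_error(toy_v, w, h))
```

Because each update multiplies by a ratio of non-negative quantities, W and H stay non-negative throughout, which is what lets each component be read as an additive group of taxa.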
Conclusions: This study demonstrates the efficacy of graph-based and projection methods in the interpretation of 16S rRNA gene sequencing data. Supervised machine learning on the refined features from both approaches yields comparable predictive performance, highlighting specific microbial features (Bacteroides, Prevotella, Fusobacterium, Lysinibacillus, Blautia, Sphingomonas and Faecalibacterium) as key in predicting patient conditions from raw data.
Keywords: Complex networks; complexity; machine learning; modularity; non-negative matrix factorization.