Front Microbiol. 2021 Aug 25:12:628426.
doi: 10.3389/fmicb.2021.628426. eCollection 2021.

Discovering Potential Taxonomic Biomarkers of Type 2 Diabetes From Human Gut Microbiota via Different Feature Selection Methods

Burcu Bakir-Gungor et al. Front Microbiol. 2021.

Abstract

Human gut microbiota is a complex community of organisms that includes trillions of bacteria. While these microorganisms are considered essential regulators of our immune system, some of them can cause disease. In recent years, next-generation sequencing technologies have accelerated the study of human gut microbiota, and machine learning techniques have become popular for analyzing disease-associated metagenomics datasets. Type 2 diabetes (T2D) is a chronic disease that affects millions of people around the world. Since early diagnosis of T2D is important for effective treatment, there is a pressing need for a classification technique that can accelerate T2D diagnosis. In this study, using T2D-associated metagenomics data, we aim to develop a classification model to facilitate T2D diagnosis and to discover T2D-associated biomarkers. The sequencing data of T2D patients and healthy individuals were taken from a metagenome-wide association study and categorized into disease states. The sequencing reads were assigned to taxa, and the identified species were used to train and test our model. To deal with the high dimensionality of the features, we applied robust feature selection algorithms: Conditional Mutual Information Maximization, Maximum Relevance and Minimum Redundancy, Correlation Based Feature Selection, and the select K best approach. To test the performance of classification based on the features selected by the different methods, we used a random forest classifier with 100-fold Monte Carlo cross-validation. In our experiments, we observed that 15 commonly selected features have a considerable effect in terms of reducing the number of microbial taxa used for the diagnosis of T2D, and thus reducing time and cost. When we performed biological validation of these identified species, we found that some of them are known to be related to T2D development mechanisms, and we identified additional species as potential biomarkers.
Additionally, we attempted to find subgroups of T2D patients using k-means clustering. In summary, this study utilizes several supervised and unsupervised machine learning algorithms to increase the diagnostic accuracy of T2D, investigates potential biomarkers of T2D, and identifies which subset of microbiota is more informative than other taxa by applying state-of-the-art feature selection methods.
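
The evaluation loop described in the abstract (feature selection followed by classification under Monte Carlo cross-validation) can be illustrated with a minimal, standard-library sketch. This is not the authors' pipeline: a nearest-centroid classifier stands in for the random forest, a simple mean-difference score stands in for the select K best scoring function, and the toy data, function names, and parameters (`make_toy_data`, `test_fraction`, the 10% hold-out) are all assumptions made for illustration.

```python
import random
import statistics


def make_toy_data(n_samples=60, n_features=20, n_informative=3, seed=1):
    """Hypothetical stand-in for a species-abundance table (rows = samples)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2  # 0 = control, 1 = T2D (toy labels)
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in range(n_informative):
            row[j] += 3.0 * label  # only the first few features carry signal
        X.append(row)
        y.append(label)
    return X, y


def select_k_best(X, y, k):
    """Rank features by |difference of class means| / summed class std; keep top k."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, lab in zip(X, y) if lab == 1]
        neg = [row[j] for row, lab in zip(X, y) if lab == 0]
        spread = statistics.pstdev(pos) + statistics.pstdev(neg) + 1e-12
        scores.append(abs(statistics.fmean(pos) - statistics.fmean(neg)) / spread)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def fit_centroids(X, y, features):
    """Nearest-centroid model (assumed stand-in for the random forest)."""
    centroids = {}
    for label in set(y):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [statistics.fmean(r[j] for r in rows) for j in features]
    return centroids


def predict(centroids, row, features):
    point = [row[j] for j in features]
    return min(
        centroids,
        key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
    )


def monte_carlo_cv(X, y, k, n_splits=100, test_fraction=0.1, seed=0):
    """Mean accuracy over repeated random train/test splits.

    Assumes every training split contains both classes, which holds for the
    balanced toy data above.
    """
    rng = random.Random(seed)
    indices = list(range(len(X)))
    n_test = max(1, int(len(X) * test_fraction))
    accuracies = []
    for _ in range(n_splits):
        rng.shuffle(indices)
        test, train = indices[:n_test], indices[n_test:]
        X_train = [X[i] for i in train]
        y_train = [y[i] for i in train]
        # Select features inside each split so test samples never influence selection.
        features = select_k_best(X_train, y_train, k)
        centroids = fit_centroids(X_train, y_train, features)
        hits = sum(predict(centroids, X[i], features) == y[i] for i in test)
        accuracies.append(hits / n_test)
    return statistics.fmean(accuracies)


if __name__ == "__main__":
    X, y = make_toy_data()
    print("selected features:", sorted(select_k_best(X, y, 3)))
    print("mean accuracy:", monte_carlo_cv(X, y, k=3, n_splits=100))
```

Running feature selection inside each split, rather than once on the full dataset, is a design choice of this sketch that keeps the held-out samples from leaking into the selected feature set; the abstract does not state how the original study ordered these steps.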

Keywords: classification; feature selection; human gut microbiome; machine learning; metagenomic analysis; type 2 diabetes.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1. Flowchart of our method, including three main parts. (i) Feature selection methods are applied to detect the most important species for the development of T2D (T2D-associated microorganisms). (ii) Using the selected features, models are constructed and used for classification. (iii) K-means clustering algorithm is applied on data to specify subgroups of patients and control samples.
FIGURE 2. Numbers of features, which are selected by different feature selection algorithms. The commonalities between the selected features by different methods are also illustrated.
FIGURE 3. Comparative evaluation of different feature selection methods based on (A) ROC area, (B) accuracy, and (C) F-measure metrics.
FIGURE 4. Pairwise correlation heat map of 15 commonly identified features. While number 1 (shown in yellow) indicates full correlation, number 0 (shown in dark blue) indicates no correlation.
FIGURE 5. The relative amounts of 15 species (A) in all healthy and T2D subgroups. (B) Zoomed-in view of all healthy subgroups and one T2D subgroup, which covers more than 86% of all samples.
FIGURE 6. Zoomed-in view of all healthy subgroups and the biggest T2D subgroup for (A) Eggerthella lenta, (B) Bacteroides stercoris, (C) Bacteroides vulgatus, and (D) Subdoligranulum.
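
The subgroups shown in Figures 5 and 6 come from k-means clustering. As a rough illustration of that technique only, the following standard-library sketch runs Lloyd's algorithm on toy two-dimensional points. The data, the function names (`kmeans`, `sq_dist`), and the choice of two clusters are assumptions for the example and do not reflect the study's settings; in practice each point would be a sample's abundance vector over the selected species.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=50, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean update."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centroids
    assignments = None
    for _ in range(n_iter):
        new_assignments = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_assignments == assignments:
            break  # assignments stopped changing, so the centroids are stable
        assignments = new_assignments
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids


if __name__ == "__main__":
    # Two well-separated toy groups (hypothetical abundance profiles).
    group_a = [[0.1 * i, 0.1 * (i % 3)] for i in range(10)]
    group_b = [[10 + 0.1 * i, 10 + 0.1 * (i % 3)] for i in range(10)]
    labels, centers = kmeans(group_a + group_b, k=2)
    print(labels)
    print(centers)
```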
