Hypertension. 2020 Nov;76(5):1555-1562.
doi: 10.1161/HYPERTENSIONAHA.120.15885. Epub 2020 Sep 10.

Machine Learning Strategy for Gut Microbiome-Based Diagnostic Screening of Cardiovascular Disease

Sachin Aryal et al. Hypertension. 2020 Nov.

Abstract

Cardiovascular disease (CVD) is the leading cause of human mortality. In recent years, gut microbiota has emerged, alongside genetic and environmental factors, as a new factor influencing CVD. Although cause-effect relationships are not clearly established, the reported associations between alterations in gut microbiota and CVD are prominent. Therefore, we hypothesized that machine learning (ML) could be used for gut microbiome-based diagnostic screening of CVD. To test our hypothesis, fecal 16S ribosomal RNA sequencing data of 478 CVD and 473 non-CVD human subjects collected through the American Gut Project were analyzed using 5 supervised ML algorithms: random forest, support vector machine, decision tree, elastic net, and neural networks. Thirty-nine differential bacterial taxa were identified between the CVD and non-CVD groups. ML modeling using these taxonomic features achieved a testing area under the receiver operating characteristic curve (AUC; 0.0, perfect antidiscrimination; 0.5, random guessing; 1.0, perfect discrimination) of ≈0.58 (random forest and neural networks). Next, the ML models were trained with the top 500 high-variance operational taxonomic unit (OTU) features instead of bacterial taxa, and an improved testing AUC of ≈0.65 (random forest) was achieved. Further, limiting the selection to only the top 25 highly contributing OTU features significantly enhanced the AUC to ≈0.70. Overall, our study is the first to identify dysbiosis of gut microbiota in CVD patients as a group and apply this knowledge to develop a gut microbiome-based ML approach for diagnostic screening of CVD.
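
The scale quoted for the area under the receiver operating characteristic curve (0.0, 0.5, 1.0) can be illustrated with a small sketch. It is not the authors' code: the function name `roc_auc` and the toy labels and scores are hypothetical, and the area is computed through its standard pairwise interpretation, the probability that a randomly chosen positive (CVD) subject receives a higher score than a randomly chosen negative (non-CVD) subject, with ties counted as one half.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5).

    labels: 1 for the positive class (e.g. CVD), 0 for the negative class.
    scores: classifier outputs, higher meaning "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): two positive and two negative subjects.
labels = [1, 1, 0, 0]
perfect = roc_auc(labels, [0.9, 0.8, 0.2, 0.1])    # every positive outranks every negative
inverted = roc_auc(labels, [0.1, 0.2, 0.8, 0.9])   # every positive is outranked
constant = roc_auc(labels, [0.5, 0.5, 0.5, 0.5])   # scores carry no information
partial = roc_auc(labels, [0.9, 0.3, 0.5, 0.1])    # three of four pairs ordered correctly
```

Under this reading, a testing value of ≈0.70 means that in roughly 70% of CVD/non-CVD subject pairs the model scores the CVD subject higher.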

Keywords: artificial intelligence; cardiovascular disease; diagnosis; gut microbiome; machine learning; metagenomic sequencing.

Conflict of interest statement

Disclosures: The authors declare no conflict of interest.

Figures

Figure 1. The study workflow.
(A) Overall analysis. (B) Supervised machine learning.

Figure 2. Differential bacterial taxa between the groups of cardiovascular disease (CVD) and non-CVD and performance measures of supervised machine learning models for classifying the CVD and non-CVD subjects using differential taxonomic features.
(A) Linear discriminant analysis effect size (LEfSe) bar graph showing differential bacterial taxa. (B) Cladogram showing phylogenetic relationships of differential bacterial taxa. (C) Area under the receiver operating characteristic curve (AUC). (D) Sensitivity. (E) Specificity. Each point in the box plot represents the corresponding performance measure in one iteration (total 50 iterations).

Figure 3. Performance measures of supervised machine learning models for classifying the cardiovascular disease (CVD) and non-CVD subjects using the top 500 high-variance operational taxonomic unit (OTU) features.
(A) Area under the receiver operating characteristic curve (AUC). (B) Sensitivity. (C) Specificity. Each point in the box plot represents the corresponding performance measure in one iteration (total 50 iterations).

Figure 4. Performance measures of the random forest (RF) model for classifying the cardiovascular disease (CVD) and non-CVD subjects using the top highly contributing operational taxonomic unit features (HCOFs).
(A) Variable importance scores (ranged from 0 to 100) of the top 100 HCOFs. (B) Area under the receiver operating characteristic curve (AUC). (C) Sensitivity. (D) Specificity. Each point in the box plot represents the corresponding performance measure in one iteration (total 50 iterations).
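
The figure panels report sensitivity and specificity alongside the AUC. As a minimal sketch of how these two measures follow from binary predictions, assuming labels coded 1 for CVD and 0 for non-CVD, the function name `sensitivity_specificity` and the toy data below are hypothetical and not taken from the study.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 1/0.

    Sensitivity = TP / (TP + FN): fraction of true positives detected.
    Specificity = TN / (TN + FP): fraction of true negatives correctly cleared.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return tp / (tp + fn), tn / (tn + fp)


# Toy data (hypothetical): 4 CVD and 4 non-CVD subjects.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
```

In this toy case one CVD subject is missed and one non-CVD subject is flagged, so both measures come out to 3 of 4.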
