[Preprint]. 2023 Oct 12:2023.10.12.23296829.
doi: 10.1101/2023.10.12.23296829.

Microbiome-based risk prediction in incident heart failure: a community challenge

Pande Putu Erawijantari et al. medRxiv.

Abstract

Heart failure (HF) is a major public health problem. Early identification of at-risk individuals could allow for interventions that reduce morbidity or mortality. The community-based FINRISK Microbiome DREAM challenge (synapse.org/finrisk) evaluated the use of machine learning approaches on shotgun metagenomics data obtained from fecal samples to predict incident HF risk over 15 years in a population cohort of 7231 Finnish adults (FINRISK 2002, n=559 incident HF cases). Challenge participants used synthetic data for model training and testing. Final models submitted by seven teams were evaluated in the real data. The two highest-scoring models were both based on Cox regression but used different feature selection approaches. We aggregated their predictions to create an ensemble model. Additionally, we refined the models after the DREAM challenge by eliminating phylum information. Models were also evaluated at intermediate timepoints and they predicted 10-year incident HF more accurately than models for 5- or 15-year incidence. We found that bacterial species, especially those linked to inflammation, are predictive of incident HF. This highlights the role of the gut microbiome as a potential driver of inflammation in HF pathophysiology. Our results provide insights into potential modeling strategies of microbiome data in prospective cohort studies. Overall, this study provides evidence that incorporating microbiome information into incident risk models can provide important biological insights into the pathogenesis of HF.

Conflict of interest statement

Illumina, Inc., and Janssen Pharmaceutica provided additional support by sponsoring the Center for Microbiome Innovation at the University of California San Diego. T.N. has received honoraria for speaking engagements from Servier and AstraZeneca. V.S. has had a research collaboration with Bayer AG, unrelated to this study. J.S.-R. has received funding from GSK, Pfizer and Sanofi, and fees/honoraria from Travere Therapeutics, Stadapharm, Astex, Pfizer and Grunenthal. M.I. is a trustee of the Public Health Genomics (PHG) Foundation, a member of the Scientific Advisory Board of Open Targets, and has a research collaboration with AstraZeneca unrelated to this study. R.K. is a cofounder of Micronoma and Biota, holds stock in Gencirq, Cybele, Biomesense, Micronoma, and Biota, serves as a member of the Scientific Advisory Board of Gencirq, DayTwo, Biomesense, and Micronoma, and serves as a consultant for DayTwo, Cybele, and Biomesense.

Figures

Figure 1.
Overview of the DREAM Challenge and FINRISK data. A. Geographical distribution across Finland for the individuals within the national FINRISK 2002 cohort. B. Principal Coordinate Analysis (PCoA) using Bray-Curtis dissimilarity metrics between randomly selected subsets of the data (training, testing, scoring sets). C. The setup and timeline of the DREAM Challenge including submission and scoring phases.
Figure 2.
Harrell’s C and Hosmer-Lemeshow test. A. Harrell’s C-index and Hosmer-Lemeshow p-value were obtained for the investigated models, including the three baseline models provided by the organizers in the scoring phase. B-C. Harrell’s C-index and Hosmer-Lemeshow empirical p-value on 1000 bootstrapped iterations for all the models. We used blue for SB2, orange for DFH and purple for the baseline models. D. Selected features in the baseline and top models. *Taxonomic features in the “Baseline All” model are presented in Supp. Table 6. **The features and modules selected by the DFH model were weight-based from 10 different seeds. Features present in each model are represented by turquoise-filled squares, while absence is indicated by blank squares.
Figure 3.
Schematic illustration of the modeling workflow of the two top-performing teams. A. Team DFH used a modular Elastic Net regularized Cox proportional hazards model. After manually curating interpretable modules, they identified the optimal features within each module by module-specific cross-validation. The pruned modules were then combined and used to identify the best overall combination of features using cross-validation. The team averaged final risk predictions across multiple seeds. B. Team SB2 used LASSO regularization to retain 29 features. Age, BMI, systolic blood pressure, non-HDL cholesterol, sex, and dysbiosis were kept as unpenalized features, while blood pressure treatment, prevalent diabetes, smoking, and prevalent coronary heart disease were penalized and selected by LASSO for inclusion in the final Cox proportional hazards model.
Figure 4.
A. Harrell’s C-index and B. Hosmer-Lemeshow p-value for the ensemble models obtained by mean aggregation of the final models’ individual risk scores. The lower plot illustrates the combination of teams used to calculate the mean for each aggregated final model. The dashed line corresponds to p-value=0.05 on the y-axis (B), while the x-axis represents different combinations of ensemble models.
Figure 5.
Evaluation of model performance over varying follow-up times. A. Harrell’s C-index and B. Hosmer-Lemeshow p-values are presented for different models, distinguished by unique colors, across three distinct follow-up periods: 5, 10, and 15 years. The two HF definitions are represented by different shapes.
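
The captions above rely on two ideas that are easy to illustrate: Harrell's C-index as the scoring metric, and mean aggregation of per-individual risk scores to form an ensemble. The following is a minimal sketch, not the challenge's scoring code. The function names (harrell_c, mean_ensemble) and the toy follow-up times, event flags and risk scores are hypothetical and chosen only for illustration. Pairs with tied follow-up times are skipped as a simplification.

```python
from itertools import combinations


def harrell_c(times, events, risks):
    """Harrell's C-index for right-censored data (illustrative sketch).

    A pair is comparable when the subject with the shorter follow-up time
    had an observed event. The pair is concordant when that subject also
    has the higher predicted risk; tied risks count as half.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let `a` be the subject with the shorter observed time.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        # Simplification: skip tied times and pairs whose earlier subject
        # was censored, since their true ordering is unknown.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def mean_ensemble(*score_lists):
    """Average the risk scores of several models for each individual."""
    return [sum(scores) / len(scores) for scores in zip(*score_lists)]


# Hypothetical toy data: follow-up time in years, event flag (1 = incident
# HF observed, 0 = censored), and risk scores from two made-up models.
times = [2, 4, 6, 8, 10]
events = [1, 1, 0, 1, 0]
model_a = [0.9, 0.4, 0.5, 0.3, 0.1]
model_b = [0.6, 0.8, 0.2, 0.4, 0.3]

ensemble = mean_ensemble(model_a, model_b)
print("C-index model A:", harrell_c(times, events, model_a))
print("C-index model B:", harrell_c(times, events, model_b))
print("C-index ensemble:", harrell_c(times, events, ensemble))
```

In practice the individual models' scores may need to be on a comparable scale before averaging; whether and how the authors rescaled them is not stated here.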
