Nat Commun. 2025 Mar 25;16(1):2915.
doi: 10.1038/s41467-025-58283-7.

External validation of artificial intelligence for detection of heart failure with preserved ejection fraction

Ashley P Akerman et al. Nat Commun. 2025.

Erratum in

Abstract

Artificial intelligence (AI) models to identify heart failure (HF) with preserved ejection fraction (HFpEF) based on deep-learning of echocardiograms could help address under-recognition in clinical practice, but they require extensive validation, particularly in representative and complex clinical cohorts for which they could provide most value. In this study enrolling patients with HFpEF (cases; n = 240), and age, sex, and year of echocardiogram matched controls (n = 256), we compare the diagnostic performance (discrimination, calibration, classification, and clinical utility) and prognostic associations (mortality and HF hospitalization) between an updated AI HFpEF model (EchoGo Heart Failure v2) and existing clinical scores (H2FPEF and HFA-PEFF). The AI HFpEF model and H2FPEF score demonstrate similar discrimination and calibration, but classification is higher with AI than H2FPEF and HFA-PEFF, attributable to fewer intermediate scores, due to discordant multivariable inputs. The continuous AI HFpEF model output adds information beyond the H2FPEF, and integration with existing scores increases correct management decisions. Those with a diagnostic positive result from AI have a two-fold increased risk of the composite outcome. We conclude that integrating an AI HFpEF model into the existing clinical diagnostic pathway would improve identification of HFpEF in complex clinical cohorts, and patients at risk of adverse outcomes.

Conflict of interest statement

Competing interests: J.B.S. reports investigator-initiated funding from Ultromics for the current study. J.B.S. also reports research grants (to the institution) from Anumana, Philips Healthcare, EVERSANA Lifesciences, and Bracco Diagnostics; consulting for Bracco Diagnostics, Edwards Lifesciences, Philips Healthcare, General Electric Healthcare, and EVERSANA Lifesciences; membership of the scientific advisory boards for Ultromics, HeartSciences, Bristol Myers Squibb, Alnylam, and EchoIQ; and membership of the data safety monitoring board for Pfizer. A.P.A., W.H., H.P., P.L., R.U., and G.W. are employees of Ultromics. The remaining authors declare no competing interests.

Figures

Fig. 1. Flowchart demonstrating study inclusion and exclusions.
Displayed is a flowchart depicting the study selection for cases and controls based on inclusion and exclusion criteria. BIDMC Beth Israel Deaconess Medical Center, LVEF left ventricular ejection fraction, HF heart failure, TTE transthoracic echocardiogram.
Fig. 2. Receiver operating characteristic curve comparing discrimination for identification of heart failure with preserved ejection fraction (HFpEF) using artificial intelligence (AI) HFpEF model versus the H2FPEF score.
Shown are the receiver operating characteristic (ROC) curves comparing discrimination for identification of HFpEF using the AI HFpEF model vs. the H2FPEF score. AI HFpEF is in blue (area under the ROC curve [AUROC]: 0.798 [95% CI 0.756–0.799]), and the H2FPEF score is in orange (0.788 [0.745–0.789]). The difference between the two was not significant (mean difference in AUROC: 0.01 [−0.043 to 0.064], p = 0.710 using a two-sided DeLong test).
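As a rough illustration of the discrimination metric in this caption, the AUROC can be read as the probability that a randomly chosen case receives a higher model output than a randomly chosen control. The sketch below computes it by pairwise comparison; the function name and the scores are hypothetical and are not study data, and the DeLong comparison itself is not implemented here.

```python
def auroc(case_scores, control_scores):
    """AUROC as the fraction of case/control pairs ranked correctly.

    A pair counts 1 if the case scores higher than the control,
    0.5 if the two scores are tied, and 0 otherwise.
    """
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical model outputs (not taken from the study).
example_cases = [0.9, 0.8, 0.4]
example_controls = [0.7, 0.3, 0.2]
example_auroc = auroc(example_cases, example_controls)
```

The pairwise form is quadratic in sample size, which is acceptable for a cohort of a few hundred patients; a rank-based implementation would be preferable for larger data.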
Fig. 3. Alluvial plots demonstrating reclassification of predicted heart failure with preserved ejection fraction (HFpEF) from clinical scores using the artificial intelligence (AI) HFpEF model.
Displayed are alluvial plots depicting the reclassification of predicted HFpEF from clinical scores using the AI HFpEF model. This plot and the associated reclassification statistics account only for categorical classification outputs from each model, rather than continuous outputs. Panel A depicts the AI model’s reclassification of an individual’s predicted HFpEF status from the H2FPEF score. Panel B depicts the AI model’s reclassification of an individual’s predicted HFpEF status from the HFA-PEFF score. The added value of the AI HFpEF model compared with the H2FPEF score and HFA-PEFF score is displayed below the alluvial plots. Two-sided likelihood ratio tests were used to estimate the added value of the AI HFpEF model vs. the H2FPEF score, and the resulting p values are presented alongside net reclassification improvement (NRI) statistics. All NRI statistics are based on categorical outputs. Non-diagnostic and indeterminate outputs are referred to as “intermediate” for consistency and clarity.
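For readers unfamiliar with the NRI, a minimal sketch of the standard categorical definition follows: among cases, the proportion moved to a higher category minus the proportion moved to a lower one, plus the reverse among controls. The category coding (0 = negative, 1 = intermediate, 2 = positive), the function name, and the example data are assumptions for illustration; the caption does not state how the study handled intermediate outputs in its NRI calculation.

```python
def categorical_nri(old_cat, new_cat, is_case):
    """Categorical net reclassification improvement.

    old_cat, new_cat: ordered category codes per patient (higher = more
    suggestive of disease). is_case: True for cases, False for controls.
    Returns (total NRI, NRI among cases, NRI among controls).
    """
    up_case = down_case = n_case = 0
    up_ctrl = down_ctrl = n_ctrl = 0
    for old, new, case in zip(old_cat, new_cat, is_case):
        if case:
            n_case += 1
            up_case += new > old
            down_case += new < old
        else:
            n_ctrl += 1
            up_ctrl += new > old
            down_ctrl += new < old
    if n_case == 0 or n_ctrl == 0:
        raise ValueError("need at least one case and one control")
    nri_cases = (up_case - down_case) / n_case        # upward moves are correct
    nri_controls = (down_ctrl - up_ctrl) / n_ctrl     # downward moves are correct
    return nri_cases + nri_controls, nri_cases, nri_controls


# Hypothetical categories (not study data): four cases, then four controls.
old = [1, 1, 2, 0, 1, 1, 0, 2]
new = [2, 2, 2, 0, 0, 0, 0, 2]
case_flags = [True] * 4 + [False] * 4
nri_total, nri_cases, nri_controls = categorical_nri(old, new, case_flags)
```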
Fig. 4. Decision curves demonstrating standardized net benefit and net reduction in interventions of using the H2FPEF and AI HFpEF model in combination versus separate approaches in patients suspected of having HFpEF.
Decision curves comparing the standardized net benefit (panel A) and net reduction in interventions (panel B) when patient management decisions are based on the output of the H2FPEF score and/or the AI HFpEF model. The management decision is assumed to represent prescription of SGLT2i to the patient, in a population where the expected prevalence of HFpEF is 30%. Red and gold lines represent clinical baselines of prescribing all patients or no patients SGLT2i, respectively, regardless of the output of any test. Prescribing based on only the H2FPEF score (green) means that any patient with a “Probable” classification of HFpEF would be prescribed SGLT2i. Prescribing based on either a “Probable” (H2FPEF) or “Positive” classification (AI HFpEF) is presented in blue. Prescribing based on the combination of a “Positive” classification (AI HFpEF), or “Intermediate” (AI HFpEF) and “Probable” (H2FPEF) is presented in purple. The x-axis represents the threshold probability that would be required by a clinician and/or patient to initiate prescription of SGLT2i. In this context, the chosen minimum threshold probability is 30% (dotted line), representing the relative harm of an adverse event when taking SGLT2i (5.8%), and the risk reduction associated with taking an SGLT2i (−19% for HF hospitalization or worsening HF event). The x-axis is truncated to clinically reasonable threshold probabilities for clear and meaningful interpretation. For net benefit plots, the y-axis refers to the standardized net benefit of taking a given approach, with units representing the proportion of patients with disease in the population who would be successfully managed according to the different approaches. For example, a value of 0.45 for the Combined approach would be interpreted such that, compared to prescribing SGLT2i to no patients, managing patients based on the combined information from the AI HFpEF model and the H2FPEF score would result in 45% of patients with HFpEF being correctly managed. For net reduction in interventions, a value of 0.315 for the Combined approach would be interpreted such that prescribing SGLT2i based on the combined information would lead to an absolute 31.5% reduction in the number of prescriptions without missing any patients with HFpEF.
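The quantities on the y-axes can be sketched with the standard decision-curve formulas, assuming a strategy is summarized by its sensitivity and specificity and applied to a population with a fixed prevalence. This is an illustrative sketch only: the function names and the sensitivity and specificity values are hypothetical, and the study's exact computation may differ.

```python
def net_benefit(sensitivity, specificity, prevalence, threshold):
    """Net benefit = true-positive rate minus weighted false-positive rate.

    The weight is the odds of the threshold probability, pt / (1 - pt).
    """
    weight = threshold / (1.0 - threshold)
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos - false_pos * weight


def standardized_net_benefit(sensitivity, specificity, prevalence, threshold):
    """Net benefit divided by prevalence, so a perfect test scores 1."""
    return net_benefit(sensitivity, specificity, prevalence, threshold) / prevalence


def net_reduction_in_interventions(sensitivity, specificity, prevalence, threshold):
    """Fraction of the population spared treatment relative to treating
    everyone, at no net loss of treated patients with disease."""
    weight = threshold / (1.0 - threshold)
    nb_strategy = net_benefit(sensitivity, specificity, prevalence, threshold)
    nb_treat_all = net_benefit(1.0, 0.0, prevalence, threshold)
    return (nb_strategy - nb_treat_all) / weight


# Prevalence and threshold follow the caption (30% each); the sensitivity
# and specificity below are hypothetical, not study results.
snb = standardized_net_benefit(0.8, 0.7, prevalence=0.30, threshold=0.30)
nri_interventions = net_reduction_in_interventions(0.8, 0.7, 0.30, 0.30)
```

Under these formulas, treating no one has a net benefit of zero at every threshold, which is why the caption describes the standardized net benefit as a comparison against prescribing to no patients.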
Fig. 5. Kaplan-Meier curve demonstrating time to the composite endpoint by predicted classification according to the artificial intelligence (AI) heart failure with preserved ejection fraction (HFpEF) model.
Shown are Kaplan-Meier curves for time (in months) from the index echocardiogram to the composite outcome of death or heart failure hospitalization according to the AI HFpEF model’s predicted classification. Red = diagnostic negative, green = intermediate (“non-diagnostic” due to high uncertainty), and blue = diagnostic positive. Number of individuals in the risk set at 5-month time intervals is provided below the x-axis.
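The curves described here use the Kaplan-Meier product-limit estimator, in which survival is multiplied at each event time by the fraction of the risk set that did not have the event. A minimal sketch follows, assuming follow-up times in months and an event indicator of 1 for death or HF hospitalization and 0 for censoring; the function name and the data are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of event-free survival.

    times: follow-up time per patient. events: 1 if the composite
    outcome occurred at that time, 0 if the patient was censored.
    Returns a list of (time, survival) pairs at each distinct event time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients whose follow-up ends at the same time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set.
        at_risk -= n_leaving
    return curve


# Hypothetical follow-up in months (not study data).
km_curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
```

Running the estimator separately for each predicted class (negative, intermediate, positive) would give one curve per group, and the shrinking risk set tracked in `at_risk` corresponds to the numbers reported below the x-axis.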
