Vision-language foundation model for echocardiogram interpretation
- PMID: 38689062
- PMCID: PMC11108770
- DOI: 10.1038/s41591-024-02959-y
Abstract
The development of robust artificial intelligence models for echocardiography has been limited by the availability of annotated clinical data. Here, to address this challenge and improve the performance of cardiac imaging models, we developed EchoCLIP, a vision-language foundation model for echocardiography that learns the relationship between cardiac ultrasound images and the interpretations of expert cardiologists across a wide range of patients and indications for imaging. After training on 1,032,975 cardiac ultrasound videos and corresponding expert text, EchoCLIP performs well on a diverse range of benchmarks for cardiac image interpretation, despite not having been explicitly trained for individual interpretation tasks. EchoCLIP can assess cardiac function (mean absolute error of 7.1% when predicting left ventricular ejection fraction in an external validation dataset) and identify implanted intracardiac devices (area under the curve (AUC) of 0.84, 0.92 and 0.97 for pacemakers, percutaneous mitral valve repair and artificial aortic valves, respectively). We also developed a long-context variant (EchoCLIP-R) using a custom tokenizer based on common echocardiography concepts. EchoCLIP-R accurately identified unique patients across multiple videos (AUC of 0.86), identified clinical transitions such as heart transplants (AUC of 0.79) and cardiac surgery (AUC of 0.77), and enabled robust image-to-text search (mean cross-modal retrieval rank in the top 1% of candidate text reports). These capabilities represent a substantial step toward understanding and applying foundation models in cardiovascular imaging for preliminary interpretation of echocardiographic findings.
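The image-to-text search result above is reported as a mean cross-modal retrieval rank. As a minimal illustrative sketch (not the authors' evaluation code), such a metric could be computed as follows, assuming each video embedding has exactly one matching report embedding at the same index and that candidates are scored by cosine similarity. The function names and the toy embeddings are hypothetical and the numbers are made up.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def mean_retrieval_rank(video_embs, text_embs):
    """Mean 1-based rank of the matching report for each video.

    Assumption: video_embs[i] and text_embs[i] form a matched pair.
    """
    ranks = []
    for i, video in enumerate(video_embs):
        sims = [cosine(video, text) for text in text_embs]
        # Rank = 1 + number of candidate reports scoring strictly
        # higher than the true report for this video.
        ranks.append(1 + sum(s > sims[i] for s in sims))
    return sum(ranks) / len(ranks)


# Toy, made-up embeddings (hypothetical; not data from the paper).
videos = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
reports = [[0.9, 0.1], [0.2, 0.8], [0.0, 1.0]]

mean_rank = mean_retrieval_rank(videos, reports)
# Dividing by the number of candidates expresses the rank as a fraction
# of the candidate pool, which is how a "top 1%" statement can be read.
rank_fraction = mean_rank / len(reports)
print(mean_rank, rank_fraction)
```

A lower mean rank means the correct report tends to appear nearer the top of the candidate list; a rank fraction of 0.01 or less would correspond to the top 1% of candidates under this reading.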
© 2024. The Author(s).
Conflict of interest statement
The authors declare no competing interests.