Machine learning-based marker for coronary artery disease: derivation and validation in two longitudinal cohorts
- PMID: 36563696
- PMCID: PMC10069625
- DOI: 10.1016/S0140-6736(22)02079-7
Abstract
Background: Binary diagnosis of coronary artery disease does not preserve the complexity of the disease or quantify its severity or the associated risk of death; hence, a quantitative marker of coronary artery disease is warranted. We evaluated a quantitative marker of coronary artery disease derived from the probabilities of a machine learning model.
Methods: In this cohort study, we developed and validated a coronary artery disease-predictive machine learning model using 95 935 electronic health records and assessed its probabilities as in-silico scores for coronary artery disease (ISCAD; range 0 [lowest probability] to 1 [highest probability]) in participants in two longitudinal biobank cohorts. We measured the association of ISCAD with clinical outcomes, namely coronary artery stenosis, obstructive coronary artery disease, multivessel coronary artery disease, all-cause death, and coronary artery disease sequelae.
Findings: Among 95 935 participants, 35 749 were from the BioMe Biobank (median age 61 years [IQR 18]; 14 599 [41%] were male and 21 150 [59%] were female; 5130 [14%] had diagnosed coronary artery disease) and 60 186 were from the UK Biobank (median age 62 years [IQR 15]; 25 031 [42%] were male and 35 155 [58%] were female; 8128 [14%] had diagnosed coronary artery disease). The model predicted coronary artery disease with an area under the receiver operating characteristic curve of 0·95 (95% CI 0·94-0·95; sensitivity of 0·94 [0·94-0·95] and specificity of 0·82 [0·81-0·83]) and 0·93 (0·92-0·93; sensitivity of 0·90 [0·89-0·90] and specificity of 0·88 [0·87-0·88]) in the BioMe validation and holdout sets, respectively, and 0·91 (0·91-0·91; sensitivity of 0·84 [0·83-0·84] and specificity of 0·83 [0·82-0·83]) in the UK Biobank external test set. ISCAD captured coronary artery disease risk from known risk factors, pooled cohort equations, and polygenic risk scores. Coronary artery stenosis increased quantitatively with ascending ISCAD quartiles (increase per quartile of 12 percentage points), including risk of obstructive coronary artery disease, multivessel coronary artery disease, and stenosis of major coronary arteries. Hazard ratios (HRs) and prevalence of all-cause death increased stepwise over ISCAD deciles (decile 1: HR 1·0 [95% CI 1·0-1·0], 0·2% prevalence; decile 6: HR 11 [3·9-31], 3·1% prevalence; and decile 10: HR 56 [20-158], 11% prevalence). A similar trend was observed for recurrent myocardial infarction. 12 (46%) undiagnosed individuals with high ISCAD (≥0·9) had clinical evidence of coronary artery disease according to the 2014 American College of Cardiology/American Heart Association Task Force guidelines.
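The metrics reported above (area under the receiver operating characteristic curve, sensitivity, specificity, and outcome prevalence across score quantiles) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the 0·5 threshold, and the twelve toy scores and labels are all assumptions invented for illustration, and the sketch does not reproduce any number from the study.

```python
def sensitivity_specificity(scores, labels, threshold=0.5):
    """Sensitivity and specificity when score >= threshold is called positive.

    The threshold is an illustrative assumption, not the study's cut-off.
    """
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= threshold)
    fn = sum(1 for s, y in pairs if y == 1 and s < threshold)
    tn = sum(1 for s, y in pairs if y == 0 and s < threshold)
    fp = sum(1 for s, y in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auroc(scores, labels):
    """AUROC as the probability that a random case outscores a random
    control, with ties counted as one half."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


def prevalence_by_bin(scores, labels, n_bins=4):
    """Sort participants by ascending score, split them into n_bins groups
    of (near-)equal size, and return the outcome prevalence in each group."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n = len(order)
    prevalences = []
    for b in range(n_bins):
        members = order[b * n // n_bins:(b + 1) * n // n_bins]
        prevalences.append(sum(labels[i] for i in members) / len(members))
    return prevalences


# Toy data (hypothetical): model probabilities and binary outcome labels.
toy_scores = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40,
              0.55, 0.60, 0.75, 0.85, 0.90, 0.95]
toy_labels = [0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1]

sens, spec = sensitivity_specificity(toy_scores, toy_labels)
auc = auroc(toy_scores, toy_labels)
quartile_prevalence = prevalence_by_bin(toy_scores, toy_labels, n_bins=4)

print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUROC={auc:.2f}")
print("prevalence by score quartile:", quartile_prevalence)
```

Setting `n_bins=10` gives decile groups in the same way. Hazard ratios, as reported for all-cause death, need a time-to-event model and are outside the scope of this sketch.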
Interpretation: Electronic health record-based machine learning was used to generate an in-silico marker for coronary artery disease that can non-invasively quantify atherosclerosis and risk of death on a continuous spectrum, and identify underdiagnosed individuals.
Funding: National Institutes of Health.
Copyright © 2023 Elsevier Ltd. All rights reserved.
Conflict of interest statement
Declaration of interests: RD reported receiving grants from AstraZeneca; receiving grants and non-financial support from Goldfinch Bio; being a scientific co-founder, consultant, and equity holder for Pensieve Health; and being a consultant for Variant Bio, outside of the submitted work. GNN reported being a scientific co-founder, consultant, advisory board member, and equity owner of Renalytix AI; being a scientific co-founder and equity holder for Pensieve Health; being a consultant for Variant Bio; and receiving grants from Goldfinch Bio and personal fees from Renalytix AI, BioVie, Reata, AstraZeneca, and GLG Consulting, outside of the submitted work. All other authors declare no competing interests.
Comment in
- A machine-learning-derived, in silico marker for CAD identifies underdiagnosed patients. Nat Rev Cardiol. 2023 Mar;20(3):139. doi: 10.1038/s41569-023-00833-x. PMID: 36650288. No abstract available.
- Machine learning-based markers for CAD. Lancet. 2023 Jul 15;402(10397):182. doi: 10.1016/S0140-6736(23)01060-7. PMID: 37453747. No abstract available.
- Machine learning-based markers for CAD. Lancet. 2023 Jul 15;402(10397):182-183. doi: 10.1016/S0140-6736(23)01061-9. PMID: 37453748. Free PMC article. No abstract available.
- Machine learning-based markers for CAD. Lancet. 2023 Jul 15;402(10397):183. doi: 10.1016/S0140-6736(23)01062-0. PMID: 37453749. No abstract available.
