Lancet. 2023 Jan 21;401(10372):215-225.
doi: 10.1016/S0140-6736(22)02079-7. Epub 2022 Dec 20.

Machine learning-based marker for coronary artery disease: derivation and validation in two longitudinal cohorts

Iain S Forrest et al. Lancet.

Abstract

Background: Binary diagnosis of coronary artery disease does not preserve the complexity of disease or quantify its severity or its associated risk of death; hence, a quantitative marker of coronary artery disease is warranted. We evaluated a quantitative marker of coronary artery disease derived from the probabilities of a machine learning model.

Methods: In this cohort study, we developed and validated a coronary artery disease-predictive machine learning model using 95 935 electronic health records and assessed its probabilities as in-silico scores for coronary artery disease (ISCAD; range 0 [lowest probability] to 1 [highest probability]) in participants in two longitudinal biobank cohorts. We measured the association of ISCAD with clinical outcomes, namely coronary artery stenosis, obstructive coronary artery disease, multivessel coronary artery disease, all-cause death, and coronary artery disease sequelae.

Findings: Among 95 935 participants, 35 749 were from the BioMe Biobank (median age 61 years [IQR 18]; 14 599 [41%] were male and 21 150 [59%] were female; 5130 [14%] had diagnosed coronary artery disease) and 60 186 were from the UK Biobank (median age 62 years [IQR 15]; 25 031 [42%] were male and 35 155 [58%] were female; 8128 [14%] had diagnosed coronary artery disease). The model predicted coronary artery disease with an area under the receiver operating characteristic curve of 0·95 (95% CI 0·94-0·95; sensitivity of 0·94 [0·94-0·95] and specificity of 0·82 [0·81-0·83]) and 0·93 (0·92-0·93; sensitivity of 0·90 [0·89-0·90] and specificity of 0·88 [0·87-0·88]) in the BioMe validation and holdout sets, respectively, and 0·91 (0·91-0·91; sensitivity of 0·84 [0·83-0·84] and specificity of 0·83 [0·82-0·83]) in the UK Biobank external test set. ISCAD captured coronary artery disease risk from known risk factors, pooled cohort equations, and polygenic risk scores. Coronary artery stenosis increased quantitatively with ascending ISCAD quartiles (increase per quartile of 12 percentage points), including risk of obstructive coronary artery disease, multivessel coronary artery disease, and stenosis of major coronary arteries. Hazard ratios (HRs) and prevalence of all-cause death increased stepwise over ISCAD deciles (decile 1: HR 1·0 [95% CI 1·0-1·0], 0·2% prevalence; decile 6: HR 11 [3·9-31], 3·1% prevalence; and decile 10: HR 56 [20-158], 11% prevalence). A similar trend was observed for recurrent myocardial infarction. Of the undiagnosed individuals with high ISCAD (≥0·9), 12 (46%) had clinical evidence of coronary artery disease according to the 2014 American College of Cardiology/American Heart Association Task Force guidelines.
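The discrimination metrics reported above (AUROC, sensitivity, specificity) can be illustrated with a minimal sketch. The toy labels and scores are invented for illustration and are not study data. The abstract does not state the probability cut-off used for sensitivity and specificity, so the `threshold=0.5` default is an assumption, and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Sensitivity and specificity when score >= threshold is called 'case'.

    The 0.5 default is an assumption; the abstract does not give the cut-off.
    """
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        predicted_case = s >= threshold
        if y == 1:
            if predicted_case:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_case:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random case outscores a random control.

    Ties count as half a win. This pairwise form is O(cases * controls) and
    is meant for illustration, not for cohorts of tens of thousands.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Invented toy data: 1 = diagnosed CAD, 0 = control; scores play the role of ISCAD.
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
toy_scores = [0.1, 0.2, 0.6, 0.3, 0.4, 0.7, 0.8, 0.9]

sens, spec = sensitivity_specificity(toy_labels, toy_scores)
area = auroc(toy_labels, toy_scores)
```

Because AUROC depends only on how cases rank against controls, it needs no threshold, whereas sensitivity and specificity change with the chosen cut-off.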

Interpretation: Electronic health record-based machine learning was used to generate an in-silico marker for coronary artery disease that can non-invasively quantify atherosclerosis and risk of death on a continuous spectrum, and identify underdiagnosed individuals.

Funding: National Institutes of Health.

Conflict of interest statement

Declaration of interests: RD reported receiving grants from AstraZeneca; grants and non-financial support from Goldfinch Bio; being a scientific co-founder, consultant, and equity holder for Pensieve Health; and being a consultant for Variant Bio, outside of the submitted work. GNN reported being a scientific co-founder, consultant, advisory board member, and equity owner of Renalytix AI; a scientific co-founder and equity holder for Pensieve Health; a consultant for Variant Bio; and received grants from Goldfinch Bio and personal fees from Renalytix AI, BioVie, Reata, AstraZeneca, and GLG Consulting, outside of the submitted work. All other authors declare no competing interests.

Figures

Fig. 1.
Performance of the machine learning model for the detection of coronary artery disease (CAD) in the validation, holdout, and external test sets. The machine learning model was trained/validated in the BioMe Biobank (BioMe 1), assessed in a holdout set in BioMe (BioMe 2), and externally tested in the UK Biobank. a, Electronic health records (EHRs) of study participants contained both categorical data (i.e., diagnosis codes and medications) and continuous data (i.e., laboratory readings and vital measurements). Only EHR data prior to the earliest date of CAD diagnosis, procedure (e.g., angioplasty), or medication (e.g., statins) prescription were used for CAD cases. In UK Biobank, the date of statin prescription is unavailable and individuals taking statins were excluded; controls with an Elixhauser comorbidity index of zero were retained. Participants with >70% missing data in the EHR were removed, and the EHR data of the remaining individuals underwent imputation with a random forest-based algorithm. We restricted to participants at least 40 years of age as the target population in which CAD is prevalent and for which the pooled cohort equations (PCE) are designed to guide statin initiation. Age was defined by the last considered clinical feature entry. Participants with at least one year of EHR data and three recorded clinical encounters were retained. b, The machine learning model discriminated CAD controls from cases with area under the receiver-operating-characteristic curves (AUROCs) of 0.95 (95% CI, 0.94–0.95), 0.93 (95% CI, 0.92–0.93), and 0.91 (95% CI, 0.91–0.91) for the validation, holdout, and external test datasets, respectively.
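The numeric inclusion criteria in panel a can be sketched as a simple filter. This is an illustration under stated assumptions, not the authors' pipeline: the `Participant` fields and `is_eligible` name are hypothetical, "three recorded clinical encounters" is read as at least three, and the other steps in the caption (restricting to pre-diagnosis data, the UK Biobank statin exclusion, the Elixhauser criterion, and imputation) are not modelled.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    # All field names are hypothetical; the caption does not name any schema.
    age_years: float          # age at the last considered clinical feature entry
    missing_fraction: float   # share of EHR features with no value, 0.0 to 1.0
    ehr_span_years: float     # length of the available EHR history
    n_encounters: int         # number of recorded clinical encounters


def is_eligible(p: Participant) -> bool:
    """Apply the four numeric criteria stated in the Fig. 1 caption."""
    return (
        p.missing_fraction <= 0.70   # participants with >70% missing data are removed
        and p.age_years >= 40        # at least 40 years of age
        and p.ehr_span_years >= 1.0  # at least one year of EHR data
        and p.n_encounters >= 3      # assumed reading: at least three encounters
    )


# Invented example records, not study data.
cohort = [
    Participant(age_years=55, missing_fraction=0.20, ehr_span_years=3.0, n_encounters=5),
    Participant(age_years=38, missing_fraction=0.10, ehr_span_years=4.0, n_encounters=9),
    Participant(age_years=61, missing_fraction=0.85, ehr_span_years=2.0, n_encounters=4),
]
eligible = [p for p in cohort if is_eligible(p)]
```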
Fig. 2.
Relationship of in silico score for CAD (ISCAD) with coronary stenosis and atherosclerosis complexity on cardiac catheterization. Cardiac catheterization data were examined for association with ISCAD. These data comprised percent coronary stenosis, recorded in 7 strata ranging from [0, 30) (less than 30%) to [100] (100%), and SYNTAX score, ranging from 0 (low complexity) to 30 (high complexity). ISCAD values were stratified by quartiles. a, Individuals who underwent cardiac catheterization (red) had higher mean ISCAD (dashed line) than those who had not undergone cardiac catheterization (purple). b, Violin plots show the distribution of samples across coronary stenosis values along with the mean value overlaid as a point for each ISCAD quartile. c, Violin plots show the distribution of samples across SYNTAX score values along with the mean value overlaid as a point for each ISCAD quartile. d, Schematic of coronary arteries depicts the association of ISCAD with obstructive CAD (≥50% stenosis in the left main coronary artery, ≥70% stenosis in any other coronary artery, or both), multivessel CAD (≥70% stenosis in at least two coronary arteries, or ≥50% stenosis in the left main coronary artery and ≥70% stenosis in another coronary artery), left main stenosis (≥50%), proximal left anterior descending (LAD) stenosis (≥70%), left circumflex stenosis (≥70%), and right coronary artery stenosis (≥70%). Results are reported as adjusted odds ratio (95% CI) and P value per increase in ISCAD quartile.
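The definitions of obstructive and multivessel CAD in panel d translate directly into threshold rules. The sketch below is illustrative only. The vessel keys (`"left_main"`, `"lad"`, `"lcx"`, `"rca"`) and function names are hypothetical, and stenosis is treated as a single percentage per artery, whereas the caption says it was recorded in strata.

```python
from typing import Mapping

LEFT_MAIN = "left_main"  # hypothetical key for the left main coronary artery


def is_obstructive(stenosis: Mapping[str, float]) -> bool:
    """>=50% stenosis in the left main, or >=70% in any other coronary artery."""
    for vessel, pct in stenosis.items():
        limit = 50 if vessel == LEFT_MAIN else 70
        if pct >= limit:
            return True
    return False


def is_multivessel(stenosis: Mapping[str, float]) -> bool:
    """>=70% in at least two arteries, or left main >=50% plus >=70% in another."""
    n_at_least_70 = sum(1 for pct in stenosis.values() if pct >= 70)
    other_at_least_70 = any(
        pct >= 70 for vessel, pct in stenosis.items() if vessel != LEFT_MAIN
    )
    left_main_at_least_50 = stenosis.get(LEFT_MAIN, 0) >= 50
    return n_at_least_70 >= 2 or (left_main_at_least_50 and other_at_least_70)


# Invented example: 55% left main stenosis and 75% proximal LAD stenosis.
example = {"left_main": 55, "lad": 75, "lcx": 20, "rca": 10}
example_obstructive = is_obstructive(example)
example_multivessel = is_multivessel(example)
```

Under these rules every multivessel case is also obstructive, which matches the caption: multivessel CAD always involves at least one non-left-main artery at ≥70% or a left main at ≥50%.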
Fig. 3.
All-cause mortality stratified by in silico score for CAD (ISCAD). All-cause mortality was stratified by ISCAD deciles and adjusted hazard ratios (HRs) were compared to the lowest decile. a, Percent mortality and adjusted HR for mortality increased monotonically over ascending ISCAD deciles. b, Kaplan-Meier survival curves, which plot age in one-year increments on the x-axis against cumulative survival on the y-axis, differed by ISCAD decile. Higher ISCAD deciles had lower survival with increasing age than lower ISCAD deciles.
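The two operations behind this figure, binning scores into deciles and estimating a Kaplan-Meier survival curve, can be sketched in plain Python. This is a minimal illustration, not the authors' analysis: function names and example values are invented, bins are assigned by rank with ties broken by input order, and the estimator handles right censoring only (it does not model delayed entry, which can matter when age is the time axis). Adjusted hazard ratios would need a regression model and are not shown.

```python
from typing import List, Sequence, Tuple


def assign_quantile_bins(scores: Sequence[float], n_bins: int = 10) -> List[int]:
    """Rank-based bins numbered 1..n_bins, each holding a near-equal share."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    bins = [0] * len(scores)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(scores) + 1
    return bins


def kaplan_meier(
    times: Sequence[float], events: Sequence[int]
) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate as (time, survival) pairs at each death time.

    events: 1 = death observed at that time, 0 = censored at that time.
    """
    at_risk = len(times)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Invented example: ages (years) at death or last follow-up for one score group.
ages = [60, 65, 65, 70, 75]
died = [1, 0, 1, 1, 0]
curve = kaplan_meier(ages, died)

# Invented scores split into quartiles (n_bins=4) to keep the example small.
quartiles = assign_quantile_bins([0.05, 0.95, 0.50, 0.20], n_bins=4)
```

In practice one curve would be estimated per decile by applying `kaplan_meier` to the participants whose bin number matches, then comparing the curves across bins.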
