2021 Dec 7;10(23):e021976.
doi: 10.1161/JAHA.121.021976. Epub 2021 Nov 30.

Unsupervised Learning for Automated Detection of Coronary Artery Disease Subgroups

Alyssa M Flores et al. J Am Heart Assoc. 2021.

Abstract

Background: The promise of precision population health includes the ability to use robust patient data to tailor prevention and care to specific groups. Advanced analytics may allow for automated detection of clinically informative subgroups that account for clinical, genetic, and environmental variability. This study sought to evaluate whether unsupervised machine learning approaches could interpret heterogeneous and missing clinical data to discover clinically important coronary artery disease subgroups.
Methods and Results: The Genetic Determinants of Peripheral Arterial Disease study is a prospective cohort that includes individuals with newly diagnosed and/or symptomatic coronary artery disease. We applied generalized low rank modeling and K-means cluster analysis using 155 phenotypic and genetic variables from 1329 participants. Cox proportional hazard models were used to examine associations between clusters and major adverse cardiovascular and cerebrovascular events and all-cause mortality. We then compared performance of risk stratification based on clusters and the American College of Cardiology/American Heart Association pooled cohort equations. Unsupervised analysis identified 4 phenotypically and prognostically distinct clusters. All-cause mortality was highest in cluster 1 (oldest/most comorbid; 26%), whereas major adverse cardiovascular and cerebrovascular event rates were highest in cluster 2 (youngest/multiethnic; 41%). Cluster 4 (middle-aged/healthiest behaviors) experienced more incident major adverse cardiovascular and cerebrovascular events (30%) than cluster 3 (middle-aged/lowest medication adherence; 23%), despite apparently similar risk factor and lifestyle profiles. In comparison with the pooled cohort equations, cluster membership was more informative for risk assessment of myocardial infarction, stroke, and mortality.
Conclusions: Unsupervised clustering identified 4 unique coronary artery disease subgroups with distinct clinical trajectories. Flexible unsupervised machine learning algorithms offer the ability to meaningfully process heterogeneous patient data and provide sharper insights into disease characterization and risk assessment.
Registration URL: https://www.clinicaltrials.gov; Unique identifier: NCT00380185.
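
The abstract reports Cox proportional hazard models relating cluster membership to outcomes. As an illustrative sketch only, assuming cluster membership enters as indicator variables with cluster 1 as the reference (the reference cluster and any covariate adjustment are assumptions here, not stated in the abstract), such a model could be written as:

```latex
h(t \mid c) = h_0(t)\,\exp\!\Big(\sum_{k=2}^{4} \beta_k \,\mathbf{1}[c = k]\Big),
\qquad
\mathrm{HR}_k = \frac{h(t \mid c = k)}{h(t \mid c = 1)} = e^{\beta_k}
```

Here h_0(t) is the unspecified baseline hazard, c is a participant's cluster, and HR_k is the hazard ratio of cluster k relative to the reference cluster.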

Keywords: cluster analysis; coronary artery disease; machine learning; phenotype discovery.

Figures

Figure 1. Schematic for generalized low rank modeling.
A, Patient data are condensed to fewer dimensions to allow for analysis using unsupervised K‐means clustering. The “features” matrix is a high‐dimensional data set that includes patient information on demographics and clinical, lifestyle, angiographic, and cardiovascular genetic risk markers. This data set is transformed into a lower dimensional “latent feature” space by approximating the features matrix as the product of 2 matrices, shown as the X (containing each observation) and Y representations (containing the definition for each observation). L,r indicates the loss function that accounts for the accuracy in the data approximation and regularizes the latent feature representation to prevent overfitting. B, After cluster analysis, data are then transformed back to their original form and analyzed to discover subgroup characteristics and compare long‐term outcomes across clusters.
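
The pipeline in this caption (factor the features matrix into X and Y, then cluster the rows of X) can be sketched as follows. This is a minimal illustration, not the study's implementation: it assumes a quadratic loss on observed entries with L2 regularization, fitted by alternating least squares, and a plain Lloyd's K-means, whereas a generalized low rank model can use different losses for different data types. The function names `fit_low_rank` and `kmeans` and all parameter values are hypothetical.

```python
import numpy as np

def fit_low_rank(A, k=2, reg=0.1, n_iter=50, seed=0):
    """Approximate A (n x d, NaN = missing) as X @ Y.

    Assumption: quadratic loss on observed entries plus L2 regularization
    on X and Y, minimized by alternating least squares.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    mask = ~np.isnan(A)                  # True where a value was observed
    A0 = np.where(mask, A, 0.0)
    X = rng.normal(scale=0.1, size=(n, k))
    Y = rng.normal(scale=0.1, size=(k, d))
    ridge = reg * np.eye(k)
    for _ in range(n_iter):
        # Update each patient's latent row from that patient's observed features.
        for i in range(n):
            Yo = Y[:, mask[i]]
            X[i] = np.linalg.solve(Yo @ Yo.T + ridge, Yo @ A0[i, mask[i]])
        # Update each feature's definition from the patients who have it observed.
        for j in range(d):
            Xo = X[mask[:, j]]
            Y[:, j] = np.linalg.solve(Xo.T @ Xo + ridge, Xo.T @ A0[mask[:, j], j])
    return X, Y

def kmeans(X, n_clusters, n_iter=100, seed=0):
    """Plain Lloyd's algorithm on the latent representation X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), n_clusters, replace=False)]
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

if __name__ == "__main__":
    # Synthetic stand-in for a features matrix with missing entries.
    rng = np.random.default_rng(1)
    A = np.vstack([rng.normal(0, 1, (30, 6)), rng.normal(5, 1, (30, 6))])
    A[rng.random(A.shape) < 0.1] = np.nan
    X, Y = fit_low_rank(A, k=2)
    labels, _ = kmeans(X, n_clusters=2)
    print(np.bincount(labels))
```

Because missing entries are simply excluded from the loss, no separate imputation step is needed before clustering; the product X @ Y maps the latent representation back to the original feature space, as in panel B.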
Figure 2. Distinct subgroups of patients with coronary artery disease identified by unsupervised clustering.
Plot showing 4 distinct groups of patients identified by K‐means clustering. Data are plotted based on the top 20 principal components across the first 2 discriminant functions to form a 2‐dimensional plot.
Figure 3. Schematic representation of the 4 CAD clusters and their major features.
ABI indicates ankle‐brachial index; BMI, body mass index; CAD, coronary artery disease; CHF, congestive heart failure; CVA, cerebrovascular accident; LDL, low‐density lipoprotein; MACCE, major adverse cardiovascular and cerebrovascular events; MI, myocardial infarction; and PAD, peripheral artery disease.
Figure 4. Long‐term outcomes of the 4 coronary artery disease clusters.
Kaplan–Meier curves showing (A) MACCE* and (B) all‐cause mortality. *Primary MACCE composite included myocardial infarction, stroke, coronary revascularization, and peripheral revascularization. MACCE indicates major adverse cardiovascular and cerebrovascular events.
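
For readers unfamiliar with how such curves are built, the Kaplan-Meier estimate for one cluster can be sketched as below. This is a generic textbook estimator under the assumption of right-censored follow-up data; the function name `kaplan_meier` and the toy inputs are hypothetical and are not taken from the study.

```python
import numpy as np

def kaplan_meier(time, event):
    """Return distinct event times and the estimated survival at each.

    time  : follow-up time per participant
    event : 1/True if the outcome occurred, 0/False if censored
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    event_times = np.unique(time[event])
    survival = []
    s = 1.0
    for t in event_times:
        at_risk = np.sum(time >= t)             # still under follow-up at t
        occurred = np.sum((time == t) & event)  # outcomes observed exactly at t
        s *= 1.0 - occurred / at_risk
        survival.append(s)
    return event_times, np.array(survival)

if __name__ == "__main__":
    # Toy data: five participants, two of them censored.
    times, surv = kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 1, 0])
    for t, s in zip(times, surv):
        print(f"t={t:g}  S(t)={s:.3f}")
```

Calling the function once per cluster label and plotting the resulting step functions gives one curve per cluster, which is the layout the caption describes.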
Figure 5. Comparison of clustering to PCE risk groups for prediction of MACCE* and all‐cause mortality.
*PCE‐consistent MACCE included myocardial infarction, stroke, and death. MACCE indicates major adverse cardiovascular and cerebrovascular events; and PCE, pooled cohort equations.

