Commun Med (Lond). 2025 Aug 26;5(1):372.
doi: 10.1038/s43856-025-01077-1.

Subgrouping patients with ischemic heart disease by means of the Markov cluster algorithm

Amalie D Haue et al. Commun Med (Lond).

Abstract

Background: Ischemic heart disease (IHD) is heterogeneous with respect to onset, burden of symptoms, and disease progression. We hypothesized that unsupervised clustering analysis could facilitate identification of distinct and clinically relevant multimorbidity clusters.

Methods: We included IHD patients who underwent coronary angiography (CAG) or coronary computed tomography angiography (CCTA) between 2004 and 2016 and used the earliest procedure as the index date. Patient health records were obtained from the Danish National Patient Registry, the Danish National Prescription Registry, and two in-hospital laboratory database systems. Genetic data were obtained from the Copenhagen Hospital Biobank. Using registered pre-index diagnosis codes (n = 3046), patients were clustered by application of the Markov Cluster algorithm. Multimorbidity clusters were then characterized using Cox regressions (new ischemic events, non-IHD mortality, and all-cause mortality) and enrichment analysis to explore both risks and phenotypical characteristics.
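
As a rough illustration of the clustering step, the following minimal sketch implements the basic Markov Cluster iteration in plain Python: expansion by matrix squaring, then inflation by element-wise powering and column renormalization. The inflation value, tolerance, self-loop weight, and toy similarity matrix are assumptions chosen for illustration and are not taken from the study.

```python
def normalize_columns(m):
    """Scale each column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    out = [[0.0] * n for _ in range(n)]
    for j in range(n):
        col_sum = sum(m[i][j] for i in range(n))
        for i in range(n):
            out[i][j] = m[i][j] / col_sum if col_sum > 0 else 0.0
    return out


def matmul(a, b):
    """Multiply two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(similarity, inflation=2.0, max_iter=100, tol=1e-9):
    """Basic MCL on a symmetric similarity matrix; returns sorted clusters."""
    n = len(similarity)
    # Self-loops of weight 1.0 are an assumption of this sketch.
    m = [[1.0 if i == j else similarity[i][j] for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: flow spreads along paths
        inflated = normalize_columns(
            [[v ** inflation for v in row] for row in expanded]
        )  # inflation: strong flows are boosted, weak flows decay
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Each row that retains flow defines a cluster: the columns it reaches.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Toy example: six hypothetical patients forming two unconnected groups.
toy_similarity = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
toy_clusters = mcl(toy_similarity)
```

In this sketch a higher inflation value would tend to produce more, smaller clusters. A cohort of the size reported here would need a sparse-matrix implementation rather than nested lists.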

Results: In a cohort of 72,249 patients with IHD (mean age 63.9 years, 63.1% males), 31 distinct clusters (C1-31, 67,136 patients) are identified. Comparing each cluster to the 30 others, seven clusters (9,590 patients) have significantly higher or lower risk of new ischemic events (five and two clusters, respectively). A total of 18 clusters (35,982 patients) have higher or lower risk of death from non-IHD causes (12 and six clusters, respectively), and 23 clusters have a statistically significant higher or lower risk for all-cause mortality. Cardiovascular or inflammatory diseases are commonly enriched in clusters (13). Distributions for 24 laboratory test results differ significantly across clusters. Polygenic risk scores are increased in a total of 15 clusters (48.4%).

Conclusions: Based on prior disease profiles, unsupervised clustering robustly stratifies patients with IHD into subgroups with similar clinical features and outcomes.

Plain language summary

Ischemic heart disease (IHD) is among the leading causes of death worldwide. A major challenge is that the disease is highly heterogeneous, covering a wide range of presentations, progression patterns, and treatment responses. Despite this, patients diagnosed with IHD are commonly treated as a single group. In this study, we sought to identify more homogeneous subgroups among patients diagnosed with IHD. By describing each patient in terms of all other diseases they had been diagnosed with before their IHD examination, we identified subgroups that had different risk profiles and showed different patterns in their blood tests and genetic profiles.

Conflict of interest statement

Competing interests: S.B. received personal compensation for managing board membership at Intomics and Proscion and is a scientific advisory board member of Biocenter Finland, Health Data Research UK, the Finnish Center of Excellence in Complex Disease Genetics, ELIXIR Node (Luxembourg), Lund University Diabetes Centre (Lund, Sweden), and SciLifeLab (Stockholm, Sweden). S.B. reports stocks in Intomics, Hoba Therapeutics ApS, Novo Nordisk, Eli Lilly & Co., and Lundbeck. H.B. has received lecture fees from Amgen and Bristol-Myers Squibb. L.K. is a member of the speaker steering committee for AstraZeneca and Novartis. L.K. has received lecture fees from Novo Nordisk and Boehringer Ingelheim. The remaining authors declare no competing interests.

Figures

Fig. 1. Graphical overview of the study.
Steps A–C: Construction of patient-specific disease-frequency vectors by assembling the ICD-10 codes registered in the electronic health records. Using the date of the first CAG/CCTA as the index date, only ICD-10 codes registered before the index date are included. Steps D and E: Patient-specific vectors are embedded using SVD, and a patient similarity matrix is constructed from the cosine of the angle between the embedded vectors. Step F: Application of the MCL algorithm to obtain clusters of patients with specific patterns of multimorbidity. Step G: Characterization of the resulting clusters to examine their risk of three pre-defined outcomes and phenotypic characteristics defined from laboratory, medication and genetic data. CAG coronary angiography, CCTA coronary computed tomography angiography, ICD-10 International Statistical Classification of Diseases and Related Health Problems 10th Revision, IHD ischemic heart disease, MCL Markov Cluster, SVD singular value decomposition.
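
The vector construction and similarity steps in this caption can be sketched roughly as follows. This is a simplified illustration, not the study's code: it counts pre-index ICD-10 codes per patient and compares patients by cosine similarity on the raw count vectors, omitting the SVD embedding. The patient records, the three-code vocabulary, and the function names are hypothetical.

```python
import math
from collections import Counter
from datetime import date


def disease_vector(records, index_date, vocabulary):
    """Count each vocabulary code registered strictly before the index date."""
    counts = Counter(code for code, when in records if when < index_date)
    return [float(counts[code]) for code in vocabulary]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical vocabulary of ICD-10 codes (the study used n = 3046 codes).
vocabulary = ["E11", "I10", "J44"]

# Hypothetical patients: (ICD-10 code, registration date) pairs.
patient_a = [("I10", date(2009, 3, 1)), ("E11", date(2010, 6, 1)),
             ("J44", date(2012, 1, 5))]  # J44 falls after the index date
patient_b = [("I10", date(2008, 5, 2)), ("E11", date(2009, 9, 9))]
patient_c = [("J44", date(2007, 2, 14))]

vector_a = disease_vector(patient_a, date(2011, 1, 1), vocabulary)
vector_b = disease_vector(patient_b, date(2010, 1, 1), vocabulary)
vector_c = disease_vector(patient_c, date(2010, 1, 1), vocabulary)

similarity_ab = cosine_similarity(vector_a, vector_b)  # same pre-index codes
similarity_ac = cosine_similarity(vector_a, vector_c)  # no shared codes
```

Patients A and B share the same pre-index codes and so have a similarity close to 1, while A and C share none and have a similarity of 0. In the pipeline described in the caption, the similarity would be computed on SVD-embedded vectors before the resulting matrix is passed to MCL.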
Fig. 2. Flowchart: data sources and study population.
NPR The Danish National Patient Registry. IHD ischemic heart disease (ICD-10 codes I20–I25). CAG coronary arteriography, CCTA coronary computed tomography angiography, ICD-10 International Statistical Classification of Diseases and Related Health Problems 10th Revision.
Fig. 3. Overview of quantitative characterization of multimorbidity clusters.
Each box in all panels represents a cluster (n = 31). In all panels, boxes are ranked by the number of patients in a specific cluster (decreasing going right and down). Yellow and green panels display cluster size (panel A) and mean age at index (panel B). Panel C displays the distribution of patient sex in each cluster. Panels D–F display the dependent variables in the survival analyses. Panel D: New ischemic events. Panel E: Non-IHD mortality. Panel F: All-cause mortality. Asterisks mark statistical significance with adjusted P-value < 0.05. For exact P-values, see Table 2.
Fig. 4. Graphical summary of study results.
Boxes represent clusters, and clusters are grouped according to similarity with respect to results from the survival analysis. Boxes located bottom right of all groups represent the results from the survival analysis. Orange boxes: Risk of secondary ischemic events. Green boxes: Risk of death from non-IHD causes. Symbols to the left of the boxes that represent clusters indicate characteristic findings from integrated analysis of medication data, laboratory test results, and genetic data, respectively. IHD ischemic heart disease.

