Discovering sequential patterns and interrelations among multiple diseases in electronic medical records using cSPADE algorithm
- PMID: 40211318
- PMCID: PMC11983760
- DOI: 10.1186/s13690-025-01589-1
Abstract
Background: Relationships between diseases are characterized by the order and time intervals of their onset, both of which are critical for understanding comorbidity and predicting disease progression. This study uses sequential pattern mining to investigate the interdependencies and chronological order of diseases that occur in the same patient. Specifically, it describes the differences in time intervals between the onset of distinct disorders and examines where disease sequence patterns agree and differ between gender groups.
Methods: Patient identity information, visit dates, and diagnostic data were aggregated from the electronic medical record databases of three large general hospitals. The diagnostic information included the International Classification of Diseases, Tenth Revision (ICD-10) codes, along with their corresponding descriptions. A total of 1,060,344 diagnostic entries from 269,973 patients who visited during 2012-2022 were incorporated into the mining model, which was constructed using the Sequential Pattern Discovery using Equivalence Classes (SPADE) algorithm.
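The core idea behind SPADE-style mining, as described in the Methods, can be illustrated with a minimal sketch. It stores each diagnosis code in a vertical "id-list" (patient, visit dates) and counts the share of patients in whom one code is later followed by another, optionally under a maximum-gap constraint of the kind cSPADE supports. The toy records, the function names `build_id_lists` and `sequence_support`, and the `max_gap_days` parameter are assumptions made for illustration; this is not the study's implementation or data.

```python
from collections import defaultdict
from datetime import date

# Hypothetical toy records: (patient_id, visit_date, ICD-10 code).
records = [
    ("p1", date(2015, 3, 1), "E11"),
    ("p1", date(2016, 6, 1), "I10"),
    ("p2", date(2014, 1, 10), "E11"),
    ("p2", date(2014, 2, 20), "I10"),
    ("p3", date(2018, 5, 5), "I10"),
    ("p3", date(2019, 5, 5), "E11"),
    ("p4", date(2013, 7, 7), "E11"),
]


def build_id_lists(rows):
    """Vertical format: code -> {patient_id -> sorted list of visit dates}."""
    id_lists = defaultdict(lambda: defaultdict(list))
    for pid, day, code in rows:
        id_lists[code][pid].append(day)
    for per_patient in id_lists.values():
        for days in per_patient.values():
            days.sort()
    return id_lists


def sequence_support(id_lists, first, second, n_patients, max_gap_days=None):
    """Fraction of patients in whom `first` is followed later by `second`.

    If max_gap_days is given, the two diagnoses must lie within that many
    days of each other (an assumed, simplified max-gap constraint).
    """
    matched = 0
    for pid, first_days in id_lists[first].items():
        second_days = id_lists[second].get(pid, [])
        # Temporal join of the two id-lists for this patient.
        if any(
            a < b and (max_gap_days is None or (b - a).days <= max_gap_days)
            for a in first_days
            for b in second_days
        ):
            matched += 1
    return matched / n_patients


id_lists = build_id_lists(records)
n_patients = len({pid for pid, _, _ in records})

unconstrained = sequence_support(id_lists, "E11", "I10", n_patients)
within_one_year = sequence_support(id_lists, "E11", "I10", n_patients, max_gap_days=365)
reverse_order = sequence_support(id_lists, "I10", "E11", n_patients)
print(unconstrained, within_one_year, reverse_order)
```

A full SPADE implementation extends such joins recursively to longer sequences within equivalence classes and prunes candidates below a minimum support threshold; the sketch covers only the two-item case.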
Results: A total of 212 highly supported sequential pattern rules were ultimately identified, most of which were related to disorders of the endocrine and circulatory systems. In 66 patterns, the order of disease incidence or diagnosis was relatively well-defined. The time interval between the onset of two diseases ranged from 1 to 2 years in most patterns. For patterns with short-term relationships, the interval was less than 2 months, whereas in some cases, the interval extended to 5 to 10 years. Among the extracted patterns, 176 exhibited stronger support in the male dataset compared to the female dataset. Patterns related to cardiovascular and liver diseases were more prevalent in males, while those associated with orthopedic and endocrine disorders showed higher prevalence in females.
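The two quantities reported in the Results, onset intervals and sex-specific support, could be computed along the following lines. This is a sketch under stated assumptions: the `patients` dictionary, the helper names `pattern_intervals` and `support_by_sex`, and the use of first-diagnosis dates as a proxy for onset are hypothetical, and the numbers are toy values unrelated to the study's findings.

```python
from datetime import date
from statistics import median

# Hypothetical data: patient_id -> (sex, {ICD-10 code -> first diagnosis date}).
patients = {
    "p1": ("M", {"E11": date(2015, 3, 1), "I10": date(2016, 6, 1)}),
    "p2": ("M", {"E11": date(2014, 1, 10), "I10": date(2014, 2, 20)}),
    "p3": ("F", {"E11": date(2019, 5, 5), "I10": date(2018, 5, 5)}),
    "p4": ("F", {"E11": date(2013, 7, 7), "I10": date(2015, 7, 7)}),
    "p5": ("F", {"E11": date(2020, 1, 1)}),
}


def pattern_intervals(data, first, second, sex=None):
    """Days from first onset of `first` to first onset of `second`,
    for patients in whom that order holds (optionally one sex only)."""
    gaps = []
    for group, onsets in data.values():
        if sex is not None and group != sex:
            continue
        if first in onsets and second in onsets and onsets[first] < onsets[second]:
            gaps.append((onsets[second] - onsets[first]).days)
    return gaps


def support_by_sex(data, first, second, sex):
    """Share of patients of the given sex who show the pattern first -> second."""
    group_size = sum(1 for group, _ in data.values() if group == sex)
    return len(pattern_intervals(data, first, second, sex)) / group_size


median_gap_days = median(pattern_intervals(patients, "E11", "I10"))
male_support = support_by_sex(patients, "E11", "I10", "M")
female_support = support_by_sex(patients, "E11", "I10", "F")
print(median_gap_days, male_support, female_support)
```

Comparing `male_support` with `female_support` for each mined pattern mirrors the kind of comparison summarized above, where a pattern is counted as stronger in whichever dataset yields the higher support.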
Conclusion: Our findings demonstrate the effectiveness of the constrained SPADE (cSPADE) algorithm in comorbidity research and highlight several clinically significant sequential comorbidity patterns. These patterns are expected to contribute to disease prevention, etiological research, and the development of clinical decision support systems.
Keywords: Comorbidity; Sequential pattern mining; cSPADE.
© 2025. The Author(s).
Conflict of interest statement
Declarations. Ethics approval and consent to participate: Since the study did not involve human experimentation and all the information was anonymized, it was exempt from approval by the Ethics Committee of the Affiliated Hospital of Xuzhou Medical University. Consent for publication: Not applicable. Competing interests: The authors declare no competing interests.
Similar articles
- Temporal condition pattern mining in large, sparse electronic health record data: A case study in characterizing pediatric asthma. J Am Med Inform Assoc. 2020 Apr 1;27(4):558-566. doi: 10.1093/jamia/ocaa005. PMID: 32049282. Free PMC article.
- Data-driven treatment pathways mining for early breast cancer using cSPADE algorithm and system clustering. Int J Health Plann Manage. 2022 Sep;37(5):2569-2584. doi: 10.1002/hpm.3483. Epub 2022 Apr 20. PMID: 35445441.
- Mining co-occurrence and sequence patterns from cancer diagnoses in New York State. PLoS One. 2018 Apr 26;13(4):e0194407. doi: 10.1371/journal.pone.0194407. eCollection 2018. PMID: 29698405. Free PMC article.
- Temporal characterization of Alzheimer's Disease with sequences of clinical records. EBioMedicine. 2023 Jun;92:104629. doi: 10.1016/j.ebiom.2023.104629. Epub 2023 May 27. PMID: 37247495. Free PMC article. Review.
- Trends in symptom prevalence and sequential onset of SARS-CoV-2 infection from 2020 to 2022 in East and Southeast Asia: a trajectory pattern exploration based on summary data. Arch Public Health. 2024 Aug 15;82(1):125. doi: 10.1186/s13690-024-01357-7. PMID: 39148103. Free PMC article.