Arch Public Health. 2025 Apr 10;83(1):100.
doi: 10.1186/s13690-025-01589-1.

Discovering sequential patterns and interrelations among multiple diseases in electronic medical records using cSPADE algorithm

He Ma et al. Arch Public Health. 2025.

Abstract

Background: The relationships between diseases are characterized by the order of their onset and the time intervals between onsets, both of which are critical for understanding the nature of comorbidity and predicting disease progression. This study uses sequential pattern mining to investigate the interdependencies and chronological order of diseases that occur in the same patient. Specifically, it aims to characterize differences in the time intervals between the onsets of distinct disorders and to examine where disease sequence patterns agree and differ between gender groups.

Methods: Patient identity information, visit dates, and diagnostic data were aggregated from the electronic medical record databases of three large general hospitals. The diagnostic information included the International Classification of Diseases, Tenth Revision (ICD-10) codes, along with their corresponding descriptions. A total of 1,060,344 diagnostic entries from 269,973 patients who visited during 2012-2022 were incorporated into the mining model, which was constructed using the Sequential Pattern Discovery using Equivalence Classes (SPADE) algorithm.
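SPADE works on a "vertical" representation: each item (here, a diagnosis) is stored as an id-list of the sequences (patients) and event times in which it occurs, and the support of a longer sequence is obtained by temporally joining those id-lists. The constrained variant, cSPADE, additionally restricts the allowed gap between consecutive elements. The sketch below illustrates only the simplest case, two-item patterns A → B with optional minimum and maximum gaps in days, and is not the authors' implementation. It assumes a hypothetical record layout of `(patient_id, day, icd_category)` tuples with one diagnosis per record, and the function names and thresholds are illustrative.

```python
from collections import defaultdict
from itertools import permutations


def build_idlists(records):
    """Vertical format: diagnosis -> {patient_id: sorted list of days it was recorded}."""
    idlists = defaultdict(lambda: defaultdict(set))
    for pid, day, code in records:
        idlists[code][pid].add(day)
    return {code: {pid: sorted(days) for pid, days in by_pid.items()}
            for code, by_pid in idlists.items()}


def temporal_join(a_list, b_list, mingap=1, maxgap=None):
    """Patients in whom some B event follows some A event by a gap in [mingap, maxgap] days."""
    supporting = set()
    for pid in a_list.keys() & b_list.keys():
        for da in a_list[pid]:
            if any(db - da >= mingap and (maxgap is None or db - da <= maxgap)
                   for db in b_list[pid]):
                supporting.add(pid)
                break
    return supporting


def frequent_2_sequences(records, minsup, mingap=1, maxgap=None):
    """Support (fraction of patients) of every frequent pattern A -> B."""
    idlists = build_idlists(records)
    n_patients = len({pid for pid, _, _ in records})
    # Frequent 1-sequences: diagnoses recorded in at least minsup of all patients.
    frequent = {code: plist for code, plist in idlists.items()
                if len(plist) / n_patients >= minsup}
    result = {}
    for a, b in permutations(frequent, 2):
        sup = len(temporal_join(frequent[a], frequent[b], mingap, maxgap)) / n_patients
        if sup >= minsup:
            result[(a, b)] = sup
    return result


if __name__ == "__main__":
    # Hypothetical toy data: (patient_id, day, ICD-10 category).
    records = [
        (1, 0, "I10"), (1, 400, "I50"),
        (2, 10, "I10"), (2, 700, "I50"),
        (3, 5, "E11"), (3, 30, "I10"),
    ]
    print(frequent_2_sequences(records, minsup=0.5))
    # {('I10', 'I50'): 0.6666666666666666}
    print(frequent_2_sequences(records, minsup=0.5, maxgap=365))
    # {}  (both gaps exceed one year)
```

Full SPADE keeps the joined (patient, event) id-list so that patterns can be extended beyond two items, and it also handles itemsets within a single event; this sketch collapses the join to a set of supporting patients for brevity.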

Results: A total of 212 highly supported sequential pattern rules were identified, most of which involved disorders of the endocrine and circulatory systems. In 66 patterns, the order of disease onset or diagnosis was relatively well defined. In most patterns, the interval between the onsets of the two diseases was 1 to 2 years; for patterns with short-term relationships it was less than 2 months, whereas in some cases it extended to 5 to 10 years. Among the extracted patterns, 176 had stronger support in the male dataset than in the female dataset. Patterns related to cardiovascular and liver diseases were more prevalent in males, while those associated with orthopedic and endocrine disorders were more prevalent in females.
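The two quantities reported above, the time interval between onsets and the difference in support between male and female patients, can be illustrated with a short sketch. For a pattern A → B it takes each patient's first recorded diagnosis day of A and of B, keeps patients in whom B's first diagnosis comes after A's, and returns the gap in days; it then computes the pattern's support separately within each sex. The record layout `(patient_id, day, code)`, the `sex` mapping, and the use of first-diagnosis dates as a proxy for onset are assumptions for illustration only; the abstract does not state how intervals or stratified support were computed.

```python
from statistics import median


def first_days(records):
    """(patient_id, code) -> earliest day on which the code was recorded."""
    first = {}
    for pid, day, code in records:
        key = (pid, code)
        if key not in first or day < first[key]:
            first[key] = day
    return first


def pattern_intervals(records, a, b):
    """Days from first A diagnosis to first B diagnosis, for patients where B follows A."""
    first = first_days(records)
    gaps = {}
    for pid in {pid for pid, _, _ in records}:
        if (pid, a) in first and (pid, b) in first:
            gap = first[(pid, b)] - first[(pid, a)]
            if gap > 0:
                gaps[pid] = gap
    return gaps


def support_by_sex(records, sex, a, b):
    """Fraction of patients of each sex whose first A diagnosis precedes their first B diagnosis."""
    supporters = set(pattern_intervals(records, a, b))
    result = {}
    for group in sorted(set(sex.values())):
        members = {pid for pid, s in sex.items() if s == group}
        result[group] = len(members & supporters) / len(members) if members else 0.0
    return result


if __name__ == "__main__":
    # Hypothetical toy data; patient 3 is diagnosed with I50 before I10.
    records = [
        (1, 0, "I10"), (1, 400, "I50"),
        (2, 10, "I10"), (2, 700, "I50"),
        (3, 5, "I50"), (3, 30, "I10"),
    ]
    sex = {1: "M", 2: "F", 3: "F"}
    gaps = pattern_intervals(records, "I10", "I50")
    print(median(gaps.values()))                       # 545.0
    print(support_by_sex(records, sex, "I10", "I50"))  # {'F': 0.5, 'M': 1.0}
```

Using the median rather than the mean keeps a few very long gaps (such as the 5 to 10 year cases) from dominating the summary interval.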

Conclusion: Our findings demonstrate the effectiveness of the constrained SPADE (cSPADE) algorithm in comorbidity research and highlight several clinically significant sequential comorbidity patterns. These patterns are expected to contribute to disease prevention, etiological research, and the development of clinical decision support systems.

Keywords: Comorbidity; Sequential pattern mining; cSPADE.

Conflict of interest statement

Declarations. Ethics approval and consent to participate: Since the study did not involve human experimentation and all the information was anonymized, it was exempt from approval by the Ethics Committee of the Affiliated Hospital of Xuzhou Medical University. Consent for publication: Not applicable. Competing interests: The authors declare no competing interests.

Figures

Fig. 1
Flowchart illustrating the data mining process of the interrelations among multiple diseases
Fig. 2
Age distribution of patients included in the study
Fig. 3
Temporal interval distribution between first and last hospital consultations of patients in the study
Fig. 4
Distribution of diagnostic codes for patients in the study (classified by ICD-10 “chapters”)
Fig. 5
Distribution of top 30 diagnostic codes for patients in the study (classified by ICD-10 “categories”)
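Figures 4 and 5 group diagnoses by ICD-10 chapter and by three-character category. A minimal sketch of that grouping follows, assuming diagnosis codes are strings such as `"E11.9"` whose first three characters give the category. The chapter boundaries are the WHO ICD-10 chapter ranges; local clinical modifications may append extension codes, and the upper bound of the special-purpose U chapter differs between editions. The helper names are hypothetical.

```python
from collections import Counter

# WHO ICD-10 chapters as inclusive ranges of three-character categories.
ICD10_CHAPTERS = [
    ("I", "A00", "B99"), ("II", "C00", "D48"), ("III", "D50", "D89"),
    ("IV", "E00", "E90"), ("V", "F00", "F99"), ("VI", "G00", "G99"),
    ("VII", "H00", "H59"), ("VIII", "H60", "H95"), ("IX", "I00", "I99"),
    ("X", "J00", "J99"), ("XI", "K00", "K93"), ("XII", "L00", "L99"),
    ("XIII", "M00", "M99"), ("XIV", "N00", "N99"), ("XV", "O00", "O99"),
    ("XVI", "P00", "P96"), ("XVII", "Q00", "Q99"), ("XVIII", "R00", "R99"),
    ("XIX", "S00", "T98"), ("XX", "V01", "Y98"), ("XXI", "Z00", "Z99"),
    ("XXII", "U00", "U99"),  # special-purpose codes; upper bound varies by edition
]


def icd10_category(code):
    """Three-character category, e.g. 'E11.900' -> 'E11'."""
    return code.strip().upper().replace(".", "")[:3]


def icd10_chapter(code):
    """Roman-numeral chapter for a code, or None if it falls in no range."""
    cat = icd10_category(code)
    for chapter, start, end in ICD10_CHAPTERS:
        # Plain string comparison is valid because categories are one letter + two digits.
        if start <= cat <= end:
            return chapter
    return None


def chapter_counts(codes):
    """Number of diagnostic entries per chapter (the tally behind a chart like Fig. 4)."""
    return Counter(icd10_chapter(c) for c in codes)


if __name__ == "__main__":
    print(chapter_counts(["I10", "I50.9", "E11.9"]))  # Counter({'IX': 2, 'IV': 1})
```

Counting `icd10_category` values with the same `Counter` pattern gives the category-level tally behind a chart like Fig. 5.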
