Review
J Biomed Inform. 2021 Oct:122:103889.
doi: 10.1016/j.jbi.2021.103889. Epub 2021 Aug 16.

Longitudinal K-means approaches to clustering and analyzing EHR opioid use trajectories for clinical subtypes

Sarah Mullin et al. J Biomed Inform. 2021 Oct.

Abstract

Identification of patient subtypes from retrospective Electronic Health Record (EHR) data is fraught with inherent modeling issues, such as missing data and variable-length time intervals, and the results obtained are highly dependent on data pre-processing strategies. As we move towards personalized medicine, accurately identifying patient subtypes will be a key factor in creating patient-specific treatment plans. Partitioning longitudinal trajectories with irregularly spaced, variable-length time intervals is a well-established but still open problem. In this work, we present and compare k-means approaches for subtyping opioid use trajectories from EHR data. We then interpret the resulting subtypes using decision trees, examining how each subtype is influenced by opioid medication features and by patient diagnoses, procedures, and demographics. Finally, we discuss how the subtypes can be incorporated as features in static machine learning models for predicting opioid overdose and adverse events. The proposed methods are general and can be extended to other EHR prescription dosage trajectories.
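As a rough illustration of the general idea, and not the authors' implementation, the sketch below treats each patient's trajectory as a vector of values on a shared time grid, fills missing time points by linear interpolation (an assumed pre-processing choice; other imputation schemes are possible), and runs plain Lloyd's k-means with Euclidean distance. The function names `interpolate_missing` and `longitudinal_kmeans` and the toy data are hypothetical.

```python
import math
import random


def interpolate_missing(traj):
    """Fill None entries by linear interpolation between the nearest observed
    neighbours; leading/trailing gaps copy the nearest observed value."""
    known = [(i, v) for i, v in enumerate(traj) if v is not None]
    if not known:
        raise ValueError("trajectory has no observed values")
    filled = []
    for i, v in enumerate(traj):
        if v is not None:
            filled.append(float(v))
            continue
        left = [(j, w) for j, w in known if j < i]
        right = [(j, w) for j, w in known if j > i]
        if left and right:
            (j0, w0), (j1, w1) = left[-1], right[0]
            filled.append(w0 + (w1 - w0) * (i - j0) / (j1 - j0))
        else:
            # edge gap: copy the closest observed value
            filled.append(float(left[-1][1] if left else right[0][1]))
    return filled


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def longitudinal_kmeans(trajs, k, n_iter=100, seed=0):
    """Lloyd's k-means on equal-length, imputed trajectories.
    Returns (labels, centroids)."""
    data = [interpolate_missing(t) for t in trajs]
    rng = random.Random(seed)
    centroids = [list(c) for c in rng.sample(data, k)]
    labels = None
    for _ in range(n_iter):
        # assignment step: nearest centroid by Euclidean distance
        new_labels = [min(range(k), key=lambda c: euclidean(x, centroids[c]))
                      for x in data]
        if new_labels == labels:
            break  # converged: no patient changed cluster
        labels = new_labels
        # update step: each centroid becomes the mean trajectory of its members
        for c in range(k):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

For example, `longitudinal_kmeans([[10, None, 10, 10], [12, 12, None, 12], [90, 95, None, 100], [None, 92, 96, 98]], k=2)` places the two low-dose and the two high-dose trajectories in separate clusters. Real EHR trajectories would first need to be aligned to a common time grid, which is exactly the pre-processing step the abstract flags as influential.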

Keywords: Electronic health records; Longitudinal k-means clustering; Opioids; Patient subtypes; Trajectory analysis.

Figures

Figure 1.
Raw Patient Morphine Milligram Equivalent (MME) Trajectories for 50 Randomly Sampled Patients
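The trajectories in Figure 1 are expressed in morphine milligram equivalents (MME). A common way to derive daily MME from prescription records is strength per unit × units per day × a drug-specific conversion factor, summed over prescriptions active on the same day. The sketch below assumes that approach; the conversion table is a small illustrative subset (morphine 1, hydrocodone 1, oxycodone 1.5, as commonly published by the CDC), and the prescription field names (`drug`, `strength_mg`, `units_per_day`, `start_day`, `days_supply`) are hypothetical rather than the paper's schema.

```python
# Illustrative conversion factors only (assumed subset); a real pipeline
# would use a complete, versioned conversion table.
MME_FACTORS = {"morphine": 1.0, "hydrocodone": 1.0, "oxycodone": 1.5}


def daily_mme(drug, strength_mg, units_per_day):
    """MME per day for one prescription: strength x units/day x factor."""
    return strength_mg * units_per_day * MME_FACTORS[drug.lower()]


def mme_trajectory(prescriptions, n_days):
    """Total daily MME for days 0..n_days-1. Each (hypothetical) prescription
    dict covers days [start_day, start_day + days_supply)."""
    traj = [0.0] * n_days
    for p in prescriptions:
        dose = daily_mme(p["drug"], p["strength_mg"], p["units_per_day"])
        start = max(p["start_day"], 0)
        for day in range(start, min(p["start_day"] + p["days_supply"], n_days)):
            traj[day] += dose  # overlapping prescriptions add up
    return traj
```

For instance, oxycodone 5 mg taken 4 times a day on days 0 to 2, overlapping with hydrocodone 10 mg taken 3 times a day on days 2 to 3, gives `[30.0, 30.0, 60.0, 30.0, 0.0]` over five days.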
Figure 2.
Selecting the optimal k using cross-validation and each method's suggested internal criteria. The plots show the mean criterion value, with error bars, across the five folds. In (a), all three KML criteria (Calinski-Harabasz, Davies-Bouldin, and Ray and Turi) are maximized at k=3 clusters. For B-spline (b), the BIC is minimized at k=7 clusters, and the AIC also plateaus at k=7. Finally, (c) shows that the silhouette score is highest at k=7 clusters for VRAE.
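To make the selection procedure concrete, the sketch below computes two of the internal criteria named in Figure 2 in their textbook form (Calinski-Harabasz, where higher is better, and Davies-Bouldin, where lower is better) and picks k by the mean score across folds. It is a minimal sketch under those textbook definitions; the KML software may scale or orient its criteria differently, and `pick_k` is a hypothetical helper.

```python
import math


def _mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def _sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _groups(data, labels):
    groups = {}
    for x, lab in zip(data, labels):
        groups.setdefault(lab, []).append(x)
    return list(groups.values())


def calinski_harabasz(data, labels):
    """Between- over within-cluster dispersion, each divided by its degrees
    of freedom (k - 1 and n - k). Higher is better."""
    groups = _groups(data, labels)
    n, k = len(data), len(groups)
    overall = _mean(data)
    between, within = 0.0, 0.0
    for g in groups:
        c = _mean(g)
        between += len(g) * _sqdist(c, overall)
        within += sum(_sqdist(x, c) for x in g)
    return (between / (k - 1)) / (within / (n - k))


def davies_bouldin(data, labels):
    """Mean over clusters of the worst (s_i + s_j) / d(c_i, c_j), where s_i is
    the average member-to-centroid distance of cluster i. Lower is better."""
    groups = _groups(data, labels)
    cents = [_mean(g) for g in groups]
    scatter = [sum(math.sqrt(_sqdist(x, c)) for x in g) / len(g)
               for g, c in zip(groups, cents)]
    k = len(groups)
    worst = [max((scatter[i] + scatter[j]) / math.sqrt(_sqdist(cents[i], cents[j]))
                 for j in range(k) if j != i)
             for i in range(k)]
    return sum(worst) / k


def pick_k(fold_scores, higher_is_better=True):
    """fold_scores maps k -> list of per-fold criterion values."""
    def fold_mean(k):
        return sum(fold_scores[k]) / len(fold_scores[k])
    choose = max if higher_is_better else min
    return choose(fold_scores, key=fold_mean)
```

In use, each fold's clustering would be scored for every candidate k, the per-fold values collected into `fold_scores`, and `pick_k` applied with the orientation appropriate to the criterion (for example `higher_is_better=False` for a BIC).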
Figure 3.
Profile analysis of 5-fold cross-validation for the clusters selected by each method's criterion. Apart from the highest, smallest, and most erratic clusters (the top curves in each plot), the profile analysis shows stable clusters across all folds for each method.
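Comparing cluster profiles across folds requires deciding which cluster in one fold corresponds to which cluster in another, since k-means labels are arbitrary. One way to do this, assumed here rather than taken from the paper, is to compute each cluster's mean trajectory and choose the relabeling that minimizes the total distance between matched profiles. Exhaustive search over permutations is only practical for small k, such as the 3 to 7 clusters discussed here; the helper names are hypothetical.

```python
import itertools
import math


def cluster_profiles(trajs, labels, k):
    """Mean trajectory for each cluster label 0..k-1 (assumes none is empty)."""
    profiles = []
    for c in range(k):
        members = [t for t, lab in zip(trajs, labels) if lab == c]
        profiles.append([sum(col) / len(members) for col in zip(*members)])
    return profiles


def match_profiles(reference, other):
    """Return perm where perm[i] is the index in `other` matched to
    reference[i], minimizing the summed Euclidean distance."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    k = len(reference)
    best = min(itertools.permutations(range(k)),
               key=lambda p: sum(dist(reference[i], other[p[i]]) for i in range(k)))
    return list(best)
```

After matching, the relabeled profiles from each fold can be overlaid against the reference fold; large residual distances for a cluster would flag the kind of instability the caption notes for the highest and most erratic clusters.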
Figure 4.
Profile analysis and trajectories found using the three k-means methods on the full 70% training set (n=2,846) compared to the 30% testing set (n=1,151).
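A simple way to carry training-set clusters over to held-out patients is nearest-centroid assignment, followed by a comparison of cluster sizes between the two sets. The sketch below assumes test trajectories are already on the same time grid and imputed; it illustrates the idea for a trajectory-space method and is not necessarily how the paper mapped the 30% testing set. Function names are hypothetical.

```python
from collections import Counter


def assign_to_centroids(trajs, centroids):
    """Label each held-out trajectory with its nearest training centroid."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(centroids)), key=lambda c: sqdist(t, centroids[c]))
            for t in trajs]


def cluster_shares(labels, k):
    """Fraction of patients falling in each cluster 0..k-1."""
    counts = Counter(labels)
    return [counts[c] / len(labels) for c in range(k)]
```

Similar shares (and similar mean profiles) in the training and testing sets are one informal sign that the clusters generalize beyond the data used to fit them.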
Figure 5.
Decision tree analysis of the k-means clusters extracted by KML and B-spline
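A decision tree explains clusters by repeatedly choosing the feature split that makes cluster labels most homogeneous on each side. The sketch below scores a single binary split by weighted Gini impurity, the criterion used by CART-style trees; the paper's tree software, settings, and feature set are not given here, and the feature names in the example (`on_buprenorphine`, `female`) are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of cluster labels (0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_impurity(feature_values, labels):
    """Weighted Gini impurity after splitting patients on a 0/1 feature."""
    left = [lab for f, lab in zip(feature_values, labels) if f]
    right = [lab for f, lab in zip(feature_values, labels) if not f]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_feature(features, labels):
    """features maps name -> list of 0/1 values; return the best root split."""
    return min(features, key=lambda name: split_impurity(features[name], labels))
```

With cluster labels `[1, 1, 1, 2, 2, 2]`, a feature that is 1 exactly for the cluster-2 patients separates them perfectly (impurity 0.0) and would be chosen over an unrelated feature, which is how medication or diagnosis features surface as explanations of a subtype.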
Figure 6.
Proportion of cases for clinical features by cluster. For KML (a), because the majority of patients (89.6%) fall into cluster 1, that cluster also accounts for a large portion of every clinical feature. Cluster 2 has a high proportion of opioid treatment patients. For B-spline (b), clusters 3 and 5 contain a large portion of the patients on buprenorphine. For VRAE (c), the majority of buprenorphine patients come from clusters 6 and 7.
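The caption's point about cluster 1 can be seen by computing two different proportions: the share of patients with a feature who fall in each cluster (what Figure 6 plots), and the prevalence of the feature within each cluster. A large cluster dominates the first even when the feature is rare inside it. The sketch below is illustrative; the function names and toy data are hypothetical.

```python
from collections import Counter


def feature_share_by_cluster(labels, has_feature):
    """Among patients with the feature, the fraction in each cluster."""
    cases = Counter(lab for lab, f in zip(labels, has_feature) if f)
    total = sum(cases.values())
    return {c: cases[c] / total for c in sorted(cases)}


def prevalence_by_cluster(labels, has_feature):
    """Within each cluster, the fraction of patients having the feature."""
    totals = Counter(labels)
    cases = Counter(lab for lab, f in zip(labels, has_feature) if f)
    return {c: cases[c] / totals[c] for c in sorted(totals)}
```

For labels `[1, 1, 1, 2, 3, 3]` and a feature present in patients 2, 5, and 6, cluster 3 holds two thirds of the cases and every one of its members has the feature, while cluster 1 holds one third of the cases at a within-cluster prevalence of one third.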

