Inform Med Unlocked. 2022:34:101104.
doi: 10.1016/j.imu.2022.101104. Epub 2022 Oct 6.

Medication based machine learning to identify subpopulations of pediatric hemodialysis patients in an electronic health record database


Autumn M McKnite et al. Inform Med Unlocked. 2022.

Abstract

Electronic health records (EHRs) have given rise to large and complex databases of medical information that have the potential to become powerful tools for clinical research. However, coding systems, and the detail and accuracy of the information within EHRs, vary across institutions. This makes it challenging to identify subpopulations of patients and limits the widespread use of multi-institutional databases. In this study, we leveraged machine learning to identify patterns in medication usage among hospitalized pediatric patients receiving renal replacement therapy and created a predictive model that successfully differentiated between patients receiving intermittent hemodialysis (iHD) and those receiving continuous renal replacement therapy (CRRT). We trained six machine learning algorithms (logistic regression, Naïve Bayes, k-nearest neighbor, support vector machine, random forest, and gradient boosted trees) using patient records from a multi-center database (n = 533), with prescribed medication ingredients (n = 228) as features, to discriminate between the two hemodialysis types. Predictive skill was assessed using 5-fold cross-validation, and balanced accuracy ranged from 0.7 (logistic regression) to 0.86 (random forest). The two best performing models were further tested on an independent single-center dataset and achieved 84-87% balanced accuracy. This model overcomes issues inherent within large databases and will allow us to use and combine historical records, significantly increasing population size and diversity within both iHD and CRRT populations for future clinical studies. Our work demonstrates the utility of using medications alone to accurately differentiate subpopulations of patients in large datasets, allowing codes to be transferred between different coding systems. This framework has the potential to be used to distinguish other subpopulations of patients where discriminatory ICD codes are not available, permitting more detailed insights and new lines of research.
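
The abstract reports model skill as balanced accuracy, which is the mean of the per-class recalls. The following is a minimal sketch of that metric in plain Python, not the authors' code. The labels and the always-majority classifier are hypothetical and chosen only to show how balanced accuracy differs from plain accuracy when one class is larger than the other.

```python
from collections import defaultdict

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)

# Hypothetical labels (not from the study): 8 iHD and 2 CRRT patients.
y_true = ["iHD"] * 8 + ["CRRT"] * 2
# Hypothetical classifier that always predicts the larger class.
y_pred = ["iHD"] * 10

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy)                     # 0.8
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

In this toy case the classifier never identifies a CRRT patient, which plain accuracy hides and balanced accuracy exposes.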

Keywords: Electronic health records; Hemodialysis; Machine learning; Medications; Pediatrics.


Figures

Fig. 1.
Flowchart of ML methods. Cylinders represent datasets and rectangles processing steps. The dark grey box includes all steps in the tuning process where model parameters are selected. The light grey box includes steps in the validation process used to prevent overfitting. The box with a dashed outline includes the development of the final model using the entire dataset and the selected model parameters.
Fig. 2.
Feature importance plot based on the random forest model. The width of the bars indicates the reduction in model predictive skill when permuting the values of that feature (medications and age).
Fig. 3.
Partial dependency of dialysis type based on patient age. Left panel indicates the probability of iHD with age; right panel the probability of CRRT. Note the marked transition to increased probability of iHD for patients above 1 year old.
Fig. 4.
Surrogate decision tree of top 4 deciding features based on the random forest model.
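
The captions for Fig. 2 and Fig. 3 refer to permutation feature importance and partial dependence. The sketch below illustrates both ideas in plain Python under stated assumptions; it is not the authors' implementation. `toy_proba` is a hypothetical stand-in for the trained random forest, the ages and medication flags are made up, and the 1-year threshold is hard-coded only to mimic the shape described in the Fig. 3 caption. Plain accuracy is used as the skill metric for brevity.

```python
import random

def toy_proba(row):
    """Hypothetical stand-in model: P(iHD) from [age_years, medication_flag]."""
    age, med = row
    base = 0.75 if age > 1 else 0.25
    return base + (0.125 if med else 0.0)

def accuracy(proba, rows, labels):
    """Fraction of rows whose thresholded prediction matches the label."""
    hits = sum((proba(r) >= 0.5) == bool(y) for r, y in zip(rows, labels))
    return hits / len(rows)

def permutation_importance(proba, rows, labels, n_repeats=20, seed=0):
    """Mean drop in accuracy after shuffling one feature column at a time."""
    rng = random.Random(seed)
    baseline = accuracy(proba, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(proba, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances

def partial_dependence(proba, rows, feature_index, grid):
    """Average predicted probability with one feature forced to each grid value."""
    curve = []
    for value in grid:
        forced = [r[:feature_index] + [value] + r[feature_index + 1:] for r in rows]
        curve.append(sum(proba(r) for r in forced) / len(forced))
    return curve

# Hypothetical data (not from the study): ages in years, alternating medication flag.
ages = [0.25, 0.5, 0.75, 1, 2, 3, 5, 8, 10, 12, 15, 17]
rows = [[age, i % 2] for i, age in enumerate(ages)]
labels = [1 if age > 1 else 0 for age in ages]  # 1 = iHD, 0 = CRRT

importances = permutation_importance(toy_proba, rows, labels)
age_curve = partial_dependence(toy_proba, rows, 0, [0.5, 1, 2, 10])
print(importances)  # age has a positive importance; the medication flag has 0.0
print(age_curve)    # [0.3125, 0.3125, 0.8125, 0.8125]
```

In this toy model the medication flag shifts the predicted probability but never moves it across the 0.5 threshold, so shuffling it leaves accuracy unchanged and its importance is zero. The age curve steps up once the forced age exceeds 1 year, which is the kind of transition a partial dependence plot is meant to reveal.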

