Comput Biol Med. 2021 Jul:134:104430.
doi: 10.1016/j.compbiomed.2021.104430. Epub 2021 May 7.

Tensor learning of pointwise mutual information from EHR data for early prediction of sepsis

Naimahmed Nesaragi et al. Comput Biol Med. 2021 Jul.

Abstract

Early detection of sepsis enables early clinical intervention with effective treatment and may reduce sepsis mortality. Machine learning-based automated diagnosis of sepsis from easily recorded physiological data may therefore be more promising than the gold-standard rule-based clinical criteria used in current practice. This study develops a machine learning framework that quantifies the heterogeneity within tabular electronic health record (EHR) data of clinical covariates, capturing both linear relationships and nonlinear correlations, for the early prediction of sepsis. For every 6-hour window of the EHR data, restricted to 24 selected covariates, the pairwise association of each hour-covariate pair is described by a pointwise mutual information (PMI) matrix, which represents the heterogeneity of the data as a two-dimensional map. These matrices are stacked along the z-axis, each forming a slice parallel to the xy-plane, to build a 3-way tensor for each record with its corresponding length of stay (L). Each fused tensor is factorized by Tucker decomposition, and only the core tensor is retained, discarding the three unitary factor matrices, to provide the latent feature set for predicting sepsis onset. Under a five-fold cross-validation scheme, the 120 latent features obtained from the reshaped core tensor are fed to Light Gradient Boosting Machine (LightGBM) models for binary classification, which also helps alleviate the class imbalance. The framework is designed via Bayesian optimization and yields an average normalized utility score of 0.4519, as defined by the challenge organizers, and an area under the receiver operating characteristic curve (AUROC) of 0.8621 on the publicly available PhysioNet/Computing in Cardiology Challenge 2019 training data.
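The PMI step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each 6-hour window is treated as a contingency table of non-negative (for example, min-max scaled) covariate values, that windows are consecutive and non-overlapping, and that PMI is set to 0 where it is undefined. The function names `pmi_matrix` and `fuse_windows` are hypothetical.

```python
import math


def pmi_matrix(window):
    """PMI for every (hour, covariate) cell of one window.

    `window` is a list of rows (hours), each a list of non-negative
    covariate values. Assumption: the window is normalized into a joint
    distribution p(hour, covariate), with row and column sums as marginals.
    """
    total = sum(sum(row) for row in window)
    if total <= 0:
        raise ValueError("window must contain some positive mass")
    n_rows, n_cols = len(window), len(window[0])
    row_p = [sum(row) / total for row in window]
    col_p = [sum(window[i][j] for i in range(n_rows)) / total
             for j in range(n_cols)]
    pmi = []
    for i in range(n_rows):
        out_row = []
        for j in range(n_cols):
            joint = window[i][j] / total
            if joint <= 0 or row_p[i] <= 0 or col_p[j] <= 0:
                out_row.append(0.0)  # convention (assumed): 0 where undefined
            else:
                out_row.append(math.log(joint / (row_p[i] * col_p[j])))
        pmi.append(out_row)
    return pmi


def fuse_windows(record, window_hours=6):
    """Stack the PMI matrices of consecutive windows into a 3-way tensor.

    Returns a nested list indexed as tensor[slice][hour][covariate].
    Assumption: non-overlapping windows; trailing hours that do not fill
    a whole window are dropped.
    """
    slices = []
    for start in range(0, len(record) - window_hours + 1, window_hours):
        slices.append(pmi_matrix(record[start:start + window_hours]))
    return slices


# Toy record: 13 hourly rows with 2 covariates (the study uses 24).
toy_record = [[1.0 + hour, 2.0] for hour in range(13)]
tensor = fuse_windows(toy_record)
print(len(tensor), len(tensor[0]), len(tensor[0][0]))
```

In a full pipeline the resulting tensor would then be passed to a Tucker decomposition routine and the flattened core tensor to a gradient-boosting classifier; both steps are omitted here to keep the sketch dependency-free.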
The proposed tensor decomposition of the 3-way fused tensor, formulated from PMI matrices, leverages higher-order temporal interactions between the pairwise associations among clinical values for the early prediction of sepsis. This is validated by improved risk prediction for every hour of ICU admission in terms of utility score, AUROC, and F1 score. In particular, the results show a significant improvement of ~1.5-2% in utility score under a 5-fold cross-validation scheme on the entire training data, compared with a top-ranked study that participated in the challenge.
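For reference, the two standard definitions that the framework builds on can be written as below. The notation (the tensor $\mathcal{X}$, core $\mathcal{G}$, factor matrices $U^{(n)}$, and the dimensions $I_n$ and $R_n$) is generic textbook notation assumed here, not taken from the paper.

```latex
% Pointwise mutual information of two outcomes x and y
\mathrm{PMI}(x, y) = \log \frac{p(x, y)}{p(x)\, p(y)}

% Tucker decomposition of a 3-way tensor
\mathcal{X} \approx \mathcal{G} \times_1 U^{(1)} \times_2 U^{(2)} \times_3 U^{(3)},
\qquad
\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times I_3},\quad
\mathcal{G} \in \mathbb{R}^{R_1 \times R_2 \times R_3},\quad
U^{(n)} \in \mathbb{R}^{I_n \times R_n}
```

In this notation, flattening the core tensor $\mathcal{G}$ gives $R_1 R_2 R_3$ latent features per record; the abstract does not state the individual ranks.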

Keywords: Early prediction; Electronic health records; Machine learning; Medical informatics; Model-based diagnosis; Pointwise mutual information; Sepsis; Tensor factorization.
