J Biomed Inform. 2023 Dec:148:104547.
doi: 10.1016/j.jbi.2023.104547. Epub 2023 Nov 18.

A methodology of phenotyping ICU patients from EHR data: High-fidelity, personalized, and interpretable phenotypes estimation

Yanran Wang et al. J Biomed Inform. 2023 Dec.

Abstract

Objective: Computing phenotypes that provide high-fidelity, time-dependent characterizations and yield personalized interpretations is challenging, especially given the complexity of physiological and healthcare systems and the quality of clinical data. This paper develops a methodological pipeline to estimate unmeasured physiological parameters and produce high-fidelity, personalized phenotypes anchored to physiological mechanics from electronic health record (EHR) data.

Methods: A methodological phenotyping pipeline is developed that computes new phenotypes defined with unmeasurable computational biomarkers quantifying specific physiological properties in real time. Working within the inverse problem framework, this pipeline is applied to the glucose-insulin system for ICU patients, using data assimilation with stochastic optimization to estimate an established mathematical physiological model. This produces physiological model parameter vectors of clinically unmeasured endocrine properties (here insulin secretion, clearance, and resistance), estimated for each individual patient. These physiological parameter vectors are used as inputs to unsupervised machine learning methods to produce phenotypic labels and discrete physiological phenotypes. These phenotypes are inherently interpretable because they are based on parametric physiological descriptors. To establish potential clinical utility, the computed phenotypes are evaluated with external EHR data for consistency and reliability and with clinician face validation.
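The inverse-problem step described above can be illustrated with a deliberately small sketch. It is not the paper's pipeline: the forward model here is a toy first-order decay with a single time constant `tau` (an assumption standing in for the full glucose-insulin model), the sampler is a plain random-walk Metropolis algorithm (an assumption standing in for the paper's data assimilation and stochastic optimization), and all names and numeric values are hypothetical.

```python
import math
import random
import statistics


def model(t, tau, y0=10.0):
    """Toy forward model (assumption): first-order decay with time constant tau."""
    return y0 * math.exp(-t / tau)


def log_likelihood(tau, times, observations, sigma):
    """Gaussian log-likelihood of the data given tau, up to an additive constant."""
    sse = sum((obs - model(t, tau)) ** 2 for t, obs in zip(times, observations))
    return -sse / (2.0 * sigma ** 2)


def metropolis(times, observations, sigma, tau_init, n_iter, step, seed):
    """Random-walk Metropolis sampler with a flat prior on tau > 0."""
    rng = random.Random(seed)
    tau = tau_init
    logp = log_likelihood(tau, times, observations, sigma)
    chain = []
    for _ in range(n_iter):
        proposal = tau + rng.gauss(0.0, step)
        if proposal > 0.0:  # proposals outside the prior support are rejected
            logp_new = log_likelihood(proposal, times, observations, sigma)
            # Accept with probability min(1, exp(logp_new - logp)).
            if rng.random() < math.exp(min(0.0, logp_new - logp)):
                tau, logp = proposal, logp_new
        chain.append(tau)
    return chain


# Synthetic "measurements": the true tau is 5.0, perturbed by a fixed
# alternating error so that the example is reproducible.
times = list(range(20))
observations = [model(t, 5.0) + 0.3 * (-1) ** t for t in times]

chain = metropolis(times, observations, sigma=0.5,
                   tau_init=2.0, n_iter=4000, step=0.3, seed=0)
estimate = statistics.median(chain[2000:])  # discard the first half as burn-in
print(f"posterior median of tau: {estimate:.2f}")
```

The chain summary (here the post-burn-in median) plays the role of the per-patient parameter estimate that would later be passed to the clustering stage.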

Results: The phenotype computation was performed on a cohort of 109 ICU patients who received no or short-acting insulin therapy, rendering continuous and discrete physiological phenotypes as specific computational biomarkers of unmeasured insulin secretion, clearance, and resistance on time windows of three days. Six, six, and five discrete phenotypes were found in the first, middle, and last three-day periods of ICU stays, respectively. Computed phenotypic labels were predictive with an average accuracy of 89%. External validation of discrete phenotypes showed coherence and consistency in clinically observable differences based on laboratory measurements and ICD 9/10 codes and clinical concordance from face validity. A particularly clinically impactful parameter, insulin secretion, had a concordance accuracy of 83%±27%.

Conclusion: The new physiological phenotypes computed with individual patient ICU data and defined by estimates of mechanistic model parameters have high physiological fidelity, are continuous, time-specific, personalized, interpretable, and predictive. This methodology is generalizable to other clinical and physiological settings and opens the door for discovering deeper physiological information to personalize medical care.

Keywords: Data assimilation; Data mining; Electronic health record; Knowledge representation with machine learning; Phenotyping; Physiological modeling.

Conflict of interest statement

Declaration of competing interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Fig. A.1.
Fraction identification for the Geweke statistic. Using cohort A data, we ran 3 Markov chains for the parameter triplet (a1, Rg, tp), then counted the percentage of converged chains for each parameter using the Geweke statistic, with the initial fraction ranging from 0.05 to 0.1 and the terminal fraction fixed at 0.5. We chose the initial fraction as the smallest value that decreased the convergence percentage for all parameters, and hence identified the initial fraction as 0.03. With this initial fraction, 47 out of 90 chains converged according to the Geweke statistic. The chain-selection optima from cohort A were adopted for cohort B.
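As a rough illustration of the convergence check named in this caption, the sketch below computes a simplified Geweke z-score by comparing the mean of an initial fraction of a chain with the mean of a terminal fraction. It is a simplification, not the authors' code: it uses plain sample variances, whereas the standard Geweke diagnostic uses spectral density estimates that account for autocorrelation. The function names and the two example chains are hypothetical; the default fractions 0.03 and 0.5 are taken from the caption.

```python
import math
import statistics


def geweke_z(chain, first=0.03, last=0.5):
    """Simplified Geweke z-score: early-segment mean versus late-segment mean.

    Assumption: sample variances are used in place of spectral density
    estimates, so autocorrelation within the chain is ignored.
    """
    n = len(chain)
    head = chain[: max(2, int(first * n))]
    tail = chain[n - max(2, int(last * n)):]
    se = math.sqrt(statistics.variance(head) / len(head)
                   + statistics.variance(tail) / len(tail))
    diff = statistics.mean(head) - statistics.mean(tail)
    if se == 0.0:
        # Degenerate case: both segments are constant.
        return 0.0 if diff == 0.0 else math.copysign(math.inf, diff)
    return diff / se


def geweke_p_value(z):
    """Two-sided p-value of z under a standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Two artificial chains: one that oscillates around a fixed level and one
# that drifts steadily upward.
stationary = [float(i % 2) for i in range(1000)]
drifting = [float(i) for i in range(1000)]

for name, chain in [("stationary", stationary), ("drifting", drifting)]:
    z = geweke_z(chain)
    print(name, "p-value:", geweke_p_value(z))
```

Under this reading, a large p-value means the early and late segments have statistically indistinguishable means, which is treated as evidence of convergence; the drifting chain fails the check.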
Fig. A.2.
Step 1 (inset): Raw, unclustered t-SNE dimension reduction of the selected-chain summaries for 38 ICU stays in cohort B3, excluding T1DM, using the same methodological choices. Step 2: Phenotype computation with SVC, which clusters the t-SNE coordinates of the chain summaries. Circled coordinates are support vectors that lie on the feature space sphere boundaries. Four clusters were identified, indicating four potential phenotypes.
Fig. 1.
The methodological pipeline flow chart with a high-level structural overview on the left, delineated as three main stages: (i) personalized parametric characterization using data assimilation (DA), (ii) continuous phenotype computation with physiological interpretation using unsupervised ML and discrete phenotype rendering, and (iii) internal evaluation, analysis, and external validation of discrete phenotypes. The first stage computes personal continuous estimates from parametric characterization based on a mathematical mechanistic model using DA. The second stage clusters a cohort's individual estimates into phenotypic groups that have well-resolved physiological interpretations within each identified group. The third stage internally and externally validates the computer-derived phenotypes using laboratory measurements, ICD 9/10 codes, and face validity by an endocrinologist ICU expert.
Fig. 2.
The ultradian model diagram. The glucose-insulin system consists of three main pools: the plasma insulin, the remote insulin in the intercellular space, and the plasma glucose. This non-autonomous ultradian model is represented by a system of ODEs with 6 states and 21 parameters. Exponential constants a1 and C1 affect insulin secretion; tp is the time constant for plasma insulin degradation by the kidney and liver; Rg is the linear constant affecting insulin-dependent glucose production rate; C3 is the linear constant affecting remote insulin-dependent glucose removal rate in an implicit delayed process.
Fig. 3.
Markov chain selection to parametrically characterize a patient. Colored bubble charts of Markov chain selection for all 10 chains of one ICU stay from cohort A, with axes of parameter estimate, MSE, and Geweke statistic p-value, shown separately for each model parameter: (A) a1, (B) tp, and (C) Rg. Each bubble represents a single Markov chain and is labeled with its chain number. Bubbles were drawn with a radius corresponding to the Markov chain IQR and colored blue to indicate Geweke convergence of that parameter. Markov chains with low MSE and intra-chain convergence of at least one parameter were selected. The seven Markov chains meeting these criteria were outlined in black and selected to represent this ICU stay.
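The selection rule in this caption (low MSE plus Geweke convergence of at least one parameter) can be sketched as a simple filter. Everything concrete below is an assumption for illustration: the record layout, the MSE cutoff, the 0.05 significance level, the reading of "converged" as a p-value above that level, and the example values are hypothetical and not taken from the paper.

```python
def select_chains(chains, mse_cutoff, alpha=0.05):
    """Return ids of chains with low MSE and at least one converged parameter.

    Assumptions: a parameter counts as converged when its Geweke p-value
    exceeds alpha, and "low MSE" means MSE at or below mse_cutoff.
    """
    selected = []
    for chain in chains:
        low_error = chain["mse"] <= mse_cutoff
        any_converged = any(p > alpha for p in chain["geweke_p"].values())
        if low_error and any_converged:
            selected.append(chain["id"])
    return selected


# Hypothetical summaries for three chains of one ICU stay.
chains = [
    {"id": 1, "mse": 120.0, "geweke_p": {"a1": 0.40, "tp": 0.01, "Rg": 0.02}},
    {"id": 2, "mse": 900.0, "geweke_p": {"a1": 0.60, "tp": 0.70, "Rg": 0.50}},
    {"id": 3, "mse": 150.0, "geweke_p": {"a1": 0.01, "tp": 0.03, "Rg": 0.30}},
]

selected = select_chains(chains, mse_cutoff=200.0)
print(selected)
```

In this example chain 2 is excluded despite converging in every parameter, because its fit error is above the cutoff.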
Fig. 4.
Step 1 (inset): Raw, unclustered t-SNE dimension reduction of the selected-chain summaries for 45 ICU stays in cohort B3 with specified methodological choices. Step 2: Colored phenotypic label computation with SVC, which clusters the t-SNE coordinates of the chain summaries based on SVC methodological choices. Circled coordinates are support vectors mapped to the surface of the high-dimensional feature space sphere. Five clusters were identified, indicating five potential phenotypes.
Fig. 5.
Continuous parametric characterization of labeled phenotypes from SVC in cohort B3, with density plots in (A) 2D and (B) 3D median space. Cluster coloring is the same as in Fig. 4. The density plots in both 2D and 3D median space showed five different distribution patterns over the entire parametric space and thus revealed five different phenotypes. For example, the phenotype in blue was interpreted as the phenotype of patients who had a1, the exponential constant for insulin secretion inverse rate, roughly between 4 and 5; tp, the time constant for plasma insulin clearance inverse rate by the kidney and liver, between 0.25 and 0.7 min; and Rg, the insulin-dependent glucose production rate or insulin resistance, between 300 and 550 mg/min.
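One way to arrive at interval descriptions like the one given for the blue phenotype is to group per-stay parameter medians by cluster label and report the range of each parameter within a cluster. The sketch below assumes that reading; the function name, the data layout, and all values are hypothetical and chosen only to fall inside the ranges quoted in the caption.

```python
from collections import defaultdict


def phenotype_ranges(medians, labels):
    """Map each cluster label to the (min, max) of every parameter median."""
    grouped = defaultdict(lambda: defaultdict(list))
    for params, label in zip(medians, labels):
        for name, value in params.items():
            grouped[label][name].append(value)
    return {
        label: {name: (min(vals), max(vals)) for name, vals in by_param.items()}
        for label, by_param in grouped.items()
    }


# Hypothetical per-stay medians of (a1, tp, Rg) with their cluster labels.
medians = [
    {"a1": 4.1, "tp": 0.30, "Rg": 320.0},
    {"a1": 4.9, "tp": 0.65, "Rg": 540.0},
    {"a1": 6.2, "tp": 1.10, "Rg": 210.0},
]
labels = ["blue", "blue", "red"]

ranges = phenotype_ranges(medians, labels)
print(ranges["blue"])
```

Reporting the full range is only one possible summary; quantile-based intervals would be less sensitive to outlying stays.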
Fig. 6.
Discrete phenotype contextual interpretation with corresponding synthesized description based on model-inferred phenotypic delineation. This figure is a hierarchical visualization of patient progression that delineates the time-specific discrete phenotypes identified in cohort B3. This visualization is only one example of contextual interpretation in a clinically relevant way. Starting from the left, we separated discrete phenotypes G8-G12 at each hierarchy based on model-inferred organ functionalities, and ended with the synthesized interpretations combining clinical observations on the right. Comparing the level of pancreatic insulin secretion functionality at the first hierarchy, phenotypes G8, G9, and G11 could be differentiated from G10 and G12. Then, comparing the level of insulin clearance at the second hierarchy, G10 could be differentiated from G12. Similarly, comparing the level of insulin resistance at the second hierarchy, G11 could be differentiated from G8 and G9, followed by comparing the level of insulin clearance at the third hierarchy to differentiate G8 from G9.
