J Med Internet Res. 2025 Aug 20:27:e70140.
doi: 10.2196/70140.

Deep Phenotyping of Obesity: Electronic Health Record-Based Temporal Modeling Study

Xiaoyang Ruan et al. J Med Internet Res.

Abstract

Background: Obesity affects approximately 40% of adults and 15%-20% of children and adolescents in the United States, and poses significant economic and psychosocial burdens. Currently, patient responses to any single antiobesity medication (AOM) vary significantly, making obesity deep phenotyping and associated precision medicine important targets of investigation.

Objective: This study aimed to evaluate the potential of electronic health records (EHRs) as a primary data source for obesity deep phenotyping. We conducted an in-depth analysis of the data elements available for patients with obesity prior to pharmacotherapy, and of their quality, and applied a multimodal longitudinal deep autoencoder to investigate the feasibility, data requirements, clustering patterns, and challenges associated with EHR-based obesity deep phenotyping.

Methods: We analyzed 53,688 pre-AOM periods from 32,969 patients with obesity or overweight who underwent medium- to long-term AOM treatment. A total of 92 laboratory and vital sign measurements, along with 79 ICD (International Classification of Diseases)-derived Clinical Classifications Software (CCS) codes recorded within 1 year prior to AOM treatment, were used to train a gated recurrent unit with decay-based longitudinal autoencoder (GRU-D-AE) to generate a dense embedding for each pre-AOM record. Principal component analysis (PCA) and Gaussian mixture modeling (GMM) were then applied to the embeddings to identify clusters.
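The clustering step described in the Methods (PCA followed by GMM on the autoencoder embeddings) can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' pipeline: the function name `cluster_embeddings`, the embedding size of 16, the choice of 2 principal components, and the synthetic input data are all assumptions made for the example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture


def cluster_embeddings(embeddings, n_pcs=2, n_clusters=9, seed=0):
    """Project embeddings onto principal components, then fit a GMM.

    Hypothetical helper: n_pcs and n_clusters are illustrative defaults,
    not settings reported in the abstract.
    """
    pcs = PCA(n_components=n_pcs, random_state=seed).fit_transform(embeddings)
    gmm = GaussianMixture(n_components=n_clusters, random_state=seed).fit(pcs)
    # Return the projection, the hard cluster labels, and the BIC of the fit.
    return pcs, gmm.predict(pcs), gmm.bic(pcs)


# Synthetic stand-in for bottleneck vectors (not study data):
# three well-separated groups of 100 points in 16 dimensions.
rng = np.random.default_rng(0)
centers = rng.normal(scale=8.0, size=(3, 16))
embeddings = np.vstack([c + rng.normal(size=(100, 16)) for c in centers])

pcs, labels, bic = cluster_embeddings(embeddings, n_clusters=3)
print(pcs.shape, np.bincount(labels))
```

In practice the number of mixture components is usually chosen by comparing a criterion such as BIC across candidate values; how the study selected its cluster count is not stated in the abstract.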

Results: Our analysis identified at least 9 clusters, 5 of which exhibited distinct and explainable clinical relevance. Certain clusters showed characteristics that overlap with phenotypes derived from traditional phenotyping strategies. Results from multiple training folds demonstrated stable clustering patterns in 2D space and reproducible clinical significance. However, challenges persist in the stability of missing data imputation across folds, in maintaining consistent input features, and in effectively visualizing complex diseases in low-dimensional spaces.

Conclusions: In this proof-of-concept study, we demonstrated that longitudinal EHR data are a valuable resource for deep phenotyping of the pre-AOM period at the per-patient-visit level. Our analysis revealed clusters with distinct clinical significance, which could have implications for AOM treatment options. Further research using larger, independent cohorts is necessary to validate the reproducibility and clinical relevance of these clusters and to uncover more detailed substructures and the corresponding AOM treatment responses.

Keywords: EHR; anti-obesity medication; obesity; phenotyping; precision medicine.

Conflict of interest statement

Conflicts of Interest: None declared.

Figures

Figure 1. (A) GRU-D autoencoder (GRU-D-AE) architecture. xt is the normalized feature value, mt is the missing indicator (0 for missing and 1 for present), and dt is the time since the last actual observation. The bottleneck layer hT contains the dense embedding vector used for clustering. (B) Sample processing flow and the generation of principal components (PCs) from longitudinal electronic health records. AOM: antiobesity medication; RNN: recurrent neural network; UT: University of Texas.
Figure 2. Feature presence rates of the top 10 (A) measurements and (B) Clinical Classification Software (CCS) codes that appeared within 1 year before and after initiating medium- to long-term antiobesity medication (AOM) therapy.
Figure 3. GRU-D autoencoder (GRU-D-AE)-based clustering of case pre-antiobesity medication (pre-AOM) periods. (A) All pre-AOM periods colored by data quality quartile. (B) Low-quality data points (below the median) are removed, with the remaining points colored by data quality quartile. (C) High-quality data points colored by Gaussian mixture model (GMM)-based cluster (the principal component analysis [PCA] plot is dimmed to highlight the cluster centers).
Figure 4. Clusters of pre-antiobesity medication (pre-AOM) periods in obesity cases versus normal body mass index (BMI) controls. (A) All case data points versus control data points. (B) High-quality case data points versus control data points.
Figure 5. Illustration of three patients, each with multiple antiobesity medication (AOM) sessions and their respective pre-AOM periods. Smaller point sizes indicate pre-AOM periods earlier in time, while colors represent different data quality quartiles.
Figure 6. Two-way clustering of Clinical Classification Software (CCS) diagnosis prevalence rates against Gaussian mixture model (GMM)-based clusters of pre-antiobesity medication (pre-AOM) periods. Prevalence rates are calculated based on any diagnoses made within 1 year prior to the index date. Black circles indicate the defining characteristics of the corresponding clusters, which are referenced in the main text for guided interpretation.
Figure 7. Two-way clustering of mean measurement values (z-score transformed) against Gaussian mixture model (GMM)-based clusters of pre-antiobesity medication (pre-AOM) periods. The measurement values represent the most recent observations within 1 year prior to the index date. Black circles indicate the defining characteristics of the corresponding clusters, which are referenced in the main text for guided interpretation.
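
The Figure 1 caption describes two auxiliary inputs to the GRU-D autoencoder: a missing indicator mt and a time-since-last-observation dt. The sketch below shows one way to derive them for a single irregularly sampled feature. It follows the convention commonly used for GRU-D inputs (the clock restarts after an observed step and keeps accumulating across missing steps); whether the study used exactly this convention is an assumption, and the function name `mask_and_delta` and the sample values are made up for illustration.

```python
def mask_and_delta(times, values):
    """Return (m, d) for one feature recorded at irregular time points.

    m[t] is 1 when values[t] is present and 0 when it is None.
    d[t] is the time elapsed since the last step that had a value,
    or since the first step when nothing has been observed yet.
    """
    m, d = [], []
    for t, (time, value) in enumerate(zip(times, values)):
        m.append(0 if value is None else 1)
        if t == 0:
            d.append(0.0)
        else:
            gap = time - times[t - 1]
            # Restart the clock after an observed step; otherwise accumulate.
            d.append(gap if m[t - 1] == 1 else d[t - 1] + gap)
    return m, d


# Illustrative series: days relative to the first visit, None marks a missing value.
days = [0.0, 30.0, 45.0, 120.0]
lab_value = [6.1, None, None, 6.8]

m, d = mask_and_delta(days, lab_value)
print(m)  # [1, 0, 0, 1]
print(d)  # [0.0, 30.0, 45.0, 120.0]
```

In the last step the value is present (m = 1), yet d is 120 days because the previous actual observation was at day 0; the decay mechanism uses this gap to weight how much the earlier value should still be trusted.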
