J Med Internet Res. 2024 Aug 14;26:e48997.
doi: 10.2196/48997.

Five-Feature Models to Predict Preeclampsia Onset Time From Electronic Health Record Data: Development and Validation Study

Hailey K Ballard et al. J Med Internet Res.

Abstract

Background: Preeclampsia is a potentially fatal pregnancy complication characterized by high blood pressure and excess protein in the urine. Because the condition is complex, predicting its onset is difficult and often inaccurate.

Objective: This study aimed to create quantitative models to predict the onset gestational age of preeclampsia using electronic health records.

Methods: We retrospectively collected 1178 preeclamptic pregnancy records from the University of Michigan Health System as the discovery cohort, and 881 records from the University of Florida Health System as the validation cohort. We constructed 2 Cox-proportional hazards models: a baseline model using maternal and pregnancy characteristics, and a full model that adds laboratory findings, vitals, and medications. We built the models using 80% of the discovery data, tested them on the remaining 20% of the discovery data, and validated them with the University of Florida data. We further stratified the patients into high- and low-risk groups for preeclampsia onset risk assessment.
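The modeling steps described in the Methods can be sketched in plain Python. This is a minimal illustration, not the authors' code: the function names (`split_80_20`, `cox_lasso_objective`, `stratify_by_median`) are hypothetical, the Breslow handling of ties and the division of the likelihood by the sample size are assumptions, and the median cut point for the high- and low-risk groups is also an assumption because the abstract does not state the threshold used.

```python
import math
import random
import statistics


def split_80_20(records, seed=0):
    """Shuffle a copy of the records and return (train, test) in an 80/20 split."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def linear_predictor(beta, x):
    """Cox risk score: the dot product of coefficients and features."""
    return sum(b * v for b, v in zip(beta, x))


def cox_lasso_objective(beta, X, times, events, lam):
    """Negative log partial likelihood (Breslow ties) plus an L1 penalty.

    times  : gestational age at diagnosis (or censoring) for each record
    events : 1 if preeclampsia onset was observed, 0 if censored
    lam    : LASSO penalty strength (assumed hyperparameter)
    """
    eta = [linear_predictor(beta, x) for x in X]
    nll = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored records only appear inside risk sets
        # Risk set: everyone still undiagnosed at time t_i.
        risk_sum = sum(math.exp(eta[j]) for j, t_j in enumerate(times) if t_j >= t_i)
        nll -= eta[i] - math.log(risk_sum)
    # Dividing by n is a scaling choice made for this sketch.
    return nll / len(times) + lam * sum(abs(b) for b in beta)


def stratify_by_median(beta, X_train, X_new):
    """Label records high or low risk using the median training risk score."""
    threshold = statistics.median(linear_predictor(beta, x) for x in X_train)
    return ["high" if linear_predictor(beta, x) > threshold else "low" for x in X_new]
```

In practice the coefficients would come from minimizing `cox_lasso_objective` on the training split with a survival analysis library; the L1 term is what drives most coefficients to zero and leaves a small feature set such as the 5 predictors reported here.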

Results: The baseline model reached concordance indices of 0.64 and 0.61 in the 20% testing data and the validation data, respectively. The full model increased the concordance index in the testing data to 0.69 and matched the baseline at 0.61 in the validation data. For preeclampsia diagnosed at 34 weeks, the baseline and full models had area under the curve (AUC) values of 0.65 and 0.70, respectively; for preeclampsia diagnosed at 37 weeks, the AUC values were 0.69 and 0.70. Both models contain 5 selected features, among which the number of fetuses in the pregnancy, hypertension, and parity are shared between the 2 models with similar hazard ratios and significant P values. In the full model, maximum diastolic blood pressure in early pregnancy was the predominant feature.
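The concordance index reported above measures how often the model ranks a pair of pregnancies correctly: the one with the higher risk score should be diagnosed earlier. A minimal sketch of Harrell's concordance index follows, assuming risk scores where larger means earlier expected onset; the function name `concordance_index` is hypothetical, and pairs with tied times are simply skipped in this simplified version.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: fraction of comparable pairs ranked correctly.

    A pair (i, j) is comparable when record i has an observed event
    strictly before the time of record j. Tied risk scores count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored record cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Gestational age at diagnosis in weeks; every record has an observed onset.
weeks = [30, 34, 37, 39]
onset = [1, 1, 1, 1]
print(concordance_index(weeks, onset, [0.9, 0.7, 0.4, 0.1]))  # perfect ranking: 1.0
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which places the reported values of 0.61 to 0.69 in the modest-discrimination range.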

Conclusions: Electronic health record data provide useful information to predict the gestational age of preeclampsia onset. Stratification of the cohorts using 5-predictor Cox-proportional hazards models provides clinicians with convenient tools to assess the onset time of preeclampsia in patients.

Keywords: EHR; electronic health records; health records; machine learning; maternal; mortality; preeclampsia; pregnancy; prognosis; risk; risk prediction; survival; survival analysis.

Conflict of interest statement

Conflicts of Interest: None declared.

Figures

Figure 1
Study design and workflow for the University of Michigan preeclampsia cohort (N=1178) and the University of Florida preeclampsia cohort (N=881), 2015-2021. The discovery cohort was obtained from the University of Michigan Health System, and a validation cohort of similar size and time period was obtained from the University of Florida Health System. We constructed 2 preeclampsia predictive models: a baseline model and a full model. The input variables for the baseline model include patients’ demographics, lifestyle, comorbidities, and medical history (n=31), which were reduced to 5 features. The input for the full model includes additional lab tests and vital signs around preeclampsia diagnosis time, in addition to the variables in the baseline model (n=92), and was reduced to 5 features for the discovery cohort and 4 features for the validation cohort. We trained the Cox-proportional hazards models with Least Absolute Shrinkage and Selection Operator regularization, using 80% of the University of Michigan discovery cohort for training. We tested the models on the 20% hold-out data from the same discovery cohort and validated them using the University of Florida validation cohort. Cox-PH: Cox proportional hazards; LASSO: Least Absolute Shrinkage and Selection Operator; PE: preeclampsia; UF: University of Florida; UM: University of Michigan.
Figure 2
Gestational age of preeclampsia diagnosis baseline model features and performance. (A) Bar plot of hazard ratios of the selected features by Cox-proportional hazards method with Least Absolute Shrinkage and Selection Operator regularization. Ranging from smallest to largest hazard ratio: mood and anxiety disorder, diabetes, hypertension, parity, and number of fetuses. (B-D) Kaplan-Meier survival curves of high-risk (red) and low-risk (blue) pregnancies in the respective data sets, each with a log-rank test P value <.001. (B) University of Michigan training data set with a C-index of 0.62. (C) Hold-out testing set with a C-index of 0.64. (D) University of Florida validation data set with a C-index of 0.61.
Figure 3
Gestational age of preeclampsia diagnosis full model features and performance. (A) Bar plot of hazard ratios of the selected features in the full model by Cox-proportional hazards method with Least Absolute Shrinkage and Selection Operator regularization. Ranging from smallest to largest hazard ratio: nonsteroidal anti-inflammatory drug use, hypertension, parity, number of fetuses, and maximum diastolic blood pressure. (B) The bubble plot of significant features from preeclampsia baseline and full models. The size of the bubbles represents the hazard ratio of each feature. The number of fetuses, parity, and hypertension were shared between both models with similar hazard ratios. (C-E) Kaplan-Meier survival curves of high-risk (red) and low-risk (blue) pregnancies in the respective data sets, each with a log-rank test P value <.001. (C) University of Michigan training data set with a concordance index of 0.62. (D) Hold-out testing set with a concordance index of 0.64. (E) University of Florida validation data set with a concordance index of 0.61. BP: blood pressure.
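The Kaplan-Meier curves and log-rank P values in the figure panels compare the high- and low-risk groups over gestational age. A minimal sketch of both calculations follows, assuming two groups and a chi-square test with 1 degree of freedom; the function names `kaplan_meier` and `logrank_test` are hypothetical and this is not the authors' implementation.

```python
import math


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time, where S is the
    estimated probability of remaining undiagnosed beyond time t."""
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving  # events and censored records both leave the risk set
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, P value)."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for ti in times_a if ti >= t)
        n_b = sum(1 for ti in times_b if ti >= t)
        d_a = sum(1 for ti, ei in zip(times_a, events_a) if ti == t and ei)
        d_b = sum(1 for ti, ei in zip(times_b, events_b) if ti == t and ei)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

With the time axis in gestational weeks, `kaplan_meier` gives the stepwise curve for one risk group, and `logrank_test` applied to the high- and low-risk groups gives the kind of P value quoted in the panels.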
