Front Endocrinol (Lausanne). 2024 Jan 30;15:1298628.
doi: 10.3389/fendo.2024.1298628. eCollection 2024.

Predicting polycystic ovary syndrome with machine learning algorithms from electronic health records

Zahra Zad et al. Front Endocrinol (Lausanne). 2024.

Abstract

Introduction: Predictive models have been used to aid early diagnosis of PCOS, though existing models are based on small sample sizes and limited to fertility clinic populations. We built a predictive model using machine learning algorithms based on an outpatient population at risk for PCOS to predict risk and facilitate earlier diagnosis, particularly among those who meet diagnostic criteria but have not received a diagnosis.

Methods: This is a retrospective cohort study using electronic health records (EHR) from a safety-net hospital, covering 2003-2016. The study population included 30,601 women aged 18-45 years without concurrent endocrinopathy who had any visit to Boston Medical Center for primary care, obstetrics and gynecology, endocrinology, family medicine, or general internal medicine. Four prediction outcomes were assessed for PCOS. The first outcome was an ICD-9 diagnosis of PCOS; the additional model outcomes were algorithm-defined PCOS. The latter was based on the Rotterdam criteria and merged laboratory values, radiographic imaging, and ICD data from the EHR to define irregular menstruation, hyperandrogenism, and polycystic ovarian morphology on ultrasound.
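As a rough illustration of an algorithm-defined outcome, the sketch below applies the general Rotterdam rule, under which PCOS is indicated when at least two of the three features are present. It is not the study's actual outcome definition, which the abstract does not spell out; the `PatientFlags` class and its field names are hypothetical, and each flag is assumed to have already been derived from laboratory, imaging, or ICD data.

```python
from dataclasses import dataclass


@dataclass
class PatientFlags:
    """Hypothetical per-patient flags, assumed to be derived upstream from EHR data."""
    irregular_menstruation: bool
    hyperandrogenism: bool
    polycystic_ovarian_morphology: bool


def meets_rotterdam(flags: PatientFlags) -> bool:
    """Return True when at least two of the three Rotterdam features are present."""
    present = sum([
        flags.irregular_menstruation,
        flags.hyperandrogenism,
        flags.polycystic_ovarian_morphology,
    ])
    return present >= 2


# Example: hyperandrogenism plus polycystic morphology, regular cycles.
patient = PatientFlags(
    irregular_menstruation=False,
    hyperandrogenism=True,
    polycystic_ovarian_morphology=True,
)
print(meets_rotterdam(patient))
```

Keeping the three flags separate from the rule makes it possible to vary how each flag is derived without touching the two-of-three logic.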

Results: We developed predictive models using four machine learning methods: logistic regression, support vector machine, gradient boosted trees, and random forests. Hormone values (follicle-stimulating hormone, luteinizing hormone, estradiol, and sex hormone binding globulin) were combined to create a multilayer perceptron score using a neural network classifier. Prediction of PCOS prior to clinical diagnosis in an out-of-sample test set of patients achieved an average AUC of 85%, 81%, 80%, and 82%, respectively, in Models I, II, III, and IV. Significant positive predictors of PCOS diagnosis across models included hormone levels and obesity; negative predictors included gravidity and positive bHCG.
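The AUC reported above can be read as the probability that a randomly chosen patient with the outcome receives a higher predicted score than a randomly chosen patient without it. The sketch below computes that quantity directly from held-out labels and scores. It is not the authors' evaluation code; the function name `roc_auc` and the toy labels and scores are made up for illustration.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the positive
    case has the higher score; ties count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy held-out set (made-up values, not study data).
test_labels = [1, 0, 1, 0, 1]
test_scores = [0.9, 0.3, 0.6, 0.7, 0.8]
print(round(roc_auc(test_labels, test_scores), 3))  # 5 of 6 pairs ranked correctly
```

The pairwise form is quadratic in the number of patients, so for a cohort of this size a rank-based or library implementation would be the practical choice; the definition is the same.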

Conclusion: Machine learning algorithms were used to predict PCOS based on a large at-risk population. This approach may guide early detection of PCOS within EHR-interfaced populations to facilitate counseling and interventions that may reduce long-term health consequences. Our model illustrates the potential benefits of an artificial intelligence-enabled provider assistance tool that can be integrated into the EHR to reduce delays in diagnosis. However, model validation in other hospital-based populations is necessary.

Keywords: artificial intelligence; disease prediction; machine learning; polycystic ovary syndrome (PCOS); predictive model.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
Flow of patients from the BMC CDW into the dataset used by the study.
Figure 2
Feature importance graphs based on logistic regression coefficients (± 95% confidence interval), associated with parsimonious models utilizing the MLP score (LR-L2-MLP score). The absolute value of the logistic regression coefficients shows how much the variable affects the predicted probability of the outcome. A positive/negative coefficient implies that the larger the absolute value of the variable, the higher/lower the chance of having a PCOS diagnosis as defined by the model outcome.
Figure 3
Example of receiver operating characteristic (ROC) curves associated with parsimonious logistic regression models utilizing the MLP score (LR-L2-MLP score).
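The coefficient interpretation given in the Figure 2 caption can be illustrated with a small sketch: in a logistic regression, a positive coefficient raises the predicted probability as the variable increases, and a negative one lowers it. The intercept, coefficient values, and standard error below are hypothetical and are not taken from the paper. The interval uses a Wald approximation (coefficient ± 1.96 × standard error), which is an assumption, since the caption does not state how the 95% confidence intervals were computed.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def predicted_probability(intercept: float, coefs: dict, features: dict) -> float:
    """Predicted probability from a logistic regression with the given coefficients."""
    z = intercept + sum(coefs[name] * features[name] for name in coefs)
    return sigmoid(z)


def wald_ci(coef: float, std_err: float, z: float = 1.96) -> tuple:
    """Approximate 95% Wald interval for one coefficient (assumed method)."""
    return (coef - z * std_err, coef + z * std_err)


# Hypothetical coefficients, chosen only to show the direction of each effect.
coefs = {"obesity": 0.8, "gravidity": -0.5}
intercept = -2.0

baseline = predicted_probability(intercept, coefs, {"obesity": 0, "gravidity": 0})
with_obesity = predicted_probability(intercept, coefs, {"obesity": 1, "gravidity": 0})
with_gravidity = predicted_probability(intercept, coefs, {"obesity": 0, "gravidity": 1})

print(with_obesity > baseline)    # positive coefficient raises the probability
print(with_gravidity < baseline)  # negative coefficient lowers it
print(wald_ci(0.8, 0.1))
```

An interval that excludes zero corresponds to a predictor whose direction of effect is statistically distinguishable at the 5% level under this approximation.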