Heliyon. 2024 Oct 11;10(20):e39205.
doi: 10.1016/j.heliyon.2024.e39205. eCollection 2024 Oct 30.

SmartScanPCOS: A feature-driven approach to cutting-edge prediction of Polycystic Ovary Syndrome using Machine Learning and Explainable Artificial Intelligence

Umaa Mahesswari G et al. Heliyon. 2024.

Abstract

Polycystic Ovarian Syndrome (PCOS) poses significant challenges to women's reproductive health. It is difficult to diagnose because it presents with a variety of symptoms, including hirsutism, anovulation, pain, obesity, hyperandrogenism, and oligomenorrhea, and therefore requires multiple clinical tests. Artificial Intelligence (AI) in healthcare can improve patient care, streamline operations, and improve medical outcomes overall. This study presents an Explainable Artificial Intelligence (XAI)-driven PCOS smart predictor, structured as a hierarchical ensemble of two tiers of Random Forest classifiers, chosen after extensive analysis of seven conventional classifiers and two additional stacking ensemble classifiers. The classifiers were trained on an open-source data set comprising numerical parametric features linked to PCOS. To identify the essential features for PCOS prediction, three feature selection methods, Threshold-driven Optimized Principal Component Analysis (TOPCA), Optimized Salp Swarm (OSSM), and Threshold-driven Optimized Mutual Information Method (TOMIM), were fine-tuned through thresholding and improvisation to detect diverse attribute sets with varying numbers and combinations of features. Notably, the two-level Random Forest classifier model outperformed the others, reaching 99.31% accuracy with the top 17 features selected by TOMIM and an overall accuracy of 99.32% under 8-fold cross-validation over 25 runs. The smart predictor, which deploys the two-level Random Forest classifier model, was built with Shapash, a Python library for Explainable Artificial Intelligence. To ensure transparency and reliability of the results, visualizations from robust Explainable AI libraries were employed at different prediction stages for all classifiers considered in this study.
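
The abstract does not spell out how TOMIM works, so the following is only a minimal sketch of the general idea its name suggests: score each feature by its mutual information with the PCOS label, drop features below a threshold, and keep the top-ranked remainder. It is not the authors' implementation. The helper names (`mutual_information`, `discretize`, `select_features`), the equal-width binning, the threshold value, and all data values are assumptions made for illustration; only the feature name "Follicle No. (R)" is taken from the paper's figure captions.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def discretize(values, bins=4):
    """Equal-width binning of a numeric column (an assumed preprocessing step)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # constant column carries no information
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def select_features(columns, labels, threshold, top_k=None):
    """Rank features by mutual information with the label, keep those >= threshold."""
    scores = {
        name: mutual_information(discretize(col), labels)
        for name, col in columns.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    kept = [(name, score) for name, score in ranked if score >= threshold]
    return kept if top_k is None else kept[:top_k]


# Synthetic toy data, NOT taken from the study's data set.
labels = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = PCOS positive (hypothetical)
columns = {
    "Follicle No. (R)": [3, 4, 5, 4, 12, 14, 11, 13],      # separates the classes
    "noise_feature": [1, 2, 1, 2, 1, 2, 1, 2],             # hypothetical, uninformative
    "constant_feature": [7, 7, 7, 7, 7, 7, 7, 7],          # hypothetical, constant
}

# Threshold of 0.1 nats is an arbitrary illustrative choice.
selected = select_features(columns, labels, threshold=0.1, top_k=17)
for name, score in selected:
    print(f"{name}: {score:.3f}")
```

On this toy input only the informative column survives the threshold. In the study itself the selection retained 17 features, which is why `top_k=17` is used here, although the toy data has far fewer columns.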

Keywords: Classification; Cross validation; Ensemble model; Explainable artificial intelligence; Health care; Machine learning; Polycystic ovarian syndrome (PCOS); eXplainable artificial intelligence (XAI).

Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Image 1. Overview of SmartScanPCOS methodology.
Fig. 1. Detailed research framework architecture.
Fig. 2. Feature distribution of significant numerical and categorical features.
Fig. 3. Feature selection and ranking by the proposed Threshold-driven Optimized Mutual Information Method.
Fig. 4. Architecture of the chosen traditional classifiers and XAI-based explanation methods.
Fig. 5. SHAP violin plot.
Fig. 6. SHAP waterfall plot.
Fig. 7. (a) QLattice plot; (b) training ROC curve and confusion matrix; (c) testing ROC curve and confusion matrix of the generated predictive model.
Fig. 8. XGBoost model: (a) feature variable importance plot; (b) break-down plot for the chosen local explanation.
Fig. 9. LIME plot for a PCOS-positive case instance.
Fig. 10. Architecture of the chosen ensembles and XAI-based explanation methods.
Fig. 11. PDP interaction plot between Follicle No. (R) and Cycle length (days) based on stacking ensemble 1.
Fig. 12. PDP target plot between Follicle No. (R) and Cycle length (days) based on stacking ensemble 1.
Fig. 13. PDP plots for stacking ensemble 2: prediction plot for feature Follicle No. (R).
Fig. 14. Architecture of the proposed PCOS Smart Predictor.
Fig. 15. Comparative accuracy analysis of the models using TOPCA, OSSM and TOMIM methods.
