Artificial intelligence in reproductive endocrinology: an in-depth longitudinal analysis of ChatGPTv4's month-by-month interpretation and adherence to clinical guidelines for diminished ovarian reserve
- PMID: 39341951
- DOI: 10.1007/s12020-024-04031-8
Abstract
Objective: To quantitatively assess the performance of ChatGPTv4, an Artificial Intelligence Language Model, in adhering to clinical guidelines for Diminished Ovarian Reserve (DOR) over two months, evaluating the model's consistency in providing guideline-based responses.
Design: A longitudinal study design was employed to evaluate ChatGPTv4's response accuracy and completeness using a structured questionnaire at baseline and at a two-month follow-up.
Setting: ChatGPTv4 was tasked with interpreting DOR questionnaires based on standardized clinical guidelines.
Participants: The study did not involve human participants; the questionnaire was exclusively administered to the ChatGPT model to generate responses about DOR.
Methods: A guideline-based questionnaire comprising 176 open-ended, 166 multiple-choice, and 153 true/false questions was deployed to rigorously assess ChatGPTv4's ability to provide accurate medical advice aligned with current DOR clinical guidelines. AI-generated responses were rated on a 6-point Likert scale for accuracy and a 3-point scale for completeness. The two-phase design assessed the stability and consistency of AI-generated answers over two months.
Results: ChatGPTv4 achieved near-perfect scores across all question types, with true/false questions answered with 100% accuracy at both time points. For multiple-choice questions, accuracy improved from 98.2% at baseline to 100% at the two-month follow-up. Responses to open-ended questions also improved: mean accuracy scores rose from 5.38 ± 0.71 to 5.74 ± 0.51 (max: 6.0), and mean completeness scores rose from 2.57 ± 0.52 to 2.85 ± 0.36 (max: 3.0). These improvements were statistically significant (p < 0.001), and initial and follow-up scores were positively correlated for both accuracy (r = 0.597) and completeness (r = 0.381).
Limitations: The study was limited by the reliance on a controlled, albeit simulated, setting that may not perfectly mirror real-world clinical interactions.
Conclusion: ChatGPTv4 demonstrated exceptional and improving accuracy and completeness in handling DOR-related guideline queries over the studied period. These findings highlight ChatGPTv4's potential as a reliable, adaptable AI tool in reproductive endocrinology, capable of augmenting clinical decision-making and guideline development.
Keywords: Artificial Intelligence; ChatGPTv4; Diminished ovarian reserve; Reproductive endocrinology.
© 2024. The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.