Comparative Study

JMIR Form Res. 2025 Oct 1;9:e70107.

doi: 10.2196/70107.

Application of Large Language Models in Data Analysis and Medical Education for Assisted Reproductive Technology: Comparative Study

Noriyuki Okuyama^#¹, Mika Ishii¹, Yuriko Fukuoka¹, Hiromitsu Hattori¹,²,³, Yuta Kasahara¹,²,³, Tai Toshihiro¹,², Koki Yoshinaga¹, Tomoko Hashimoto¹, Koichi Kyono^#¹,²,³

Affiliations

¹ Kyono ART Clinic Takanawa, Takanawa Court 5F, 3-13-1 Takanawa, Minato-ku, Tokyo, 108-0074, Japan, 81 364084708, 81 364084702.
² Kyono ART Clinic Sendai, Sendai, Japan.
³ Kyono ART Clinic Morioka, Morioka, Japan.

^# Contributed equally.

PMID: 41032884
PMCID: PMC12488165
DOI: 10.2196/70107

Noriyuki Okuyama et al. JMIR Form Res. 2025.

. 2025 Oct 1:9:e70107.

doi: 10.2196/70107.

Authors

Noriyuki Okuyama^#¹, Mika Ishii¹, Yuriko Fukuoka¹, Hiromitsu Hattori^{1

2

3}, Yuta Kasahara^{1

2

3}, Tai Toshihiro^{1

2}, Koki Yoshinaga¹, Tomoko Hashimoto¹, Koichi Kyono^#^{1

2

3}

Affiliations

¹ Kyono ART Clinic Takanawa, Takanawa Court 5F, 3-13-1 Takanawa, Minato-ku, Tokyo, 108-0074, Japan, 81 364084708, 81 364084702.
² Kyono ART Clinic Sendai, Sendai, Japan.
³ Kyono ART Clinic Morioka, Morioka, Japan.

^# Contributed equally.

PMID: 41032884
PMCID: PMC12488165
DOI: 10.2196/70107

Abstract

Background: Recent studies have demonstrated that large language models exhibit exceptional performance in medical examinations. However, there is a lack of reports assessing their capabilities in specific domains or their application in practical data analysis using code interpreters. Furthermore, comparative analyses across different large language models have not been extensively conducted.

Objective: The purpose of this study was to evaluate whether advanced artificial intelligence (AI) models can analyze data from template-based input and demonstrate basic knowledge of reproductive medicine. Four AI models (GPT-4, GPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro) were evaluated for their data analytical capabilities through numerical calculations and graph rendering. Their knowledge of infertility treatment was assessed using 10 examination questions developed by experts.

Methods: First, we uploaded data to the AI models and furnished instruction templates using the chat interface. The study investigated whether the AI models could perform pregnancy rate analysis and graph rendering, based on blastocyst grades according to Gardner criteria. Second, we assessed model diagnostic capabilities based on specialized knowledge. This evaluation used 10 questions derived from the Japanese Fertility Specialist Examination and the Embryologist Certification Exam, along with chromosome imaging. These materials were curated under the supervision of certified embryologists and fertility specialists. All procedures were repeated 10 times per AI model.
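
The pregnancy rate analysis described in the Methods can be sketched as follows. This is a minimal illustration and not the authors' code or the code generated by any of the models: the record layout, the `pregnancy_rate_by_grade` helper, and every data value are assumptions invented for the example, and a text bar stands in for the rendered graph.

```python
from collections import defaultdict

# Hypothetical records: (Gardner grade of the transferred blastocyst,
# whether a gestational sac was confirmed). Invented values, not study data.
cycles = [
    ("AA", True), ("AA", True), ("AA", False),
    ("BA", True), ("BA", False),
    ("BB", True), ("BB", False), ("BB", False), ("BB", False),
    ("BC", False), ("BC", False),
]

def pregnancy_rate_by_grade(records):
    """Return {grade: (pregnancies, transfers, rate_percent)}."""
    counts = defaultdict(lambda: [0, 0])  # grade -> [pregnancies, transfers]
    for grade, pregnant in records:
        counts[grade][1] += 1
        if pregnant:
            counts[grade][0] += 1
    return {
        grade: (preg, total, round(100 * preg / total, 1))
        for grade, (preg, total) in counts.items()
    }

rates = pregnancy_rate_by_grade(cycles)
for grade in sorted(rates):
    preg, total, pct = rates[grade]
    # One '#' per 10 percentage points, as a stand-in for a bar chart.
    print(f"{grade}: {preg}/{total} = {pct:5.1f}% " + "#" * int(pct // 10))
```

Grouping by grade before dividing keeps the numerator and denominator visible for each stratum, which makes a model's output easy to check against a manual count.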

Results: GPT-4o achieved grade A output (defined as achieving the objective with a single output attempt) in 9 out of 10 trials, outperforming GPT-4, which achieved grade A in 7 out of 10. The average processing times for data analysis were 26.8 (SD 3.7) seconds for GPT-4o and 36.7 (SD 3) seconds for GPT-4. Claude failed in all 10 attempts. Gemini achieved an average processing time of 23 (SD 3) seconds and received grade A in 6 out of 10 trials, though occasional manual corrections were needed. Embryologists required an average of 358.3 (SD 9.7) seconds for the same tasks. In the knowledge-based assessment, GPT-4o, Claude, and Gemini achieved perfect scores (9/9) on multiple-choice questions, while GPT-4 showed a 60% (6/10) success rate on 1 question. None of the AI models could reliably diagnose chromosomal abnormalities from karyotype images, with the highest image diagnostic accuracy being 70% (7/10) for Claude and Gemini.
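
The per-model summaries reported above (grade A count, mean and SD of processing time) can be computed from a trial log along the following lines. This is a sketch under stated assumptions: the `trials` list and the `summarize` helper are hypothetical, the timings are invented, and the abstract does not say whether sample or population SD was used, so the sample form is assumed. Only the two means in the final ratio come from the abstract.

```python
from statistics import mean, stdev

# Hypothetical log of 10 trials for one model: (grade, processing time in seconds).
# Invented values for illustration; grades B and C are used only as labels here.
trials = [
    ("A", 25.0), ("A", 27.0), ("B", 30.0), ("A", 24.0), ("A", 26.0),
    ("A", 28.0), ("C", 33.0), ("A", 25.0), ("A", 27.0), ("A", 25.0),
]

def summarize(trial_log):
    """Count grade A trials and report mean (SD) processing time."""
    grades = [grade for grade, _ in trial_log]
    times = [seconds for _, seconds in trial_log]
    return {
        "grade_a": grades.count("A"),
        "n": len(trial_log),
        "mean_s": round(mean(times), 1),
        "sd_s": round(stdev(times), 1),  # sample SD (assumption)
    }

summary = summarize(trials)
print(f"Grade A in {summary['grade_a']}/{summary['n']} trials; "
      f"{summary['mean_s']} (SD {summary['sd_s']}) seconds")

# Ratio of the reported means: embryologists (358.3 s) vs GPT-4o (26.8 s).
speedup = round(358.3 / 26.8, 1)
print(f"Embryologist time / GPT-4o time = {speedup}")
```

On the reported means, the embryologists' time is roughly 13 times that of GPT-4o for the same task.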

Conclusions: The rapid data processing by GPT-4o, GPT-4, and Gemini demonstrates the potential for these AI models to significantly expedite data-intensive tasks in clinical settings. Their performance on the knowledge-based questions underscores their potential utility as educational tools or decision support systems in reproductive medicine. However, none of the models could accurately interpret medical images to reach a diagnosis.

Keywords: artificial intelligence; data analysis; education; infertility; large language model.

© Noriyuki Okuyama, Mika Ishii, Yuriko Fukuoka, Hiromitsu Hattori, Yuta Kasahara, Tai Toshihiro, Koki Yoshinaga, Tomoko Hashimoto, Koichi Kyono. Originally published in JMIR Formative Research (https://formative.jmir.org).

Conflict of interest statement

Conflicts of Interest: None declared.

Figures

Figure 1. Data analysis procedure and evaluation framework for large language models in assisted reproductive technology clinical data processing. (A) Sample dataset structure showing patient treatment data from frozen-thawed embryo transfer cycles (January 2017-July 2024; 5361 cycles from 2276 patients) formatted for artificial intelligence (AI) model input with variables including patient age, embryo quality, and pregnancy outcomes. (B) Study workflow showing systematic evaluation protocol where 4 AI models (GPT-4, GPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro) were tested 10 times each using standardized template prompts, with performance graded as A, B, or C. (C) Target visualization output showing pregnancy rates stratified by Gardner criteria trophectoderm grades (AA, BA, AB, BB, and BC) from clinic data. ET: embryo transfer; GS: gestational sac.

Figure 2. Knowledge assessment framework for large language models in reproductive medicine education. (A) Study protocol for evaluating artificial intelligence (AI) model performance on fertility specialist examination questions, with 10 independent trials per model using fresh chat sessions. (B) Question distribution showing sources: 3 questions from a senior embryologist, 6 questions from board-certified specialists (Japan Society for Reproductive Medicine gynecology and urology specialist exams, 2016-2018), and 1 image-based karyotype diagnosis question. (C) Karyotype analysis test image (600×450 pixels) used for chromosomal abnormality diagnosis assessment across all AI models.
