Ophthalmol Sci. 2025 Feb 22;5(4):100745. doi: 10.1016/j.xops.2025.100745. eCollection 2025 Jul-Aug.

Can OpenAI's New o1 Model Outperform Its Predecessors in Common Eye Care Queries?


Krithi Pushpanathan et al. Ophthalmol Sci.

Abstract

Objective: The newly launched OpenAI o1 is said to offer improved reasoning, potentially providing higher quality responses to eye care queries. However, its performance remains unassessed. We evaluated the performance of o1, ChatGPT-4o, and ChatGPT-4 in addressing ophthalmic-related queries, focusing on correctness, completeness, and readability.

Design: Cross-sectional study.

Subjects: Sixteen queries that ChatGPT-4 had answered suboptimally in prior studies, covering 3 subtopics: myopia (6 questions), ocular symptoms (4 questions), and retinal conditions (6 questions).

Methods: For each subtopic, 3 attending-level ophthalmologists, masked to the model sources, evaluated the responses based on correctness, completeness, and readability (on a 5-point scale for each metric).

Main outcome measures: Mean summed scores of each model for correctness, completeness, and readability, rated on a 5-point scale (maximum score: 15).

Results: o1 scored highest in correctness (12.6) and readability (14.2), outperforming ChatGPT-4, which scored 10.3 (P = 0.010) and 12.4 (P < 0.001), respectively. No significant difference was found between o1 and ChatGPT-4o. When stratified by subtopic, o1 consistently demonstrated superior correctness and readability. In completeness, ChatGPT-4o achieved the highest score (12.4), followed by o1 (10.8), though the difference was not statistically significant. o1 showed a notable limitation in completeness for ocular symptom queries, scoring 5.5 out of 15.

Conclusions: While o1 is marketed as offering improved reasoning capabilities, its performance in addressing eye care queries does not significantly differ from its predecessor, ChatGPT-4o. Nevertheless, it surpasses ChatGPT-4, particularly in correctness and readability.

Financial disclosures: Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.

Keywords: Large language models; Myopia; Ocular symptoms; OpenAI o1; Retinal conditions.


Figures

Figure 1. Study design flowchart. LLMs = large language models.

Figure 2. Bar charts comparing mean summed scores for correctness, completeness, and readability across ChatGPT-4, ChatGPT-4o, and OpenAI o1, evaluated on (A) overall performance across 16 eye care inquiries, (B) inquiries related to myopia, (C) inquiries concerning ocular symptoms, and (D) inquiries about retinal conditions. ∗Adjusted P < 0.05, ∗∗P < 0.001 for Dunn test for multiple comparisons. Statistical significance testing was not conducted in the topic-specific categories due to small sample sizes.
