Narra J. 2025 Apr;5(1):e2371.
doi: 10.52225/narra.v5i1.2371. Epub 2025 Apr 8.

Chinese generative AI models (DeepSeek and Qwen) rival ChatGPT-4 in ophthalmology queries with excellent performance in Arabic and English

Malik Sallam et al. Narra J. 2025 Apr.

Abstract

The rapid evolution of generative artificial intelligence (genAI) has ushered in a new era of digital medical consultations, with patients turning to AI-driven tools for guidance. The emergence of Chinese-developed genAI models such as DeepSeek-R1 and Qwen-2.5 has challenged the dominance of OpenAI's ChatGPT. The aim of this study was to benchmark the performance of Chinese genAI models against ChatGPT-4o and to assess disparities in performance between English and Arabic. Following the METRICS checklist for genAI evaluation, Qwen-2.5, DeepSeek-R1, and ChatGPT-4o were assessed for completeness, accuracy, and relevance using the CLEAR tool on common patient ophthalmology queries. In English, Qwen-2.5 demonstrated the highest overall performance (CLEAR score: 4.43 ± 0.28), outperforming both DeepSeek-R1 (4.30 ± 0.43) and ChatGPT-4o (4.14 ± 0.41), with p = 0.002. A similar hierarchy emerged in Arabic, with Qwen-2.5 again leading (4.40 ± 0.29), followed by DeepSeek-R1 (4.20 ± 0.49) and ChatGPT-4o (4.14 ± 0.41), with p = 0.007. Each tested genAI model exhibited near-identical performance across the two languages, with ChatGPT-4o demonstrating the most balanced linguistic capabilities (p = 0.957), while Qwen-2.5 and DeepSeek-R1 scored marginally higher in English. An in-depth examination of genAI performance across key CLEAR components revealed that Qwen-2.5 consistently excelled in content completeness, factual accuracy, and relevance in both English and Arabic, setting a new benchmark for genAI in medical inquiries. Despite minor linguistic disparities, all three models exhibited robust multilingual capabilities, challenging the long-held assumption that genAI is inherently biased toward English. These findings highlight the evolving nature of AI-driven medical assistance, with Chinese genAI models able to rival or even surpass ChatGPT-4o in ophthalmology-related queries.

Keywords: DeepSeek; LLM; OpenAI; Qwen; eye disease.

Conflict of interest statement

All the authors declare that there are no conflicts of interest.

Figures

Figure 1.
Comparison of generative AI (genAI) model performance in English and Arabic using CLEAR overall scores. The p-values were calculated using the Kruskal-Wallis test.
Figure 2.
Comparison of generative AI (genAI) model performance across ophthalmology query topics. The p-values were calculated using the Kruskal-Wallis test. Post-hoc analysis results using Mann-Whitney U tests are indicated by the horizontal lines between genAI models, with significant results marked by an asterisk and non-significant results marked by ns.
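
As an illustration of the Kruskal-Wallis comparison named in the figure captions, the following is a minimal sketch in plain Python, not the authors' analysis code. The function names (`average_ranks`, `kruskal_wallis_h`, `p_value_three_groups`) and the example scores are hypothetical and are not study data. The p-value shortcut assumes exactly three groups, where the chi-square approximation with 2 degrees of freedom reduces to exp(-H/2).

```python
import math
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the window over every value tied with the one at `start`.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with the standard correction for ties."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)

    # Sum of (rank sum squared / group size) over all groups.
    total = 0.0
    pos = 0
    for group in groups:
        rank_sum = sum(ranks[pos:pos + len(group)])
        total += rank_sum ** 2 / len(group)
        pos += len(group)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)

    # Tie correction: 1 - sum(t^3 - t) / (n^3 - n) over groups of tied values.
    counts = {}
    for value in pooled:
        counts[value] = counts.get(value, 0) + 1
    correction = 1 - sum(c ** 3 - c for c in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are identical; H is undefined")
    return h / correction


def p_value_three_groups(h):
    """Chi-square survival function for 2 degrees of freedom (three groups only)."""
    return math.exp(-h / 2)


if __name__ == "__main__":
    # Hypothetical per-query scores for three models (illustrative, not study data).
    model_a = [4.6, 4.4, 4.8, 4.2, 4.5]
    model_b = [4.3, 4.1, 4.7, 3.9, 4.4]
    model_c = [4.0, 4.2, 3.8, 4.1, 4.3]
    h = kruskal_wallis_h(model_a, model_b, model_c)
    print(f"H = {h:.3f}, approximate p = {p_value_three_groups(h):.3f}")
```

In practice a statistics package would normally be used for this test and for the post-hoc Mann-Whitney U comparisons; the sketch only shows how pooled ranks across the three models yield a single test statistic.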
