Performance of large language artificial intelligence models on solving restorative dentistry and endodontics student assessments
- PMID: 39373739
- PMCID: PMC11458639
- DOI: 10.1007/s00784-024-05968-w
Abstract
Objectives: The advent of artificial intelligence (AI) and large language model (LLM)-based AI applications (LLMAs) has tremendous implications for our society. This study analyzed the performance of LLMAs on solving restorative dentistry and endodontics (RDE) student assessment questions.
Materials and methods: A total of 151 questions from an RDE question pool were prepared for prompting LLMAs from OpenAI (ChatGPT-3.5, -4.0 and -4.0o) and Google (Gemini 1.0). Multiple-choice questions were sorted into four question subcategories and entered into the LLMAs, and the answers were recorded for analysis. Statistical analyses (chi-square tests and p-values) were performed using Python 3.9.16.
Results: The total answer accuracy of ChatGPT-4.0o was the highest, followed by ChatGPT-4.0, Gemini 1.0 and ChatGPT-3.5 (72%, 62%, 44% and 25%, respectively), with significant differences between all LLMAs except between the GPT-4.0 models. Performance was highest on the subcategories direct restorations and caries, followed by indirect restorations and endodontics.
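As an illustration of the kind of pairwise comparison described in the methods, the following is a minimal sketch of a Pearson chi-square test on a 2x2 table (correct vs. incorrect answers for two models). It is not the authors' code. The function name `chi_square_2x2` is hypothetical, the correct-answer counts are only approximations obtained by rounding the reported percentages against 151 questions, and the abstract does not state whether a continuity correction or a paired test was used, so none is applied here.

```python
import math


def chi_square_2x2(correct_a, correct_b, n_questions):
    """Pearson chi-square test (no continuity correction) comparing two
    models that each answered the same n_questions questions.

    Returns (chi2 statistic, p-value) for 1 degree of freedom.
    Assumes at least one correct and one incorrect answer overall,
    otherwise an expected count is zero and the test is undefined.
    """
    table = [
        [correct_a, n_questions - correct_a],  # model A: correct, incorrect
        [correct_b, n_questions - correct_b],  # model B: correct, incorrect
    ]
    total = 2 * n_questions
    col_totals = [table[0][j] + table[1][j] for j in range(2)]

    chi2 = 0.0
    for row in table:
        row_total = sum(row)
        for j, observed in enumerate(row):
            expected = row_total * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected

    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# ASSUMPTION: counts are reconstructed by rounding the reported percentages;
# the exact counts are not given in the abstract.
N_QUESTIONS = 151
reported_accuracy = {
    "ChatGPT-4.0o": 0.72,
    "ChatGPT-4.0": 0.62,
    "Gemini 1.0": 0.44,
    "ChatGPT-3.5": 0.25,
}
approx_correct = {
    model: round(acc * N_QUESTIONS) for model, acc in reported_accuracy.items()
}

if __name__ == "__main__":
    models = list(approx_correct)
    for i, a in enumerate(models):
        for b in models[i + 1:]:
            chi2, p = chi_square_2x2(approx_correct[a], approx_correct[b], N_QUESTIONS)
            print(f"{a} vs {b}: chi2={chi2:.2f}, p={p:.4f}")
```

Because the counts are rounded reconstructions, the resulting p-values are indicative only and may differ from those in the full paper.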
Conclusions: Overall, there are large performance differences among LLMAs. Only the ChatGPT-4 models achieved a success ratio that could be used with caution to support the dental academic curriculum.
Clinical relevance: While LLMAs could support clinicians in answering dental field-related questions, this capacity depends strongly on the model employed. The best-performing model, ChatGPT-4.0o, achieved acceptable accuracy rates in some of the subject subcategories analyzed.
Keywords: Artificial intelligence; ChatGPT; Gemini; GenAI; Natural language processing.
© 2024. The Author(s).
Conflict of interest statement
The authors declare no conflicts of interest regarding the authorship or publication of this manuscript.
