J Dent Sci. 2025 Oct;20(4):2307-2314.
doi: 10.1016/j.jds.2025.05.012. Epub 2025 May 27.

Performance of artificial intelligence chatbots in National dental licensing examination

Chad Chan-Chia Lin et al. J Dent Sci. 2025 Oct.

Abstract

Background/purpose: The Taiwan dental board exams assess dental candidates across twenty subjects, from foundational knowledge to clinical fields, using multiple-choice, single-answer questions with a minimum passing score of 60 %. This study assesses the performance of three artificial intelligence (AI) chatbots built on large language models (LLMs), namely ChatGPT3.5, Gemini, and Claude2, on these exams from 2021 to 2023.

Materials and methods: A total of 2699 multiple-choice questions spanning eight subjects in basic dentistry and twelve in clinical dentistry were analyzed. Questions involving images and tables were excluded. Statistical analyses were conducted using McNemar's test. Furthermore, annual results of LLMs were compared with the qualification rates of human candidates to provide additional context.
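As an illustration of the paired comparison named above, the following is a minimal sketch of an exact (binomial) McNemar's test for two models answering the same questions. The abstract does not say which variant of the test the authors used, so the exact form here is an assumption, and the function names and the toy correctness lists are hypothetical, not study data.

```python
from math import comb


def discordant_counts(correct_a, correct_b):
    """Count questions where exactly one of the two models was correct.

    correct_a, correct_b: equal-length sequences of booleans, one entry
    per question (hypothetical input format, not the authors' data layout).
    """
    b = sum(1 for x, y in zip(correct_a, correct_b) if x and not y)
    c = sum(1 for x, y in zip(correct_a, correct_b) if not x and y)
    return b, c


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the discordant counts b and c.

    Under the null hypothesis the discordant pairs split 50/50, so the
    smaller count follows a Binomial(b + c, 0.5) distribution.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Toy example (made-up answers for four questions)
model_a = [True, True, False, False]
model_b = [True, False, True, False]
b, c = discordant_counts(model_a, model_b)
p_value = mcnemar_exact(b, c)
```

Only the questions on which the two models disagree contribute to the test; questions that both models answered correctly or both answered incorrectly do not affect the p-value.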

Results: Claude2 demonstrated the highest overall accuracy (54.89 %) on the Taiwan national dental licensing examinations, outperforming ChatGPT3.5 (49.33 %) and Gemini (44.63 %), with statistically significant differences in performance across models. In the basic dentistry domain, Claude2 scored 59.73 %, followed by ChatGPT3.5 (54.87 %) and Gemini (47.35 %). Notably, Claude2 excelled in biochemistry (73.81 %) and oral microbiology (88.89 %), while ChatGPT3.5 also performed strongly in oral microbiology (80.56 %). In the clinical dentistry domain, Claude2 led with a score of 52.45 %, surpassing ChatGPT3.5 (46.54 %) and Gemini (43.26 %), and showed strong results in dental public health (65.81 %). Despite these achievements, none of the LLMs attained passing scores overall.
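The reported percentages can be checked for internal consistency. The sketch below assumes that each overall accuracy is a question-weighted average of the basic and clinical accuracies, and solves for the implied share of basic dentistry questions. That share is an inference from the rounded figures above, not a number reported in the abstract, and the helper name is hypothetical.

```python
PASS_MARK = 60.0  # minimum passing score (%) stated in the abstract

# (overall, basic dentistry, clinical dentistry) accuracy in percent,
# copied from the Results paragraph
scores = {
    "Claude2": (54.89, 59.73, 52.45),
    "ChatGPT3.5": (49.33, 54.87, 46.54),
    "Gemini": (44.63, 47.35, 43.26),
}


def implied_basic_share(overall, basic, clinical):
    """Solve overall = w * basic + (1 - w) * clinical for the weight w."""
    return (overall - clinical) / (basic - clinical)


shares = {name: implied_basic_share(*vals) for name, vals in scores.items()}
passed = {name: vals[0] >= PASS_MARK for name, vals in scores.items()}

for name in scores:
    print(f"{name}: implied basic share {shares[name]:.3f}, passed overall: {passed[name]}")
```

Under that assumption, all three models imply nearly the same share of basic dentistry questions (roughly one third), which suggests the three sets of percentages are mutually consistent, and no overall score reaches the 60 % pass mark.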

Conclusion: None of the models achieved passing scores, highlighting their strengths in foundational knowledge but limitations in clinical reasoning.

Keywords: Artificial intelligence; Chatbot; Dental education; Dentistry; Large language models.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1. Flowchart of study design and setting.
Figure 2. Examples of questions posed to each large language model (LLM), including the responses from Chat Generative Pre-trained Transformer 3.5 (ChatGPT3.5, A and B), Gemini (C), and Claude2 (D).
Figure 3. Performance of ChatGPT3.5, Gemini, and Claude2 on the Taiwan dental licensing examinations. (A) Performance on Taiwan dental licensing examinations; Stage 2: Clinical dentistry; Stage 1: Basic dentistry. (B) Stage 1: Basic dentistry. (C) Stage 2: Clinical dentistry.
