J Dent Sci. 2025 Apr;20(2):895-900.
doi: 10.1016/j.jds.2024.08.020. Epub 2024 Sep 11.

Can a large language model create acceptable dental board-style examination questions? A cross-sectional prospective study


Hak-Sun Kim et al. J Dent Sci. 2025 Apr.

Abstract

Background/purpose: Numerous studies have shown that large language models (LLMs) can score above the passing grade on various board examinations. Building on these findings, this study used item analysis to compare national dental board-style examination questions created by an LLM with those created by human experts.

Materials and methods: This study was conducted in June 2024 and included 30 senior dental students who participated voluntarily. An LLM, ChatGPT 4o, was used to generate 44 national dental board-style examination questions based on textbook content. After incorrect questions were removed, 20 questions were randomly selected for the LLM set. Two experts created a second set of 20 questions based on the same content and in the same style as the LLM set. In the classroom, participating students answered all 40 questions, divided into the two sets, at the same time using Google Forms. The responses were analyzed to determine each question's difficulty index, discrimination index, and distractor efficiency. The two sets were compared using the Wilcoxon signed-rank test or the linear-by-linear association test at a 95% confidence level.
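The three item-analysis criteria named above can be computed from raw responses roughly as follows. This is a minimal sketch of common classical-test-theory definitions, not the study's own procedure: the abstract does not say how upper and lower scoring groups were formed or what cutoff defines a non-functioning distractor, so the 27% group size and the 5% choice threshold are assumptions, and the function names are hypothetical.

```python
"""Classical item analysis for one multiple-choice question (minimal sketch).

Assumptions, not taken from the study: the upper and lower groups are the
top and bottom 27% of examinees ranked by total score, and a distractor is
"non-functioning" if fewer than 5% of examinees choose it.
"""
from typing import Sequence


def difficulty_index(responses: Sequence[str], key: str) -> float:
    """Percentage of examinees who chose the keyed (correct) option."""
    return 100.0 * sum(r == key for r in responses) / len(responses)


def discrimination_index(correct: Sequence[bool], totals: Sequence[float],
                         fraction: float = 0.27) -> float:
    """(U - L) / n: correct answers in the upper group minus those in the
    lower group, divided by the group size n. Ties in total score are broken
    by examinee order (sorted() is stable)."""
    n = max(1, round(fraction * len(totals)))
    order = sorted(range(len(totals)), key=lambda i: totals[i])  # ascending total score
    lower, upper = order[:n], order[-n:]
    upper_correct = sum(correct[i] for i in upper)
    lower_correct = sum(correct[i] for i in lower)
    return (upper_correct - lower_correct) / n


def distractor_efficiency(responses: Sequence[str], options: Sequence[str],
                          key: str, threshold: float = 0.05) -> float:
    """Percentage of distractors (wrong options) chosen by at least
    `threshold` of examinees, i.e. the share of functioning distractors."""
    distractors = [o for o in options if o != key]
    functioning = sum(
        1 for d in distractors
        if sum(r == d for r in responses) / len(responses) >= threshold
    )
    return 100.0 * functioning / len(distractors)


if __name__ == "__main__":
    # Hypothetical toy data: 10 examinees, one five-option item keyed "A".
    responses = ["A", "A", "B", "A", "C", "A", "D", "A", "B", "A"]
    totals = [18, 17, 9, 15, 8, 14, 7, 16, 10, 19]  # each examinee's total test score
    correct = [r == "A" for r in responses]
    print(difficulty_index(responses, "A"))                # 60.0
    print(discrimination_index(correct, totals))           # 1.0
    print(distractor_efficiency(responses, "ABCDE", "A"))  # 75.0
```

In the toy data, 6 of 10 examinees answer correctly (difficulty 60%), and option E is never chosen, so only three of the four distractors function (efficiency 75%). Computing these values per question and taking the median over each 20-question set gives summaries of the kind reported in the Results.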

Results: The response rate was 100%. The median difficulty indices of the LLM and human sets were 55.00% and 50.00%, respectively, both within the "excellent" range. The median discrimination indices were 0.29 for the LLM set and 0.14 for the human set. Both sets had a median distractor efficiency of 80.00%. None of the differences between the two sets was statistically significant (P > 0.050).
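A Wilcoxon signed-rank comparison of per-item indices, such as the discrimination medians of 0.29 and 0.14 above, could be sketched as follows. The sketch assumes each LLM question is paired with the human question written from the same content; the abstract does not say how pairs were formed. It drops zero differences, gives tied absolute differences their average rank, and computes a two-sided P value from the normal approximation without continuity correction, so an exact test would be preferable for samples this small. The function name is hypothetical.

```python
"""Two-sided Wilcoxon signed-rank test (normal approximation), standard library only."""
import math
from typing import Sequence, Tuple


def wilcoxon_signed_rank(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (W, P) for paired samples x and y.

    Zero differences are dropped, tied |differences| get average ranks, and the
    variance includes the usual tie correction. W is the smaller of the positive
    and negative rank sums. Differences are compared exactly, so round indices
    first if they come from floating-point arithmetic.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    if n == 0:
        return 0.0, 1.0  # all pairs identical: no evidence of a difference
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of sorted positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    w_minus = sum(r for r, di in zip(ranks, d) if di < 0)
    w = float(min(w_plus, w_minus))
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48  # tie-corrected variance
    if var <= 0:
        return w, 1.0
    z = (w - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided: 2 * (1 - Phi(|z|))
    return w, min(1.0, p)
```

Called with the 20 paired discrimination indices of the LLM and human sets, a P value above 0.05 from this function would correspond to the non-significant difference reported above. SciPy's `scipy.stats.wilcoxon` offers exact and approximate variants if a dependency is acceptable.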

Conclusion: The LLM can create national board-style examination questions of equivalent quality to those created by human experts.

Keywords: Artificial intelligence; Dental education; Examination questions; Natural language processing; Professional competence.


Conflict of interest statement

The author has no conflicts of interest relevant to this article.

Figures

Figure 1. Schematic diagram of the overall process of this study. LLM, large language model.
Figure 2. Example questions based on knowledge of the biological effects of ionizing radiation: (A) large language model set and (B) human set.
Figure 3. Plots of discrimination indices (y-axis) against difficulty indices (x-axis): (A) large language model set and (B) human set.
Figure 4. Number of non-functioning distractors in the large language model and human sets.
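Counts of non-functioning distractors per item, as in Figure 4, are ordinal, which is the kind of outcome the linear-by-linear association test in the Methods is suited to. The abstract does not say which comparison used this test, so applying it to distractor counts is an assumption. The sketch computes the statistic M² = (N - 1)r², where r is the Pearson correlation between row and column scores weighted by cell counts, and refers it to a chi-square distribution with one degree of freedom. Equally spaced integer scores (0, 1, 2, ...) are assumed, and the function name is hypothetical.

```python
"""Linear-by-linear association test for an ordered r x c contingency table."""
import math
from typing import Optional, Sequence, Tuple


def linear_by_linear(table: Sequence[Sequence[float]],
                     row_scores: Optional[Sequence[float]] = None,
                     col_scores: Optional[Sequence[float]] = None) -> Tuple[float, float]:
    """Return (M^2, P) with M^2 = (N - 1) * r^2 on 1 degree of freedom.

    Rows could be the question sets (LLM, human) and columns the number of
    non-functioning distractors per item. Default scores are equally spaced
    integers 0, 1, 2, ... (an assumption).
    """
    n_rows, n_cols = len(table), len(table[0])
    u = list(row_scores) if row_scores is not None else list(range(n_rows))
    v = list(col_scores) if col_scores is not None else list(range(n_cols))
    # (count, row score, column score) for every cell
    cells = [(table[i][j], u[i], v[j]) for i in range(n_rows) for j in range(n_cols)]
    n = sum(w for w, _, _ in cells)
    mean_u = sum(w * a for w, a, _ in cells) / n
    mean_v = sum(w * b for w, _, b in cells) / n
    cov = sum(w * (a - mean_u) * (b - mean_v) for w, a, b in cells)
    var_u = sum(w * (a - mean_u) ** 2 for w, a, _ in cells)
    var_v = sum(w * (b - mean_v) ** 2 for w, _, b in cells)
    if var_u == 0 or var_v == 0:
        return 0.0, 1.0  # one margin has no spread: no evidence of association
    r = cov / math.sqrt(var_u * var_v)
    m2 = (n - 1) * r * r
    # Upper tail of chi-square(1): P(X > m2) = P(|Z| > sqrt(m2)) = erfc(sqrt(m2 / 2)).
    p = math.erfc(math.sqrt(m2 / 2))
    return m2, p
```

For instance, the hypothetical table [[3, 1], [1, 3]] gives r = 0.5 and M² = 7 × 0.25 = 1.75. Integer scores treat each additional non-functioning distractor as an equal step, which is the usual default when categories are counts.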


