BMC Oral Health. 2025 Apr 28;25(1):648.
doi: 10.1186/s12903-025-06050-x.

Evaluation of the performance of large language models in clinical decision-making in endodontics

Yağız Özbay et al. BMC Oral Health.

Abstract

Background: Artificial intelligence (AI) chatbots are excellent at generating language. The growing use of generative AI large language models (LLMs) in healthcare and dentistry, including endodontics, raises questions about their accuracy. The potential of LLMs to assist clinicians' decision-making in endodontics is worth evaluating. This study aims to compare the answers provided by Google Bard, ChatGPT-3.5, and ChatGPT-4 to clinically relevant questions from the field of endodontics.

Methods: Forty open-ended questions covering different areas of endodontics were prepared and submitted to Google Bard, ChatGPT-3.5, and ChatGPT-4. The validity of the questions was evaluated using the Lawshe Content Validity Index. Two experienced endodontists, blinded to the chatbots, evaluated the answers using a 3-point Likert scale. All responses deemed to contain factually wrong information were noted, and a misinformation rate was calculated for each LLM (number of answers containing wrong information/total number of questions). One-way analysis of variance (ANOVA) and the post hoc Tukey test were used to analyze the data, and significance was set at p < 0.05.
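The two calculations named in the Methods can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' analysis code: the function names, the model labels, the 1 to 3 coding of the Likert scale, and all data values are assumptions made up for the example. It computes the misinformation rate as defined above and the F statistic of a one-way ANOVA; the post hoc Tukey test is not included.

```python
from statistics import mean


def misinformation_rate(flags):
    """Number of answers flagged as containing wrong information,
    divided by the total number of questions."""
    return sum(flags) / len(flags)


def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA: between-group mean square
    divided by within-group mean square."""
    all_scores = [score for group in groups for score in group]
    grand_mean = mean(all_scores)
    k = len(groups)        # number of groups (here: language models)
    n = len(all_scores)    # total number of scored answers
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((score - mean(g)) ** 2 for g in groups for score in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical ratings (assumed coding: 1 = lowest, 3 = highest)
# and hypothetical wrong-information flags for five questions per model.
scores = {
    "model_a": [3, 3, 2, 3, 2],
    "model_b": [2, 2, 3, 1, 2],
    "model_c": [1, 2, 1, 2, 1],
}
wrong = {
    "model_a": [False, False, True, False, False],
    "model_b": [False, True, True, False, False],
    "model_c": [True, True, False, True, False],
}

for name in scores:
    rate = misinformation_rate(wrong[name])
    print(f"{name}: mean score {mean(scores[name]):.2f}, "
          f"misinformation rate {rate:.0%}")

print(f"F statistic: {one_way_anova_f(list(scores.values())):.3f}")
```

In the study itself each model would contribute 40 scored answers rather than five, and the F statistic would be compared against the F distribution to obtain the reported p-values.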

Results: ChatGPT-4 demonstrated the highest score and the lowest misinformation rate (P = 0.008) followed by ChatGPT-3.5 and Google Bard respectively. The difference between ChatGPT-4 and Google Bard was statistically significant (P = 0.004).

Conclusion: ChatGPT-4 provided more accurate and informative answers in endodontics than the other models. However, all LLMs produced varying levels of incomplete or incorrect answers.

Keywords: Chat GPT; Chatbot; Endodontics; Endodontology; Large Language model.

Conflict of interest statement

Declarations. Ethics approval and consent to participate: Not applicable. Consent for publication: Not applicable. Competing interests: The authors declare no competing interests.

Figures

Fig. 1
Sample answer from Google Bard
Fig. 2
Sample answer from ChatGPT-4
Fig. 3
Sample answer from ChatGPT-3.5
Fig. 4
Mean, standard deviation, and misinformation rates of the Likert scale scores. Different letters indicate a statistically significant difference (P < 0.05) (mean ± SD, n = 40). The misinformation rate of each language model is given inside its column as a percentage.
