Evidence-based potential of generative artificial intelligence large language models in orthodontics: a comparative study of ChatGPT, Google Bard, and Microsoft Bing

Eur J Orthod. 2024 Apr 13:cjae017. doi: 10.1093/ejo/cjae017. Online ahead of print.

Miltiadis A Makrygiannakis¹ ², Kostis Giannakopoulos², Eleftherios G Kaklamanos² ³ ⁴

Affiliations

¹ School of Dentistry, National and Kapodistrian University of Athens, Athens 11527, Greece.
² School of Dentistry, European University Cyprus, Nicosia 2404, Cyprus.
³ School of Dentistry, Aristotle University of Thessaloniki, Thessaloniki 54124, Greece.
⁴ Hamdan bin Mohammed College of Dental Medicine, Mohammed bin Rashid University of Medicine and Health Sciences (MBRU), Dubai 505055, United Arab Emirates.

PMID: 38613510
DOI: 10.1093/ejo/cjae017

Miltiadis A Makrygiannakis et al. Eur J Orthod. 2024.

Abstract

Background: The increasing use of generative artificial intelligence large language models (LLMs) across medical and dental fields, and in orthodontics specifically, raises questions about their accuracy.

Objective: This study aimed to assess and compare the answers offered by four LLMs (Google's Bard, OpenAI's ChatGPT-3.5 and ChatGPT-4, and Microsoft's Bing) in response to clinically relevant questions within the field of orthodontics.

Materials and methods: Ten open-type clinical orthodontics-related questions were posed to the LLMs. The responses provided by the LLMs were assessed on a scale ranging from 0 (minimum) to 10 (maximum) points, benchmarked against robust scientific evidence, including consensus statements and systematic reviews, using a predefined rubric. After a 4-week interval from the initial evaluation, the answers were reevaluated to gauge intra-evaluator reliability. Statistical comparisons were conducted on the scores using Friedman's and Wilcoxon's tests to identify the model providing the answers with the most comprehensiveness, scientific accuracy, clarity, and relevance.
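
As a rough illustration of the Friedman comparison named above, the following is a minimal sketch in plain Python, not the authors' analysis code. It computes the tie-corrected Friedman chi-square statistic for a table with one row per question and one column per model. The function names and the example scores are hypothetical and are not the study's data. In practice a statistics library would normally be used instead (for example `scipy.stats.friedmanchisquare` for the overall test and `scipy.stats.wilcoxon` for pairwise follow-ups), which also returns P-values.

```python
def average_ranks(values):
    """Rank values in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1 .. end+1
        for i in order[pos:end + 1]:
            ranks[i] = avg
        pos = end + 1
    return ranks


def friedman_statistic(rows):
    """Tie-corrected Friedman chi-square statistic.

    rows: one list per question (block), each holding one score per model.
    The statistic is compared against a chi-square distribution with
    k - 1 degrees of freedom, where k is the number of models.
    """
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    tie_term = 0
    for row in rows:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
        for v in set(row):
            t = row.count(v)          # size of each group of tied scores
            tie_term += t ** 3 - t    # zero when a score is unique in its row
    q = 12 / (n * k * (k + 1)) * sum(s * s for s in rank_sums) - 3 * n * (k + 1)
    correction = 1 - tie_term / (n * (k ** 3 - k))
    if correction == 0:
        raise ValueError("all scores are tied within every row")
    return q / correction


# Hypothetical 0-10 scores: 3 questions (rows) x 4 models (columns).
example_scores = [
    [3, 5, 4, 8],
    [4, 4, 5, 7],
    [2, 6, 5, 9],
]
print(friedman_statistic(example_scores))
```

A large statistic indicates that at least one model's scores tend to rank differently from the others; pairwise Wilcoxon signed-rank tests, as in the study, would then locate which pairs differ.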

Results: Overall, no statistically significant differences were detected between the scores given by the two evaluators on either scoring occasion, so an average score was computed for every LLM. The highest-scoring answers were those of Microsoft Bing Chat (average score = 7.1), followed by ChatGPT-4 (average score = 4.7), Google Bard (average score = 4.6), and finally ChatGPT-3.5 (average score = 3.8). Microsoft Bing Chat statistically outperformed ChatGPT-3.5 (P-value = 0.017) and Google Bard (P-value = 0.029), and ChatGPT-4 outperformed ChatGPT-3.5 (P-value = 0.011); nevertheless, all models occasionally produced answers lacking comprehensiveness, scientific accuracy, clarity, and relevance.

Limitations: The questions asked were indicative and did not cover the entire field of orthodontics.

Conclusions: Large language models (LLMs) show great potential in supporting evidence-based orthodontics. However, their current limitations pose a risk of incorrect healthcare decisions if they are used without careful consideration. Consequently, these tools cannot substitute for the orthodontist's essential critical thinking and comprehensive subject knowledge. For effective integration into practice, further research, clinical validation, and enhancements to the models are essential. Clinicians must be mindful of the limitations of LLMs, as their imprudent use could have adverse effects on patient care.

Keywords: ChatGPT; Google Bard; Microsoft Bing Chat; large language models; orthodontics.
