Comparative accuracy of artificial intelligence chatbots in pulpal and periradicular diagnosis: A cross-sectional study

João Daniel Mendonça de Moura¹, Carlos Eduardo Fontana², Vitor Henrique Reis da Silva Lima³, Iris de Souza Alves⁴, Paulo André de Melo Santos⁵, Patrícia de Almeida Rodrigues⁵

Affiliations

¹ Postgraduate Program in Clinical Dentistry, University Center of Pará (CESUPA), Belém, Pará, Brazil. Electronic address: joaodanielmoura@gmail.com.
² Center for Health Sciences, Pontifical Catholic University of Campinas (PUC-Campinas), Postgraduate Program in Health Sciences, Campinas, São Paulo, Brazil.
³ Endodontics Specialization Program, University Center of Pará (CESUPA), Belém, Pará, Brazil.
⁴ Dentistry Program, University Center of Pará (CESUPA), Belém, Pará, Brazil.
⁵ Postgraduate Program in Clinical Dentistry, University Center of Pará (CESUPA), Belém, Pará, Brazil.

PMID: 39471663
DOI: 10.1016/j.compbiomed.2024.109332

Comparative Study

João Daniel Mendonça de Moura et al. Comput Biol Med. 2024 Dec;183:109332. doi: 10.1016/j.compbiomed.2024.109332. Epub 2024 Oct 30.

Abstract

Objectives: This study aimed to evaluate the diagnostic accuracy and treatment recommendation performance of four artificial intelligence chatbots in fictional pulpal and periradicular disease cases. Additionally, it investigated response consistency and the influence of text order and language on chatbot performance.

Methods: In this cross-sectional comparative study, eleven cases representing various pulpal and periradicular pathologies were created. These cases were presented to four chatbots (ChatGPT 3.5, ChatGPT 4.0, Bard, and Bing) in both Portuguese and English, with the information order varied (signs and symptoms first or imaging data first). Statistical analyses included the Kruskal-Wallis test, Dwass-Steel-Critchlow-Fligner pairwise comparisons, simple logistic regression, and the binomial test.
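Of the tests listed above, the binomial test is the simplest to illustrate. The abstract does not state which hypothesis each test addressed, so the following is only a minimal sketch of an exact two-sided binomial test in pure Python. The function name `binomial_two_sided_p`, the null proportion of 0.5, and the counts are all hypothetical and are not taken from the study.

```python
from math import comb


def binomial_two_sided_p(successes: int, trials: int, p0: float = 0.5) -> float:
    """Exact two-sided binomial test p-value.

    Sums the probabilities of all outcomes that are no more likely
    under the null proportion p0 than the observed outcome.
    """
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    if not 0.0 <= p0 <= 1.0:
        raise ValueError("p0 must lie between 0 and 1")

    # Probability of each possible number of successes under the null.
    pmf = [
        comb(trials, k) * p0**k * (1 - p0) ** (trials - k)
        for k in range(trials + 1)
    ]
    observed = pmf[successes]

    # A small relative tolerance guards against floating-point ties.
    p_value = sum(prob for prob in pmf if prob <= observed * (1 + 1e-7))
    return min(1.0, p_value)


# Hypothetical counts for illustration only; NOT data from the study.
hypothetical_correct = 38
hypothetical_total = 44
print(binomial_two_sided_p(hypothetical_correct, hypothetical_total))
```

A small p-value here would indicate that the observed proportion of correct responses is unlikely under the assumed null proportion. The null value that suits a given comparison depends on the study design and is not specified in the abstract.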

Results: Bing and ChatGPT 4.0 achieved the highest diagnostic accuracy rates (86.4% and 85.3%, respectively), significantly outperforming ChatGPT 3.5 (46.5%) and Bard (28.6%) (p < 0.001). For treatment recommendations, ChatGPT 4.0, Bing, and ChatGPT 3.5 performed similarly (94.4%, 93.2%, and 86.3%, respectively), while Bard exhibited significantly lower accuracy (75%, p < 0.001). No significant association between diagnosis and treatment accuracy was found for Bard and Bing, but a positive association was observed for ChatGPT 3.5 and ChatGPT 4.0 (p < 0.05). The overall consistency rate was 98.29%, with no significant differences related to text order or language. Cases presented in Portuguese prompted significantly more requests for additional information than those in English (33.5% vs. 10.2%; p < 0.001), and the relevance of the requested information was also higher in Portuguese (29.5% vs. 8.5%; p < 0.001).

Conclusions: Bing and ChatGPT 4.0 demonstrated superior diagnostic accuracy, while Bard showed the lowest accuracy in both diagnosis and treatment recommendations. However, the clinical application of these tools necessitates critical interpretation by dentists, as chatbot responses are not consistently reliable.

Keywords: Artificial intelligence; Dental pulp diseases; Diagnosis; Endodontics; Machine learning.

Conflict of interest statement

Declaration of Competing Interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
