Evaluating Large Language Models for Burning Mouth Syndrome Diagnosis

Takayuki Suga et al. J Pain Res. 2025 Mar 19;18:1387-1405.
doi: 10.2147/JPR.S509845. eCollection 2025.

Abstract

Introduction: Large language models have been proposed as diagnostic aids across various medical fields, including dentistry. Burning mouth syndrome, characterized by burning sensations in the oral cavity without an identifiable cause, poses diagnostic challenges. This study examines the diagnostic accuracy of large language models in identifying burning mouth syndrome, hypothesizing that the models may have limitations in this task.

Materials and methods: Clinical vignettes of 100 synthesized burning mouth syndrome cases were evaluated using three large language models (ChatGPT-4o, Gemini Advanced 1.5 Pro, and Claude 3.5 Sonnet). Each vignette included patient demographics, symptoms, and medical history. Each model was prompted to provide a primary diagnosis, differential diagnoses, and the reasoning behind them. Accuracy was determined by comparing the models' responses with expert evaluations.

Results: ChatGPT and Claude each achieved an accuracy rate of 99%, while Gemini's accuracy was 89% (p < 0.001). Misdiagnoses included persistent idiopathic facial pain and combined diagnoses that included inappropriate conditions. The models also differed in their reasoning patterns and in the additional information they requested.
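The reported rates can be checked with a small calculation. The sketch below is illustrative only: it assumes each model was scored on all 100 vignettes, so the correct counts are reconstructed from the percentages (99, 99, and 89), and it applies a Pearson chi-square test of independence to the resulting 3 × 2 table of correct and incorrect diagnoses. The abstract does not state which test produced p < 0.001, so this choice is an assumption; because all three models answered the same vignettes, a paired test such as Cochran's Q may suit the actual analysis better. The function names are hypothetical.

```python
import math

# ASSUMPTION: (correct, total) counts reconstructed from the reported
# accuracy rates, assuming every model was scored on all 100 vignettes.
RESULTS = {
    "ChatGPT-4o": (99, 100),
    "Claude 3.5 Sonnet": (99, 100),
    "Gemini Advanced 1.5 Pro": (89, 100),
}


def accuracy(correct: int, total: int) -> float:
    """Fraction of vignettes whose diagnosis matched the expert evaluation."""
    return correct / total


def chi_square_correct_vs_incorrect(results: dict[str, tuple[int, int]]) -> tuple[float, int]:
    """Pearson chi-square statistic for a k x 2 (correct, incorrect) table.

    ASSUMPTION: an illustrative test choice; the abstract does not name its test.
    """
    rows = [(c, t - c) for c, t in results.values()]
    grand_total = sum(c + i for c, i in rows)
    col_totals = [sum(row[j] for row in rows) for j in range(2)]
    stat = 0.0
    for row in rows:
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            # Expected count under independence of model and outcome.
            expected = row_total * col_total / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(rows) - 1) * (2 - 1)
    return stat, dof


def p_value_df2(stat: float) -> float:
    """Upper-tail probability of a chi-square distribution with 2 degrees of freedom."""
    return math.exp(-stat / 2)


if __name__ == "__main__":
    for model, (correct, total) in RESULTS.items():
        print(f"{model}: {accuracy(correct, total):.0%}")
    stat, dof = chi_square_correct_vs_incorrect(RESULTS)
    print(f"chi2 = {stat:.2f}, df = {dof}, p = {p_value_df2(stat):.4f}")
```

With these counts the statistic is about 16.1 on 2 degrees of freedom, giving p ≈ 0.0003, which is consistent with the reported p < 0.001. For two degrees of freedom the chi-square upper-tail probability reduces to exp(-x/2), so no statistics library is needed.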

Discussion: Despite high overall accuracy, the models exhibited variations in reasoning approaches and occasional errors, underscoring the importance of clinician oversight. Limitations include the synthesized nature of the vignettes, potential over-reliance on exclusionary criteria, and challenges in differentiating overlapping disorders.

Conclusion: Large language models demonstrate strong potential as supplementary diagnostic tools for burning mouth syndrome, especially in settings lacking specialist expertise. However, their reliability depends on thorough patient assessment and expert verification. Integrating large language models into routine diagnostics could enhance early detection and management, ultimately improving clinical decision-making for dentists and specialists alike.

Keywords: artificial intelligence; burning mouth syndrome; dentistry; diagnostic accuracy; large language models.


Conflict of interest statement

The authors declare that there are no competing interests.

Figures

Figure 1
An example of a prompt containing a vignette scenario.
Figure 2
An example of large language model (LLM) responses to a vignette scenario.
Figure 3
Error Analysis Flowchart: First, examine misdiagnosis cases with respect to logical consistency and sufficiency of information. Then, classify them into three categories: Logical Fallacy, Informational Fallacy, and Explicit Fallacy.


