Evaluating the accuracy of generative artificial intelligence models in dental age estimation based on the Demirjian's method
- PMID: 40800006
- PMCID: PMC12339434
- DOI: 10.3389/fdmed.2025.1634006
Abstract
Introduction: Dental age estimation plays a key role in forensic identification, clinical diagnosis, treatment planning, and prognosis in fields such as pediatric dentistry and orthodontics. Large language models (LLMs) are increasingly recognized for their potential applications in dentistry. This study aimed to compare the performance of currently available generative artificial intelligence LLMs in estimating dental age from Demirjian's scores.
Methods: Panoramic radiographs were analyzed using Demirjian's method (1973), with each left permanent mandibular tooth classified from stage A to H. Three untrained LLMs, ChatGPT (GPT-4-turbo), Gemini 2.0 Flash, and DeepSeek-V3, were tasked with estimating dental age from the patient's Demirjian score for each tooth. Because these models are probabilistic and can produce varying responses to the same question, three responses were collected per case per day (on three different computers) from each model on three separate days. The age estimates obtained from the LLMs were compared with the individuals' chronological ages. Intra- and inter-examiner reliability were assessed using the Intraclass Correlation Coefficient (ICC). Model performance was evaluated using Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), Coefficient of Determination (R²), and Bias.
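The four performance measures named above can be illustrated with a minimal Python sketch. The abstract does not give the authors' formulas or code, so the standard definitions are assumed here: error is taken as estimated minus chronological age (so a positive bias means overestimation, matching the wording of the Results), and R² is taken as 1 − SS_res/SS_tot. The function name `age_error_metrics` and the ages in the usage note are hypothetical.

```python
import math


def age_error_metrics(chronological, estimated):
    """Return MAE, RMSE, R^2 and bias for paired age values (in years).

    Assumed convention: error = estimated - chronological.
    """
    if len(chronological) != len(estimated) or not chronological:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(chronological)
    errors = [e - c for c, e in zip(chronological, estimated)]

    mae = sum(abs(x) for x in errors) / n           # mean absolute error
    ss_res = sum(x * x for x in errors)             # residual sum of squares
    rmse = math.sqrt(ss_res / n)                    # root mean squared error
    bias = sum(errors) / n                          # mean signed error

    mean_age = sum(chronological) / n
    ss_tot = sum((c - mean_age) ** 2 for c in chronological)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all chronological ages are equal")
    r2 = 1 - ss_res / ss_tot                        # negative if worse than the mean

    return {"mae": mae, "rmse": rmse, "r2": r2, "bias": bias}
```

For example, with hypothetical chronological ages `[8.0, 10.0, 12.0]` and estimates `[9.0, 11.0, 13.0]`, every estimate is one year too high, so MAE, RMSE, and bias all equal 1.0 and R² equals 0.625. Under this definition R² becomes negative whenever the estimates have a larger squared error than simply predicting the sample's mean age for everyone.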
Results: Thirty panoramic radiographs (40% female, 60% male; mean age 10.4 ± 2.32 years) were included. Both intra- and inter-examiner ICC values exceeded 0.85. ChatGPT and DeepSeek exhibited comparable but suboptimal performance, with higher errors (MAE: 1.98–2.05 years; RMSE: 2.33–2.35 years), negative R² values (-0.069 to -0.049), and substantial overestimation biases (1.90–1.91 years), indicating poor model fit and systematic flaws. Gemini demonstrated intermediate results, with a moderate MAE (1.57 years) and RMSE (1.81 years), a positive R² (0.367), and a lower bias (1.32 years).
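The relationship between the reported RMSE and R² values can be sketched as a rough consistency check. This is not the authors' calculation. It assumes that R² was computed as 1 − MSE/Var(chronological age), that the reported ± 2.32 years is the sample standard deviation of the same 30 individuals, and that rounding in the published figures is negligible. The function name `approx_r2` is hypothetical.

```python
def approx_r2(rmse, sample_sd, n):
    """Approximate R^2 from an RMSE and the sample SD of the true ages.

    Assumes R^2 = 1 - MSE / population variance, where the population
    variance is recovered from the sample SD as sd^2 * (n - 1) / n.
    """
    if n < 2 or sample_sd <= 0:
        raise ValueError("need n >= 2 and a positive standard deviation")
    population_variance = sample_sd ** 2 * (n - 1) / n
    return 1 - rmse ** 2 / population_variance


# Reported values from the Results: n = 30, SD = 2.32 years.
gemini_r2 = approx_r2(1.81, 2.32, 30)        # RMSE reported for Gemini
low_rmse_r2 = approx_r2(2.33, 2.32, 30)      # lower end of the ChatGPT/DeepSeek range
high_rmse_r2 = approx_r2(2.35, 2.32, 30)     # upper end of the ChatGPT/DeepSeek range
```

Under these assumptions the Gemini value comes out near 0.37, close to the reported 0.367, and both ChatGPT/DeepSeek values fall slightly below zero, in line with the reported negative R². An RMSE of about 2.33 years is roughly the spread of ages in the sample, which is why those models do no better than assigning every child the mean age.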
Discussion: This study demonstrated that, although LLMs such as ChatGPT, Gemini, and DeepSeek can estimate dental age from Demirjian's scores, their performance remains inferior to the traditional method. Among them, Gemini 2.0 Flash showed the best results (lowest MAE, RMSE, and bias, and the only positive R²), but all models require task-specific training and validation before clinical application.
Keywords: age determination by teeth; artificial intelligence; clinical decision-making; evidence-based dentistry; generative artificial intelligence; large language models.
© 2025 Abuabara, do Nascimento, Trentini, Costa Gonçalves, Hueb de Menezes-Oliveira, Madalena, Beisel-Memmert, Kirschneck, Antunes, Miranda de Araujo, Baratto-Filho and Küchler.
Conflict of interest statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
References
- Demirjian A, Goldstein H, Tanner JM. A new system of dental age assessment. Hum Biol. (1973) 45(2):211–27.