Assessing the accuracy and clinical utility of GPT-4O in abnormal blood cell morphology recognition
- PMID: 39502485
- PMCID: PMC11536573
- DOI: 10.1177/20552076241298503
Abstract
Objectives: To evaluate the accuracy and clinical utility of GPT-4O in recognizing abnormal blood cell morphology, a critical component of hematologic diagnostics.
Methods: GPT-4O's blood cell morphology recognition capabilities were assessed by comparing its performance with that of hematologists. A total of 70 images from the Chinese National Center for Clinical Laboratories' External Quality Assessment (EQA) program (2022–2024) were analyzed. Two experienced hematology experts evaluated GPT-4O's recognition accuracy using a Likert scale.
Results: GPT-4O achieved an overall accuracy of 70% in blood cell morphology recognition, significantly lower than the hematologists' accuracy of 95.42% (p < 0.05). For peripheral blood smears and bone marrow smears, GPT-4O's accuracy was 77.14% and 62.86%, respectively. Likert scale evaluations revealed further discrepancies, with GPT-4O scoring 288.50 out of 350, below the hematologists' manual scores. GPT-4O accurately recognized certain intracellular inclusions such as Howell-Jolly bodies and Auer rods, but it misidentified fragmented red blood cells as neutrophilic metamyelocytes and oval-shaped red blood cells as sickle cells. Additionally, GPT-4O had difficulty accurately identifying intracellular granules and distinguishing cell nuclei from cytoplasm.
Conclusion: GPT-4O's performance in recognizing abnormal blood cell morphology is currently inadequate compared to hematologists. Despite its potential as a supplementary tool, significant improvements in its recognition algorithms and an expanded dataset are necessary for it to be reliable for clinical use. Future research should focus on enhancing GPT-4O's diagnostic accuracy and addressing its current limitations.
Keywords: ChatGPT; GPT-4O; blood cell morphology; clinical laboratory; morphology recognition.
© The Author(s) 2024.
Conflict of interest statement
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.