AI in Hand Surgery: Assessing Large Language Models in the Classification and Management of Hand Injuries

Sophia M Pressman¹, Sahar Borna¹, Cesar A Gomez-Cabello¹, Syed Ali Haider¹, Antonio Jorge Forte¹,²

Affiliations

¹ Division of Plastic Surgery, Mayo Clinic, Jacksonville, FL 32224, USA.
² Center for Digital Health, Mayo Clinic, Rochester, MN 55905, USA.

PMID: 38792374
PMCID: PMC11122623
DOI: 10.3390/jcm13102832

Sophia M Pressman et al. J Clin Med. 2024 May 11;13(10):2832. doi: 10.3390/jcm13102832.

Abstract

Background: OpenAI's ChatGPT (San Francisco, CA, USA) and Google's Gemini (Mountain View, CA, USA) are two large language models that show promise in improving and expediting medical decision making in hand surgery. Evaluating the applications of these models within the field of hand surgery is warranted. This study aims to evaluate ChatGPT-4 and Gemini in classifying hand injuries and recommending treatment.

Methods: Each model was given 68 fictionalized clinical vignettes of hand injuries, and each vignette was presented twice. The models were asked to use a specific classification system and to recommend surgical or nonsurgical treatment. Classifications were scored based on correctness. Results were analyzed using descriptive statistics, a paired two-tailed t-test, and sensitivity testing.

Results: Gemini correctly classified 70.6% of hand injuries and demonstrated superior classification ability over ChatGPT (mean score 1.46 vs. 0.87, p-value < 0.001). For management, ChatGPT demonstrated higher sensitivity than Gemini in recommending surgical intervention (98.0% vs. 88.8%), but lower specificity (68.4% vs. 94.7%). Gemini also demonstrated greater response replicability than ChatGPT.

Conclusions: Large language models like ChatGPT and Gemini show promise in assisting medical decision making, particularly in hand surgery, with Gemini generally outperforming ChatGPT. These findings emphasize the importance of considering the strengths and limitations of different models when integrating them into clinical practice.
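For readers less familiar with the management metrics reported above, the following is a minimal sketch of how sensitivity and specificity for a "surgical" recommendation can be computed from paired reference and model labels. The function name `sensitivity_specificity` and the example labels are hypothetical illustrations, not the authors' code or data.

```python
def sensitivity_specificity(reference, predicted, positive="surgical"):
    """Return (sensitivity, specificity) for two equal-length label lists.

    Sensitivity = TP / (TP + FN): share of truly surgical cases the model
    recommended surgery for.
    Specificity = TN / (TN + FP): share of truly nonsurgical cases the model
    recommended nonsurgical treatment for.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")

    tp = fn = tn = fp = 0
    for ref, pred in zip(reference, predicted):
        if ref == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1

    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical labels for illustration only; these are NOT data from the study.
reference = ["surgical", "surgical", "surgical", "surgical",
             "nonsurgical", "nonsurgical"]
predicted = ["surgical", "surgical", "surgical", "nonsurgical",
             "surgical", "nonsurgical"]

sens, spec = sensitivity_specificity(reference, predicted)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

A model that recommends surgery readily will tend to score high on sensitivity and lower on specificity, which is the pattern the abstract reports for ChatGPT relative to Gemini.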

Keywords: ChatGPT; Gemini; artificial intelligence (AI); deep learning; hand surgery; hand trauma; machine learning; management.

Conflict of interest statement

The authors declare no conflicts of interest.

Figures

**Figure 1**
An example of a prompt given to ChatGPT-4 (**left**) and Gemini (**right**) with the corresponding responses below. This prompt asked the models to classify a scaphoid fracture using Herbert and Fisher’s classification system.

**Figure 2**
An example of a prompt given to ChatGPT-4 (**left**) and Gemini (**right**) with the corresponding responses. This prompt asked the models to classify a mallet finger injury using Tubiana’s classification system.

**Figure 3**
Percentage of correct classifications for each classification system for ChatGPT-4 (**left**) and Gemini (**right**).

**Figure 4**
Applications of large language models (LLMs) in hand surgery.
