J Clin Med. 2024 May 11;13(10):2832.
doi: 10.3390/jcm13102832.

AI in Hand Surgery: Assessing Large Language Models in the Classification and Management of Hand Injuries

Sophia M Pressman et al. J Clin Med. 2024.

Abstract

Background: OpenAI's ChatGPT (San Francisco, CA, USA) and Google's Gemini (Mountain View, CA, USA) are two large language models that show promise in improving and expediting medical decision making in hand surgery, which warrants evaluating their applications in the field. This study aims to evaluate ChatGPT-4 and Gemini in classifying hand injuries and recommending treatment.

Methods: Gemini and ChatGPT-4 were each given 68 fictionalized clinical vignettes of hand injuries, and each vignette was presented twice. The models were asked to use a specific classification system and to recommend surgical or nonsurgical treatment. Classifications were scored based on correctness. Results were analyzed using descriptive statistics, a paired two-tailed t-test, and sensitivity testing.

Results: Gemini correctly classified 70.6% of hand injuries and demonstrated superior classification ability over ChatGPT (mean score 1.46 vs. 0.87, p-value < 0.001). For management, ChatGPT demonstrated higher sensitivity than Gemini in recommending surgical intervention (98.0% vs. 88.8%) but lower specificity (68.4% vs. 94.7%). Gemini also demonstrated greater response replicability than ChatGPT.

Conclusions: Large language models like ChatGPT and Gemini show promise in assisting medical decision making, particularly in hand surgery, with Gemini generally outperforming ChatGPT. These findings emphasize the importance of considering the strengths and limitations of different models when integrating them into clinical practice.
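
For readers less familiar with the sensitivity and specificity figures reported in the Results, the following is a minimal sketch of how such values are conventionally computed when "surgical" is treated as the positive class. The function name, the label strings, and the six-item toy data are illustrative assumptions and are not taken from the study, whose scoring procedure is not described in this abstract.

```python
def sensitivity_specificity(truth, predicted, positive="surgical"):
    """Return (sensitivity, specificity) for paired label sequences.

    Sensitivity = TP / (TP + FN): share of truly surgical cases that
    the model also recommended for surgery.
    Specificity = TN / (TN + FP): share of truly nonsurgical cases that
    the model also recommended for nonsurgical management.
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")

    tp = fn = tn = fp = 0
    for t, p in zip(truth, predicted):
        if t == positive:
            if p == positive:
                tp += 1  # surgical case, surgery recommended
            else:
                fn += 1  # surgical case, surgery not recommended
        else:
            if p == positive:
                fp += 1  # nonsurgical case, surgery recommended
            else:
                tn += 1  # nonsurgical case, nonsurgical recommended

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must appear in truth")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data, NOT from the study.
truth = ["surgical", "surgical", "surgical", "surgical",
         "nonsurgical", "nonsurgical"]
predicted = ["surgical", "surgical", "surgical", "nonsurgical",
             "nonsurgical", "surgical"]

sens, spec = sensitivity_specificity(truth, predicted)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In these terms, a model with high sensitivity but lower specificity, as reported for ChatGPT, rarely misses a case that needs surgery but more often recommends surgery where nonsurgical management was appropriate.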

Keywords: ChatGPT; Gemini; artificial intelligence (AI); deep learning; hand surgery; hand trauma; machine learning; management.

Conflict of interest statement

The authors declare no conflicts of interest.

Figures

Figure 1
An example of a prompt given to ChatGPT-4 (left) and Gemini (right) with the corresponding responses below. This prompt asked the models to classify a scaphoid fracture using Herbert and Fisher’s classification system.
Figure 2
An example of a prompt given to ChatGPT-4 (left) and Gemini (right) with the corresponding responses. This prompt asked the models to classify a mallet finger injury using Tubiana’s classification system.
Figure 3
Percentage of correct classifications for each classification system for ChatGPT-4 (left) and Gemini (right).
Figure 4
Applications of large language models (LLMs) in hand surgery.
