Comparative Study
Am Surg. 2025 Mar;91(3):325-335. doi: 10.1177/00031348241256075. Epub 2024 May 25.

Evaluating the Accuracy, Comprehensiveness, and Validity of ChatGPT Compared to Evidence-Based Sources Regarding Common Surgical Conditions: Surgeons' Perspectives


Hazem Nasef et al. Am Surg. 2025 Mar.

Expression of concern in

  • Expression of Concern. [No authors listed] Am Surg. 2025 Mar;91(3):464-472. doi: 10.1177/00031348241305412. Epub 2025 Jan 10. PMID: 39791244. Free PMC article. No abstract available.

Abstract

Background: This study aims to assess the accuracy, comprehensiveness, and validity of ChatGPT compared to evidence-based sources regarding the diagnosis and management of common surgical conditions by surveying the perceptions of U.S. board-certified practicing surgeons.

Methods: An anonymous cross-sectional survey was distributed to U.S. practicing surgeons from June 2023 to March 2024. The survey comprised 94 multiple-choice questions evaluating diagnostic and management information for five common surgical conditions, drawn either from evidence-based sources or generated by ChatGPT. Statistical analysis included descriptive statistics and paired-sample t-tests.

Results: Participating surgeons were primarily aged 40-50 years (43%), male (86%), and White (57%), and had either 5-10 or more than 15 years of experience (86%). The majority (86%) had no prior experience with ChatGPT in surgical practice. For material discussing acute cholecystitis and upper gastrointestinal hemorrhage, evidence-based sources were rated as significantly more comprehensive than ChatGPT (acute cholecystitis: 3.57 ± 0.535 vs 2.00 ± 1.16, P = .025; upper gastrointestinal hemorrhage: 4.14 ± 0.69 vs 2.43 ± 0.98, P < .001) and more valid (3.71 ± 0.488 vs 2.86 ± 1.07, P = .045; 3.71 ± 0.76 vs 2.71 ± 0.95, P = .038). However, there was no significant difference in accuracy between the two sources (3.71 vs 3.29, P = .289; 3.57 vs 2.71, P = .111).

Conclusion: Surveyed U.S. board-certified practicing surgeons rated evidence-based sources as significantly more comprehensive and valid than ChatGPT across the majority of surveyed surgical conditions; accuracy, however, did not differ significantly between the sources for most conditions. While ChatGPT may offer potential benefits in surgical practice, further refinement and validation are necessary to enhance its utility and acceptance among surgeons.
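For readers unfamiliar with the paired-sample t-test named in the Methods, the sketch below shows how such a comparison is typically computed. The rating vectors and variable names are hypothetical placeholders for illustration only, not study data; each pair of entries would represent one surgeon's 1-5 rating of the same condition's material from each source.

    # Minimal sketch of a paired-sample t-test (illustrative; hypothetical data)
    from scipy import stats

    evidence_based = [4, 4, 3, 4, 3, 4, 3]  # hypothetical surgeon ratings
    chatgpt        = [2, 1, 3, 2, 1, 3, 2]  # hypothetical surgeon ratings

    # Paired test: each surgeon rates both sources, so observations are matched
    result = stats.ttest_rel(evidence_based, chatgpt)
    print(f"t = {result.statistic:.2f}, P = {result.pvalue:.3f}")

A paired (rather than independent-samples) test fits this design because the same surgeon rates both sources, so each pair of observations shares rater-level variance.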

Keywords: ChatGPT; U.S. surgeons; clinical practice; common surgical conditions; evidence-based medicine.


Conflict of interest statement

Declaration of conflicting interests: The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
