Comparative Study

BMC Neurol. 2025 Jul 1;25(1):264.

doi: 10.1186/s12883-025-04280-8.

Evaluating ChatGPT and DeepSeek in postdural puncture headache management: a comparative study with international consensus guidelines

Jiayi Deng^#^{1,2}, Xu Qiu^#^{1,2}, Chengqi Dong^{1,2}, Li Xu^{1,2}, Xiaoxue Dong^{3,4}, Shiyue Yang⁵, Qinghua Li^{1,2}, Tao Mei^{1,2}, Shi Chen^{1,2}, Yali Wu^{1,2}, Jianliang Sun^{1,2}, Feifang He⁶, Hanbin Wang^{7,8}, Liang Yu^{9,10}

Affiliations

¹ The Fourth Clinical School of Medicine, Zhejiang Chinese Medical University, Hangzhou First People's Hospital, Hangzhou, China.
² Department of Pain, The Affiliated Hangzhou First People's Hospital, Westlake University School of Medicine, Hangzhou, China.
³ Department of Neurology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, No.86, Wujin Road, Shanghai, 200080, China.
⁴ Program in Developmental and Stem Cell Biology, The Hospital for Sick Children, 686 Bay Street, Toronto, ON, M5G 0A4, Canada.
⁵ Department of Anaesthesiology, First Affiliated Hospital of Soochow University, Suzhou, Jiangsu, China.
⁶ Department of Pain Management, Center for Intracranial Hypotension Management, Sir Run Run Shaw Hospital, Medical College of Zhejiang University, Hangzhou, China. hefeifang@zju.edu.cn.
⁷ The Fourth Clinical School of Medicine, Zhejiang Chinese Medical University, Hangzhou First People's Hospital, Hangzhou, China. wanghanbin@hospital.westlake.edu.cn.
⁸ Department of Pain, The Affiliated Hangzhou First People's Hospital, Westlake University School of Medicine, Hangzhou, China. wanghanbin@hospital.westlake.edu.cn.
⁹ The Fourth Clinical School of Medicine, Zhejiang Chinese Medical University, Hangzhou First People's Hospital, Hangzhou, China. yuliang@hospital.westlake.edu.cn.
¹⁰ Department of Pain, The Affiliated Hangzhou First People's Hospital, Westlake University School of Medicine, Hangzhou, China. yuliang@hospital.westlake.edu.cn.

^# Contributed equally.

PMID: 40597769
PMCID: PMC12211737
DOI: 10.1186/s12883-025-04280-8

Jiayi Deng et al. BMC Neurol. 2025.

Abstract

Objective: To evaluate whether ChatGPT and DeepSeek can provide healthcare professionals with accurate information on the prevention, diagnosis, and management of post-dural puncture headache (PDPH) in clinical practice, specifically by comparing the responses of ChatGPT-4o, ChatGPT-4o mini, DeepSeek-V3, and DeepSeek with DeepThink (R1) against consensus practice guidelines for headache after dural puncture.

Background: Post-dural puncture headache (PDPH) is a common complication of dural puncture. Currently, there is a lack of evidence-based guidance on the prevention, diagnosis, and management of PDPH. The 2023 consensus guidelines provide comprehensive information. As AI models become more developed and widely available, a growing number of people, including patients and doctors, are using them. However, the quality of the answers these models provide has not yet been tested.

Methods: Responses from ChatGPT-4o, ChatGPT-4o mini, DeepSeek-V3, and DeepSeek-R1 were evaluated against PDPH guidelines using four dimensions: Accuracy (guideline adherence), Overconclusiveness (unjustified recommendations), Supplementary information (additional relevant details), and Incompleteness (omission of critical guidelines). A 5-point Likert scale further assessed response accuracy and completeness.
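The rubric described in the Methods could be recorded along the following lines. This is only a minimal sketch, not the authors' instrument: the class `ItemRating`, its field names, the helper `tally`, and the choice of one yes/no flag per dimension plus two separate 1-5 Likert scores are all assumptions made for illustration, and the example ratings are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemRating:
    """One rater judgment of one model response to one guideline item (hypothetical layout)."""
    model: str
    item: int                    # guideline item number, assumed 1..10
    accuracy: bool               # response adheres to the guideline
    overconclusiveness: bool     # response makes an unjustified recommendation
    supplementary: bool          # response adds relevant extra detail
    incompleteness: bool         # flag for the "Incompleteness" dimension
    likert_accuracy: int         # assumed 1..5
    likert_completeness: int     # assumed 1..5

    def __post_init__(self):
        # Reject Likert scores outside the 5-point scale.
        for name in ("likert_accuracy", "likert_completeness"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")


def tally(ratings, dimension):
    """Count how many ratings have the given yes/no dimension flagged."""
    return sum(1 for r in ratings if getattr(r, dimension))


# Hypothetical ratings, not data from the study.
example = [
    ItemRating("ChatGPT-4o", 1, True, False, True, True, 5, 4),
    ItemRating("ChatGPT-4o", 2, True, False, True, False, 5, 3),
]
print(tally(example, "accuracy"), "of", len(example), "items rated accurate")
```

Keeping the four dimensions as separate flags makes per-model counts such as "10/10" a simple tally over the rated items.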

Results: All four models showed high accuracy and completeness. Of the 10 clinical guidelines evaluated, ChatGPT-4o, ChatGPT-4o mini, DeepSeek-V3, and DeepSeek-R1 all showed 100% accuracy in their responses (10/10) (p = 1). None of the four models showed overly conclusive results (p = 1). In terms of supplementary information, ChatGPT-4o, ChatGPT-4o mini, and DeepSeek-R1 scored 100% (10/10) and DeepSeek-V3 scored 90% (9/10) (p = 1). In terms of incompleteness, ChatGPT-4o scored 80% (8/10), DeepSeek-R1 scored 70% (7/10), and ChatGPT-4o mini and DeepSeek-V3 each scored 60% (6/10) (p = 0.729).
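The percentages in the Results follow directly from the counts out of 10 guideline items. A small sketch of that arithmetic is shown here; the dictionary `REPORTED` and the helper names are illustrative assumptions, the counts are copied from the paragraph above, and the reported p-values are not recomputed because the abstract does not name the statistical test used.

```python
N_ITEMS = 10  # number of clinical guideline items evaluated

# Counts per model and dimension, transcribed from the Results paragraph.
REPORTED = {
    "ChatGPT-4o":      {"accuracy": 10, "overconclusiveness": 0, "supplementary": 10, "incompleteness": 8},
    "ChatGPT-4o mini": {"accuracy": 10, "overconclusiveness": 0, "supplementary": 10, "incompleteness": 6},
    "DeepSeek-V3":     {"accuracy": 10, "overconclusiveness": 0, "supplementary": 9,  "incompleteness": 6},
    "DeepSeek-R1":     {"accuracy": 10, "overconclusiveness": 0, "supplementary": 10, "incompleteness": 7},
}


def percent(count, total=N_ITEMS):
    """Convert a count of guideline items into a percentage."""
    return 100 * count / total


def summarize(dimension):
    """Return {model: percentage} for one rubric dimension."""
    return {model: percent(scores[dimension]) for model, scores in REPORTED.items()}


for dimension in ("accuracy", "overconclusiveness", "supplementary", "incompleteness"):
    print(dimension, summarize(dimension))
```

The values under the "incompleteness" key are the ones the Conclusion refers to as 60-80% completeness.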

Conclusion: All four AI models demonstrate clinical validity, with ChatGPT-4o and DeepSeek-R1 showing stronger guideline alignment. Though largely accurate, their responses achieve only 60-80% completeness relative to medical guidelines. Healthcare professionals must exercise caution when using AI tools and should critically evaluate outputs before clinical application. While promising, their partial guideline coverage requires careful human oversight. Further validation research is essential before these models can reliably support clinical decision-making for complex conditions like PDPH.

Keywords: Artificial intelligence; ChatGPT; DeepSeek; Postdural puncture headache.

Conflict of interest statement

Declarations. Ethics approval and consent to participate: Not applicable. Consent for publication: Not applicable. Competing interests: The authors declare no competing interests.

Figures

**Fig. 1**
Accuracy, overconclusiveness, supplementary, and incompleteness of ChatGPT and DeepSeek recommendations compared to the guidelines

**Fig. 2**
The accuracy, completeness and reliability of ChatGPT and DeepSeek
