Evaluating the limits of AI in medical specialisation: ChatGPT's performance on the UK Neurology Specialty Certificate Examination

Panagiotis Giannos^{1,2,3}

Affiliations

¹ Department of Life Sciences, Imperial College London, London, UK.
² Society of Meta-Research and Biomedical Innovation, London, UK.
³ Promotion of Emerging and Evaluative Research Society, London, UK.

PMID: 37337531
PMCID: PMC10277081
DOI: 10.1136/bmjno-2023-000451

Panagiotis Giannos. BMJ Neurol Open. 2023 Jun 15;5(1):e000451. doi: 10.1136/bmjno-2023-000451. eCollection 2023.

Abstract

Background: Large language models such as ChatGPT have demonstrated potential as innovative tools for medical education and practice, with studies showing their ability to perform at or near the passing threshold in general medical examinations and standardised admission tests. However, no studies have assessed their performance in the UK medical education context, particularly at a specialty level, and specifically in the field of neurology and neuroscience.

Methods: We evaluated the performance of ChatGPT in higher specialty training for neurology and neuroscience using 69 questions from the Pool-Specialty Certificate Examination (SCE) Neurology Web Questions bank. The dataset primarily focused on neurology (80%). The questions spanned subtopics such as symptoms and signs, diagnosis, interpretation and management with some questions addressing specific patient populations. The performance of ChatGPT 3.5 Legacy, ChatGPT 3.5 Default and ChatGPT-4 models was evaluated and compared.

Results: ChatGPT 3.5 Legacy and ChatGPT 3.5 Default displayed overall accuracies of 42% and 57%, respectively, falling short of the passing threshold of 58% for the 2022 SCE neurology examination. ChatGPT-4, on the other hand, achieved the highest accuracy of 64%, surpassing the passing threshold and outperforming its predecessors across disciplines and subtopics.
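The comparison against the passing threshold can be sketched in a few lines of Python. This is an illustrative calculation only, not the author's analysis code: the accuracies, question count and threshold are taken from the abstract, while the function names, the assumption that all 69 questions were scored for every model, the assumption that percentages were rounded to the nearest integer, and the treatment of "pass" as "at or above the threshold" are ours.

```python
# Figures reported in the abstract (percent correct, out of 69 questions).
N_QUESTIONS = 69
PASS_THRESHOLD = 58  # percent, 2022 SCE neurology examination
REPORTED_ACCURACY = {
    "ChatGPT 3.5 Legacy": 42,
    "ChatGPT 3.5 Default": 57,
    "ChatGPT-4": 64,
}


def consistent_counts(percent, n=N_QUESTIONS):
    """Return every correct-answer count k whose accuracy k/n rounds to `percent`.

    Assumption (ours): percentages were rounded half-up to the nearest integer.
    """
    return [k for k in range(n + 1) if int(100 * k / n + 0.5) == percent]


def summarise():
    """Compare each reported accuracy with the passing threshold."""
    rows = {}
    for model, pct in REPORTED_ACCURACY.items():
        rows[model] = {
            "accuracy_pct": pct,
            # Assumption (ours): a score at or above the threshold is a pass.
            "passes": pct >= PASS_THRESHOLD,
            "margin_pct_points": pct - PASS_THRESHOLD,
            # Derived from the rounded percentage, not reported in the abstract.
            "implied_correct": consistent_counts(pct),
        }
    return rows


if __name__ == "__main__":
    for model, row in summarise().items():
        print(model, row)
```

Under these assumptions, each rounded percentage is compatible with exactly one score out of 69, so the abstract's figures imply 29, 39 and 44 correct answers respectively. ChatGPT 3.5 Default misses the threshold by one percentage point, and ChatGPT-4 clears it by six.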

Conclusions: The advancements in ChatGPT-4's performance compared with its predecessors demonstrate the potential for artificial intelligence (AI) models in specialised medical education and practice. However, our findings also highlight the need for ongoing development and collaboration between AI developers and medical experts to ensure the models' relevance and reliability in the rapidly evolving field of medicine.

Keywords: clinical neurology; health policy & practice; medicine.

Conflict of interest statement

Competing interests: None declared.

Figures

**Figure 1**
Comparative performance of ChatGPT 3.5 Legacy, ChatGPT 3.5 Default and ChatGPT-4 on SCE Neurology Questions. (A) Accuracy of each model, presented as percentage of correct responses and score count. (B) Co-occurrence of accurate responses across different disciplines and subtopics. (C) Performance on relevant topics and subtopics in the field of neurology. (D) Performance of each model in the remaining disciplines outside of neurology. SCE, Specialty Certificate Examination.
