2024 Nov 18;12(22):2305.
doi: 10.3390/healthcare12222305.

Assessing the Performance of Chatbots on the Taiwan Psychiatry Licensing Examination Using the Rasch Model

Yu Chang et al. Healthcare (Basel).

Abstract

Background/objectives: The potential and limitations of chatbots in medical education and clinical decision support, particularly in specialized fields like psychiatry, remain unknown. By using the Rasch model, our study aimed to evaluate the performance of various state-of-the-art chatbots on psychiatry licensing exam questions to explore their strengths and weaknesses.

Methods: We assessed the performance of 22 leading chatbots, selected based on LMArena benchmark rankings, using 100 multiple-choice questions from the 2024 Taiwan psychiatry licensing examination, a nationally standardized test required for psychiatric licensure in Taiwan. Chatbot responses were scored for correctness, and we used the Rasch model to evaluate chatbot ability.

Results: Chatbots released after February 2024 passed the exam, with ChatGPT-o1-preview achieving the highest score of 85. ChatGPT-o1-preview's ability was significantly higher than the passing threshold (p < 0.001), exceeding it by 1.92 logits. It demonstrated strengths in complex psychiatric problems and ethical understanding, yet it showed limitations in recent legal changes and specialized psychiatric knowledge, such as recent amendments to the Mental Health Act, psychopharmacology, and advanced neuroimaging.
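
To give a sense of what a gap measured in logits means, the dichotomous Rasch model expresses the probability of a correct answer as a logistic function of the difference between ability and item difficulty. The following is a minimal sketch of that formula, not the authors' analysis code; the function name `rasch_probability` and the choice to place the passing threshold at 0 logits are assumptions made purely for illustration.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct response under the dichotomous Rasch model.

    Both arguments are on the logit scale; only their difference matters.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Assumption: the passing threshold is placed at 0 logits for illustration.
threshold = 0.0
gap = 1.92  # gap reported in the abstract for ChatGPT-o1-preview

# Consider an item whose difficulty equals the passing threshold.
p_at_threshold = rasch_probability(threshold, difficulty=threshold)
p_top_model = rasch_probability(threshold + gap, difficulty=threshold)

# A gap of d logits multiplies the odds of a correct answer by exp(d).
odds_ratio = math.exp(gap)

print(f"P(correct) at threshold ability: {p_at_threshold:.3f}")
print(f"P(correct) 1.92 logits higher:   {p_top_model:.3f}")
print(f"Odds ratio for a 1.92-logit gap: {odds_ratio:.2f}")
```

Under these assumptions, an examinee exactly at the threshold has even odds on such an item, while one 1.92 logits higher answers it correctly with a probability of about 0.87, which corresponds to odds roughly 6.8 times as high.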

Conclusions: Chatbot technology could be a valuable tool for medical education and clinical decision support in psychiatry, and as technology continues to advance, these models are likely to play an increasingly integral role in psychiatric practice.

Keywords: Rasch model; chatbots; psychiatry licensing examination.

Conflict of interest statement

The authors declare no conflicts of interest.

Figures

Figure 1
Person–item map (PKMAP) of ChatGPT-o1-preview. Vertical units in the map represent logits. The mark “XXX” indicates the chatbot’s ability level. Each item in the map corresponds to a question number from the examination, with a “1” or “0” placed after the item number. A “1” indicates that the question was answered correctly and is positioned on the left side of the map, while a “0” indicates that the question was answered incorrectly and is positioned on the right side. The difficulty of each item is also represented by its position along the vertical axis, showing how challenging the question was relative to the chatbot’s ability.
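
The layout described in the caption can be mimicked in plain text. The sketch below is only an illustration of that layout: the helper `person_item_map`, the item difficulties, the responses, and the ability value are all made up here and are not taken from the study.

```python
def person_item_map(responses, difficulties, ability):
    """Build text rows for a simple person-item map.

    Rows are ordered from the hardest item (top) to the easiest (bottom).
    Correct items ("<item>.1") go left of the axis, incorrect items
    ("<item>.0") go right, and "XXX" marks the examinee's ability.
    """
    ordered = sorted(responses, key=lambda item: difficulties[item], reverse=True)
    lines = []
    marked = False
    for item in ordered:
        logit = difficulties[item]
        # Insert the ability marker just above the first item that is easier.
        if not marked and logit < ability:
            lines.append(f"{ability:6.2f} {'XXX':>8} |")
            marked = True
        label = f"{item}.{responses[item]}"
        left = label if responses[item] == 1 else ""
        right = label if responses[item] == 0 else ""
        lines.append(f"{logit:6.2f} {left:>8} | {right}")
    if not marked:
        lines.append(f"{ability:6.2f} {'XXX':>8} |")
    return lines


# Hypothetical data: item number -> score (1 correct, 0 incorrect) and logits.
demo_responses = {1: 1, 2: 0, 3: 1, 4: 0}
demo_difficulties = {1: -1.0, 2: 2.5, 3: 0.5, 4: 1.2}
demo_ability = 1.0

for row in person_item_map(demo_responses, demo_difficulties, demo_ability):
    print(row)
```

Reading such a map, incorrect items that sit below the "XXX" marker are the informative ones: they are questions the model missed even though their difficulty is lower than its estimated ability.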
