Comparative Study

Iteratively refined ChatGPT outperforms clinical mentors in generating high-quality interprofessional education clinical scenarios: a comparative study

Tian Qingquan et al. BMC Med Educ. 2025 Jun 5;25(1):845. doi: 10.1186/s12909-025-07414-1.
Abstract

Background: Interprofessional education (IPE) is essential for promoting teamwork among healthcare professionals. However, its implementation is often hindered by the limited availability of interprofessional faculty and by the scheduling challenges involved in creating high-quality IPE scenarios. AI tools such as ChatGPT are increasingly being explored for this purpose, but generating high-quality IPE scenarios with them remains a significant challenge. This study examines the effectiveness of GPT-4o, an advanced version of ChatGPT enhanced by novel prompting methodologies, in overcoming these obstacles.

Methods: This comparative study assessed clinical scenarios generated by GPT-4o using two strategies, a standard prompt (single-step scenario generation without iterative feedback) and iterative refinement (a multi-step, feedback-driven process), against those crafted by clinical mentors. The iterative refinement method, inspired by actual clinical scenario development, employs a cyclical process of evaluation and refinement that closely mimics discussions among professionals. Scenarios were evaluated for time efficiency and for quality using the Interprofessional Quality Score (IQS), defined as the mean score assigned by multidisciplinary evaluators across five interprofessional criteria: clinical authenticity, team collaboration, educational alignment, appropriate challenge, and student engagement.
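As a rough illustration of the two components described above, the sketch below shows a draft-critique-revise loop and an IQS computed as the mean evaluator score across the five criteria. It assumes the OpenAI Python SDK and GPT-4o; the prompt wording, role list, number of rounds, and helper names such as refine_scenario and iqs are illustrative assumptions, not the authors' exact protocol.

```python
# Minimal sketch of the iterative-refinement idea, assuming the OpenAI Python SDK.
from statistics import mean
from openai import OpenAI

client = OpenAI()

CRITERIA = ["clinical authenticity", "team collaboration", "educational alignment",
            "appropriate challenge", "student engagement"]

def ask(prompt: str) -> str:
    # Single GPT-4o chat completion; returns the model's text reply.
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def refine_scenario(topic: str, rounds: int = 3) -> str:
    # Initial draft (comparable to the "standard prompt" arm).
    scenario = ask(f"Draft an interprofessional education clinical scenario about {topic}.")
    for _ in range(rounds):
        # Role-played critique, mimicking a multidisciplinary discussion.
        critique = ask(
            "Acting as a panel of physician, nurse, and pharmacist educators, critique this "
            f"IPE scenario against these criteria: {', '.join(CRITERIA)}.\n\n{scenario}"
        )
        # Revise the draft in light of the critique.
        scenario = ask(
            f"Revise the scenario below to address the critique.\n\n"
            f"Critique:\n{critique}\n\nScenario:\n{scenario}"
        )
    return scenario

def iqs(ratings: dict[str, list[float]]) -> float:
    # IQS sketch: average the evaluators' scores within each criterion, then
    # average across the five criteria.
    return mean(mean(scores) for scores in ratings.values())
```

In terms of the study's comparison, the standard-prompt arm corresponds to stopping after the first draft, while the iterative-refinement arm continues through the critique-and-revise rounds.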

Results: Scenarios developed using the iterative refinement strategy were completed significantly faster than those by clinical mentors and achieved higher or equivalent IQS. Notably, these scenarios matched or exceeded the quality of those created by humans, particularly in areas such as appropriate challenge and student engagement. Conversely, scenarios generated via the standard prompt method exhibited lower accuracy and various other deficiencies. Blinded attribution assessments by students further demonstrated that scenarios developed through iterative refinement were often indistinguishable from those created by human mentors.

Conclusions: Employing GPT-4o with iterative refinement and role-playing strategies produces clinical scenarios that, in some areas, exceed those developed by clinical mentors. This approach reduces the need for extensive faculty involvement, highlighting AI's potential to closely align with established educational frameworks and substantially enhance IPE, particularly in resource-constrained settings.

Keywords: Artificial intelligence; ChatGPT; Clinical scenarios; Interprofessional education; Iterative refinement.


Declarations

Ethics approval and consent to participate: Not applicable.
Consent for publication: Not applicable.
Competing interests: The authors declare no competing interests.

Figures

Fig. 1. Schematic representation of the study design.

Fig. 2. Time and quality comparison of clinical scenarios generated by GPT-4o under different strategies versus human creation. a) Average time spent by each group to create clinical scenarios; the time for the clinical mentors group represents the cumulative effort of all involved multidisciplinary mentors. b) Comparison of IQS results across metrics for clinical scenarios created by the different methods. Statistical significance was determined using two-tailed t-tests, with * indicating p < .05, ** indicating p < .01, and *** indicating p < .001.
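For readers who want to reproduce this style of group comparison, a minimal sketch follows, assuming per-scenario IQS values for two groups are available as Python lists. The independent two-tailed t-test via scipy.stats.ttest_ind and the star thresholds mirror the caption; the function name compare_groups is hypothetical.

```python
# Sketch of the significance comparison described in the Fig. 2 caption.
from scipy import stats

def compare_groups(iqs_refined: list[float], iqs_mentor: list[float]) -> str:
    result = stats.ttest_ind(iqs_refined, iqs_mentor)  # two-sided by default
    p = result.pvalue
    stars = "***" if p < .001 else "**" if p < .01 else "*" if p < .05 else "ns"
    return f"t = {result.statistic:.2f}, p = {p:.4f} ({stars})"
```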



