Physician clinical decision modification and bias assessment in a randomized controlled trial of AI assistance

Ethan Goh et al. Commun Med (Lond). 2025 Mar 4;5(1):59. doi: 10.1038/s43856-025-00781-2.

Abstract

Background: Artificial intelligence assistance in clinical decision making shows promise, but concerns exist about potential exacerbation of demographic biases in healthcare. This study aims to evaluate how physician clinical decisions and biases are influenced by AI assistance in a chest pain triage scenario.

Methods: A randomized, pre-/post-intervention study was conducted with 50 US-licensed physicians who reviewed standardized chest pain video vignettes featuring either a white male or a Black female patient. Participants answered clinical questions about triage, risk assessment, and treatment before and after receiving GPT-4-generated recommendations. Clinical decision accuracy was evaluated against evidence-based guidelines.

Results: Here we show that physicians are willing to modify their clinical decisions based on GPT-4 assistance, improving accuracy scores from 47% to 65% in the white male patient group and from 63% to 80% in the Black female patient group. The accuracy improvement occurs without introducing or exacerbating demographic bias, with both groups showing improvements of similar magnitude (approximately 18 percentage points). A post-study survey indicates that 90% of physicians expect AI tools to play a significant role in future clinical decision making.
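As a back-of-the-envelope check on the figures above, the short Python sketch below (illustrative only, not the authors' analysis code; the group accuracies are the rounded values reported in this abstract) computes each arm's pre-to-post improvement and the between-arm gap that underlies the bias assessment:

```python
# Illustrative sketch of the arithmetic behind the abstract's bias check.
# Group accuracies are the rounded values reported above, not raw data.
pre_post = {
    "white male vignette": (0.47, 0.65),
    "Black female vignette": (0.63, 0.80),
}

deltas = {}
for group, (pre, post) in pre_post.items():
    deltas[group] = post - pre  # pre-to-post improvement for this arm
    print(f"{group}: {pre:.0%} -> {post:.0%} ({deltas[group] * 100:+.0f} pp)")

# Similar improvements across arms suggest the AI assistance helped both
# groups alike; a large gap would indicate the assistance amplified bias.
gap = abs(deltas["white male vignette"] - deltas["Black female vignette"])
print(f"between-arm difference in improvement: {gap * 100:.0f} pp")
```

With the rounded abstract values this yields gains of +18 and +17 percentage points and a between-arm difference of about 1 percentage point, consistent with the "similar magnitudes of improvement" reported.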

Conclusions: Physician clinical decision making can be augmented by AI assistance while maintaining equitable care across patient demographics. These findings suggest a path forward for AI clinical decision support that improves medical care without amplifying healthcare disparities.

Plain language summary

Doctors sometimes make different medical decisions for patients based on their race or gender, even when the symptoms are the same and the advice should be similar. New artificial intelligence (AI) tools such as GPT-4 are becoming available to assist doctors in making clinical decisions. Our study examined whether using AI affects doctors' accuracy and bias when making decisions. We investigated how doctors respond to AI suggestions when evaluating chest pain, a common but serious medical concern. We showed 50 doctors a video of either a white male or a Black female patient describing chest pain symptoms and asked them to make medical decisions. The doctors then received suggestions from an AI system and could change their decisions. We found that doctors were willing to consider the AI's suggestions and made more accurate medical decisions after receiving this help. This improvement in decision making happened equally for both patient groups, regardless of race or gender, suggesting AI tools could help improve medical care without increasing bias.

Conflict of interest statement

Competing interests: B.B. discloses funding from the National Library of Medicine (grant No. 2T15LM007033). E.K. discloses funding from the National Heart, Lung, and Blood Institute (grant No. K23HL157750). R.J.G. is supported by a VA Advanced Fellowship in Medical Informatics. A.M. reports uncompensated and compensated relationships with care.coach, Emsana Health, Embold Health, ezPT, FN Advisors, Intermountain Healthcare, JRSL, The Leapfrog Group, the Peterson Center on Healthcare, Prealize Health, and PBGH. D.C. reports support from a Robert Wood Johnson Pioneer Grant. J.H.C. reports cofounding Reaction Explorer, which develops and licenses organic chemistry education software, as well as paid consulting fees from Sutton Pierce, Younker Hyde Macfarlane and Sykes McAllister as a medical expert witness. He receives funding from the National Institutes of Health (NIH)/National Institute of Allergy and Infectious Diseases (1R01AI17812101), NIH/National Institute on Drug Abuse Clinical Trials Network (UG1DA015815—CTN-0136), Stanford Artificial Intelligence in Medicine and Imaging—Human-Centered Artificial Intelligence Partnership Grant, the NIH-NCATS-Clinical & Translational Science Award (UM1TR004921), Stanford Bio-X Interdisciplinary Initiatives Seed Grants Program (IIP) [R12], NIH/Center for Undiagnosed Diseases at Stanford (U01 NS134358) and the American Heart Association—Strategically Focused Research Network—Diversity in Clinical Trials. The other authors declare no competing interests. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the paper.

Figures

Fig. 1. Study Design.
Fifty US-licensed physicians were recruited for a remote video session in which they were presented with a video of a standardized patient actor depicting a case of chest pain in an outpatient setting. Participants were randomized to encounter an actor who was either a white male or a Black female. The clinicians then responded to a series of four questions based on the vignette. For the first two questions, after providing their initial answers, they were presented with a pre-prepared LLM response based on the same vignette and questions. Clinicians were then offered an opportunity to modify their initial answers. For the final two questions, after their initial response, clinicians were allowed to interact directly with the LLM and ask any questions before deciding whether to modify their answers.
