Hum Factors. 2025 Mar 19:187208251326795. doi: 10.1177/00187208251326795. Online ahead of print.

Beyond Binary Decisions: Evaluating the Effects of AI Error Type on Trust and Performance in AI-Assisted Tasks


Jin Yong Kim et al. Hum Factors.

Abstract

Objective: We investigated how various error patterns from an AI aid in a nonbinary decision scenario influence human operators' trust in the AI system and their task performance.

Background: Existing research on trust in automation/autonomy predominantly uses signal detection theory (SDT) to model autonomy performance. SDT classifies the world into binary states and hence oversimplifies the interactions observed in real-world scenarios. Allowing multi-class classification of the world reveals error patterns unexplored in prior literature.

Method: Thirty-five participants completed 60 trials of a simulated mental rotation task assisted by an AI with 70-80% reliability. Participants' trust in and dependence on the AI system, as well as their performance, were measured. Combining participants' initial performance with the AI aid's performance yielded five distinct patterns. Mixed-effects models were built to examine the effects of these patterns on trust adjustment, performance, and reaction time.

Results: The AI's error patterns affected performance, reaction times, and trust. Some AI errors provided false reassurance, misleading operators into believing their incorrect decisions were correct, which worsened both performance and trust. Paradoxically, other AI errors prompted safety checks and verifications that, despite causing a moderate decrease in trust, ultimately enhanced overall performance.

Conclusion: The findings demonstrate that the types of errors made by an AI system significantly affect human trust and performance, underscoring the need to model the complexity of real-world human-AI interaction.

Application: These insights can guide the development of AI systems that classify the state of the world into multiple classes, enabling operators to make more informed and accurate decisions based on feedback.
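As a rough sketch of the analysis described in the Method section (a reader's illustration, not the authors' code), a linear mixed-effects model of trial-level trust adjustment on error-pattern category, with a random intercept per participant, could be fit as follows. The data file and column names (trials.csv, participant, pattern, trust_adjustment) are illustrative assumptions.

    # Minimal sketch, not the authors' code: trial-level trust adjustment
    # regressed on the error-pattern category, with a random intercept per
    # participant. File and column names are illustrative assumptions.
    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical long-format data: one row per trial, with columns
    # participant (ID), pattern (error-pattern category), and
    # trust_adjustment (post-trial minus pre-trial trust rating).
    df = pd.read_csv("trials.csv")

    model = smf.mixedlm("trust_adjustment ~ C(pattern)",
                        data=df, groups=df["participant"])
    print(model.fit().summary())

Analogous models with final-answer accuracy or reaction time as the outcome would have the same structure.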

Keywords: human–AI interaction; human–automation interaction; human–autonomy interaction; multi-class classification; trust dynamics.


Figures

Figure 1.
Difference in the performance patterns when choices are binary or non-binary. In the binary choice (a), the Wrong-Incorrect pattern always matches the reference. However, in the non-binary choice (b), the Wrong-Incorrect pattern could be categorized into two, based on whether incorrect predictions match the reference (prescribed pill).
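To make the distinction in Figure 1 concrete, the sketch below (a reader's illustration with hypothetical labels, not the paper's code or exact pattern names) classifies a Wrong-Incorrect trial in the pill-dispensing example by whether the AI's incorrect prediction happens to match the prescribed reference pill; with only two possible choices the match is forced, whereas with three or more it is not.

    # Reader's illustration with hypothetical labels, not the paper's code.
    def wrong_incorrect_subtype(dispensed_pill: str,
                                ai_prediction: str,
                                prescribed_pill: str) -> str:
        # "Wrong": the dispensed pill is not the prescribed one.
        assert dispensed_pill != prescribed_pill
        # "Incorrect": the AI misidentifies the dispensed pill.
        assert ai_prediction != dispensed_pill
        if ai_prediction == prescribed_pill:
            # With only two possible pills this branch is forced, which is why
            # the binary case (a) always matches the reference.
            return "incorrect prediction matches the reference"
        return "incorrect prediction differs from the reference"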
Figure 2.
Signal Detection Theory (SDT)
Figure 3.
Illustration of stimuli development. After creating the (a) reference images, one (b) mirrored image, fourteen 45-degree (c) horizontal axis rotated figures, and fourteen 45-degree (d) vertical axis rotated figures were created for each reference image.
Figure 4.
For the beta test, participants were asked to determine whether the image pairs are the same items in different rotations (by clicking “Yes, they are the same item”) or different items (by clicking “No, they are different”) within 10 seconds.
Figure 5.
Flowchart of the experiment
Figure 6.
Participants were asked to make their initial answer choice within 15 seconds.
Figure 7.
Participants rated their confidence using a visual analog scale.
Figure 8.
Participants were presented with the AI system’s recognition and chose between sticking with or rejecting their initial choice within 10 seconds. If no selection was made, the experiment skipped to the next trial.
Figure 9.
Participants were presented with the performance feedback page, showing their performance on the initial and final answer choices and the validity of the AI’s prediction.
Figure 10.
Participants rated their trust and perceived reliability using a visual analog scale.
Figure 11.
Trust adjustment by patterns of performance. The error bars represent +/− 2 standard errors.
Figure 12.
Performance by patterns of performance. The error bars represent +/− 2 standard errors.
Figure 13.
Reaction time by patterns of performance. The error bars represent +/− 2 standard errors.
Figure 14.
Autocorrelation of trust as a function of time separation. The error bars represent +/− 2 standard errors.
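As an illustration of the quantity plotted in Figure 14 (an assumption about the computation, not the authors' code), the autocorrelation of a participant's trial-by-trial trust ratings at increasing time separations could be computed as below and then averaged across participants.

    # Sketch under stated assumptions, not the authors' code.
    import numpy as np

    def trust_autocorrelation(trust, max_lag):
        # trust: one participant's trial-by-trial trust ratings, in order.
        # Returns the Pearson correlation between the series and itself
        # shifted by 1..max_lag trials (max_lag must be < len(trust) - 1).
        trust = np.asarray(trust, dtype=float)
        return np.array([np.corrcoef(trust[:-lag], trust[lag:])[0, 1]
                         for lag in range(1, max_lag + 1)])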
Figure 15.
Trust adjustment by patterns of performance. The error bars represent +/− 2 standard errors.
