Hum Factors. 2025 Mar 19:187208251326795. doi: 10.1177/00187208251326795. Online ahead of print.

Beyond Binary Decisions: Evaluating the Effects of AI Error Type on Trust and Performance in AI-Assisted Tasks


Jin Yong Kim et al. Hum Factors.

Abstract

Objective: We investigated how various error patterns from an AI aid in a nonbinary decision scenario influence human operators' trust in the AI system and their task performance.

Background: Existing research on trust in automation/autonomy predominantly uses signal detection theory (SDT) to model autonomy performance. SDT classifies the world into binary states and hence oversimplifies the interactions observed in real-world scenarios. Allowing multi-class classification of the world reveals error patterns unexplored in prior literature.

Method: Thirty-five participants completed 60 trials of a simulated mental rotation task assisted by an AI with 70-80% reliability. Participants' trust in and dependence on the AI system, as well as their performance, were measured. Combining participants' initial performance with the AI aid's performance yielded five distinct patterns. Mixed-effects models were built to examine the effects of these patterns on trust adjustment, performance, and reaction time.

Results: The AI's error patterns affected performance, reaction times, and trust. Some AI errors provided false reassurance, misleading operators into believing their incorrect decisions were correct, which worsened both performance and trust. Paradoxically, other AI errors prompted safety checks and verifications that, despite causing a moderate decrease in trust, ultimately enhanced overall performance.

Conclusion: The findings demonstrate that the types of errors made by an AI system significantly affect human trust and performance, underscoring the need to model the complexity of real-world human-AI interaction.

Application: These insights can guide the development of AI systems that classify the state of the world into multiple classes, enabling operators to make more informed and accurate decisions based on feedback.
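As a rough sketch of the analysis described in the Method section (a reader's illustration, not the authors' code), a linear mixed-effects model of trial-level trust adjustment on error-pattern category, with a random intercept per participant, could be fit as follows. The data file and column names (trials.csv, participant, pattern, trust_adjustment) are illustrative assumptions.

    # Minimal sketch, not the authors' code: trial-level trust adjustment
    # regressed on the error-pattern category, with a random intercept per
    # participant. File and column names are illustrative assumptions.
    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical long-format data: one row per trial, with columns
    # participant (ID), pattern (error-pattern category), and
    # trust_adjustment (post-trial minus pre-trial trust rating).
    df = pd.read_csv("trials.csv")

    model = smf.mixedlm("trust_adjustment ~ C(pattern)",
                        data=df, groups=df["participant"])
    print(model.fit().summary())

Analogous models with final-answer accuracy or reaction time as the outcome would have the same structure.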

Keywords: human–AI interaction; human–automation interaction; human–autonomy interaction; multi-class classification; trust dynamics.


Figures

Figure 1.
Difference in the performance patterns when choices are binary or non-binary. In the binary choice (a), the Wrong-Incorrect pattern always matches the reference. However, in the non-binary choice (b), the Wrong-Incorrect pattern could be categorized into two, based on whether incorrect predictions match the reference (prescribed pill).
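To make the distinction in Figure 1 concrete, the sketch below (a reader's illustration with hypothetical labels, not the paper's code or exact pattern names) classifies a Wrong-Incorrect trial in the pill-dispensing example by whether the AI's incorrect prediction happens to match the prescribed reference pill; with only two possible choices the match is forced, whereas with three or more it is not.

    # Reader's illustration with hypothetical labels, not the paper's code.
    def wrong_incorrect_subtype(dispensed_pill: str,
                                ai_prediction: str,
                                prescribed_pill: str) -> str:
        # "Wrong": the dispensed pill is not the prescribed one.
        assert dispensed_pill != prescribed_pill
        # "Incorrect": the AI misidentifies the dispensed pill.
        assert ai_prediction != dispensed_pill
        if ai_prediction == prescribed_pill:
            # With only two possible pills this branch is forced, which is why
            # the binary case (a) always matches the reference.
            return "incorrect prediction matches the reference"
        return "incorrect prediction differs from the reference"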
Figure 2.
Signal Detection Theory (SDT)
Figure 3.
Illustration of stimuli development. After creating the (a) reference images, one (b) mirrored image, fourteen 45-degree (c) horizontal axis rotated figures, and fourteen 45-degree (d) vertical axis rotated figures were created for each reference image.
Figure 4.
For the beta test, participants were asked to determine whether the image pairs are the same items in different rotations (by clicking “Yes, they are the same item”) or different items (by clicking “No, they are different”) within 10 seconds.
Figure 5.
Flowchart of the experiment
Figure 6.
Participants were asked to make their initial answer choice within 15 seconds.
Figure 7.
Participants rated their confidence using a visual analog scale.
Figure 8.
Participants were presented with the AI system’s recognition and chose between sticking with or rejecting their initial choice within 10 seconds. If no selection was made, the experiment skipped to the next trial.
Figure 9.
Participants were presented with the performance feedback page, showing their performance on the initial and final answer choices and the validity of the AI’s prediction.
Figure 10.
Participants rated their trust and perceived reliability using a visual analog scale.
Figure 11.
Trust adjustment by patterns of performance. The error bars represent +/− 2 standard errors.
Figure 12.
Performance by patterns of performance. The error bars represent +/− 2 standard errors.
Figure 13.
Reaction time by patterns of performance. The error bars represent +/− 2 standard errors.
Figure 14.
Autocorrelation of trust as a function of time separation. The error bars represent +/− 2 standard errors.
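As an illustration of the quantity plotted in Figure 14 (an assumption about the computation, not the authors' code), the autocorrelation of a participant's trial-by-trial trust ratings at increasing time separations could be computed as below and then averaged across participants.

    # Sketch under stated assumptions, not the authors' code.
    import numpy as np

    def trust_autocorrelation(trust, max_lag):
        # trust: one participant's trial-by-trial trust ratings, in order.
        # Returns the Pearson correlation between the series and itself
        # shifted by 1..max_lag trials (max_lag must be < len(trust) - 1).
        trust = np.asarray(trust, dtype=float)
        return np.array([np.corrcoef(trust[:-lag], trust[lag:])[0, 1]
                         for lag in range(1, max_lag + 1)])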
Figure 15.
Trust adjustment by patterns of performance. The error bars represent +/− 2 standard errors.
