Sci Rep. 2024 Dec 28;14(1):31150.
doi: 10.1038/s41598-024-82501-9.

Explainable AI improves task performance in human-AI collaboration



Julian Senoner et al. Sci Rep.

Abstract

Artificial intelligence (AI) provides considerable opportunities to assist human work. However, one crucial challenge of human-AI collaboration is that many AI algorithms operate as black boxes: how the AI arrives at its predictions remains opaque. This makes it difficult for humans to validate an AI prediction against their own domain knowledge. For this reason, we hypothesize that augmenting humans with explainable AI improves task performance in human-AI collaboration. To test this hypothesis, we implement explainable AI in the form of visual heatmaps in inspection tasks conducted by domain experts. Visual heatmaps have the advantage that they are easy to understand and help to localize relevant parts of an image. We then compare participants who were supported by either (a) black-box AI or (b) explainable AI, where the latter helps them follow AI predictions when these are accurate and overrule the AI when its predictions are wrong. We conducted two preregistered experiments with representative, real-world visual inspection tasks from manufacturing and medicine. The first experiment was conducted with factory workers from an electronics factory, who performed [Formula: see text] assessments of whether electronic products have defects. The second experiment was conducted with radiologists, who performed [Formula: see text] assessments of chest X-ray images to identify lung lesions. The results of our experiments with domain experts performing real-world tasks show that task performance improves when participants are supported by explainable AI with heatmaps instead of black-box AI. We find that explainable AI as a decision aid improved task performance by 7.7 percentage points (95% confidence interval [CI]: 3.3% to 12.0%, [Formula: see text]) in the manufacturing experiment and by 4.7 percentage points (95% CI: 1.1% to 8.3%, [Formula: see text]) in the medical experiment compared to black-box AI.
In both domains, the confidence intervals exclude zero, so explainable AI yielded statistically significant gains in task performance over black-box AI.
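The heatmaps in the study come from the authors' own AI models, which are not reproduced here. As a rough, model-agnostic illustration of how a visual heatmap of this kind can be produced, the sketch below uses occlusion: it masks each image patch in turn and records how much the model's score drops. The `toy_score` function is a stand-in assumption, not the study's detector.

```python
import numpy as np

def occlusion_heatmap(image, score_fn, patch=4):
    """Model-agnostic saliency: mask each patch and record the score drop.

    A large positive entry means the masked region mattered to the prediction,
    which is the kind of localization cue a heatmap gives an inspector.
    """
    base = score_fn(image)
    h, w = image.shape
    heat = np.zeros((h // patch, w // patch))
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            masked = image.copy()
            masked[i:i + patch, j:j + patch] = 0.0
            heat[i // patch, j // patch] = base - score_fn(masked)
    return heat

# Hypothetical "defect detector": scores by mean intensity of the
# top-left quadrant, so only that region should light up in the heatmap.
def toy_score(img):
    return float(img[:8, :8].mean())

rng = np.random.default_rng(0)
img = rng.random((16, 16))
heat = occlusion_heatmap(img, toy_score)
```

With this toy detector, patches inside the top-left quadrant produce positive heat values and all other patches produce zero, mirroring how a real heatmap highlights the image regions that drove the AI's prediction.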

Keywords: Decision-making; Explainable AI; Human-centered AI; Human–AI collaboration; Task performance.


Conflict of interest statement

Declarations. Competing interests: The authors declare no competing interests.

Figures

Figure 1
Overview of the experiments for assessing the effect of explainable AI on task performance. (A) Experimental design of the manufacturing experiment, where factory workers were asked to "approve" images of faultless products and to "reject" images of defective products through a computer interface. (B) Experimental design of the medical experiment, where radiologists were asked to decide whether lung lesions are visible in the chest X-ray image. In both experiments, participants were randomly assigned to one of two treatments: (a) black-box AI or (b) explainable AI.
Figure 2
Results of the manufacturing experiment. The boxplots compare task performance between the two treatments: black-box AI and explainable AI. Task performance is measured by the balanced accuracy (A) and the defect detection rate (B), based on the quality assessments of workers and the ground-truth labels of the product images. A balanced accuracy of 50% provides a naïve baseline corresponding to a random guess (black dotted line). The standalone AI algorithm attains a balanced accuracy of 95.6% and a defect detection rate of 92.9% (orange dashed lines). Statistical significance is based on a one-sided Welch's t-test (***[Formula: see text], **[Formula: see text], *[Formula: see text]). In the boxplots, the center line denotes the median; box limits are the upper and lower quartiles; whiskers extend to 1.5× the interquartile range.
Figure 3
Results of the medical experiment. The boxplots compare task performance between the two treatments: black-box AI and explainable AI. Task performance is measured by the balanced accuracy (A) and the disease detection rate (B), based on the quality assessments of radiologists and the ground-truth labels of the chest X-ray images. A balanced accuracy of 50% provides a naïve baseline corresponding to a random guess (black dotted line). The standalone AI algorithm attains a balanced accuracy of 82.2% and a disease detection rate of 71.4% (orange dashed lines). Statistical significance is based on a one-sided Welch's t-test (***[Formula: see text], **[Formula: see text], *[Formula: see text]). In the boxplots, the center line denotes the median; box limits are the upper and lower quartiles; whiskers extend to 1.5× the interquartile range.
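The figures rely on two measures, balanced accuracy (the mean of sensitivity and specificity, so class imbalance does not inflate the score) and a one-sided Welch's t-test (which does not assume equal variances across treatment groups). A minimal sketch of both, using only the standard library; the example labels and group scores are illustrative, not the study's data:

```python
import math

def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (detection rate) and specificity.

    Labels: 1 = defective/diseased, 0 = faultless/healthy.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(y_true)
    neg = len(y_true) - pos
    sensitivity = tp / pos   # the "detection rate" in panels (B)
    specificity = tn / neg
    return (sensitivity + specificity) / 2

def welch_t(a, b):
    """Welch's t statistic for testing mean(a) > mean(b) (unequal variances)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    return (ma - mb) / math.sqrt(va / na + vb / nb)

# Illustrative use: per-participant balanced accuracies by treatment.
ba = balanced_accuracy([1, 1, 0, 0], [1, 0, 0, 0])        # -> 0.75
t = welch_t([0.90, 0.80, 0.85], [0.70, 0.75, 0.72])       # positive favors group a
```

In the study, each participant's balanced accuracy would be one observation, and the two treatment groups (explainable AI vs. black-box AI) would be the two samples passed to the test; the one-sided p-value then comes from the t distribution with Welch's degrees of freedom.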


