Review. Patterns (N Y). 2022 May 13;3(5):100476.
doi: 10.1016/j.patter.2022.100476.

Reliance on metrics is a fundamental challenge for AI

Rachel L Thomas et al. Patterns (N Y).

Abstract

Through a series of case studies, we review how the unthinking pursuit of metric optimization can lead to real-world harms, including recommendation systems promoting radicalization, well-loved teachers fired by an algorithm, and essay grading software that rewards sophisticated garbage. The metrics used are often proxies for underlying, unmeasurable quantities (e.g., "watch time" of a video as a proxy for "user satisfaction"). We propose an evidence-based framework to mitigate such harms by (1) using a slate of metrics to get a fuller and more nuanced picture; (2) conducting external algorithmic audits; (3) combining metrics with qualitative accounts; and (4) involving a range of stakeholders, including those who will be most impacted.
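
As a toy illustration of point (1), the sketch below contrasts ranking by a single proxy metric with inspecting a small slate of metrics side by side. Everything in it is hypothetical and not taken from the paper: the item names, the numbers, the `survey_satisfaction` and `report_rate` fields, and the thresholds are assumptions chosen only to show how a proxy such as watch time can disagree with other signals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    name: str
    watch_minutes: float        # the proxy metric being optimized
    survey_satisfaction: float  # hypothetical 0-1 score from user surveys
    report_rate: float          # hypothetical fraction of viewers who flagged the item


def rank_by_proxy(items):
    """Rank items by the single proxy metric, highest watch time first."""
    return sorted(items, key=lambda item: item.watch_minutes, reverse=True)


def flag_disagreements(items, min_satisfaction=0.5, max_report_rate=0.05):
    """Return names of items that look bad on any non-proxy metric.

    The thresholds are arbitrary placeholders; in practice they would be
    set together with the stakeholders affected by the system.
    """
    return [
        item.name
        for item in items
        if item.survey_satisfaction < min_satisfaction
        or item.report_rate > max_report_rate
    ]


# Made-up data: item A wins on watch time but loses on the other two metrics.
items = [
    Item("A", watch_minutes=42.0, survey_satisfaction=0.3, report_rate=0.12),
    Item("B", watch_minutes=18.0, survey_satisfaction=0.8, report_rate=0.01),
    Item("C", watch_minutes=25.0, survey_satisfaction=0.7, report_rate=0.02),
]

proxy_ranking = [item.name for item in rank_by_proxy(items)]
flagged = flag_disagreements(items)

print("Ranking by watch time alone:", proxy_ranking)
print("Flagged by the wider slate:", flagged)
```

In this made-up data, the item ranked first by watch time is the only one flagged by the other metrics. A slate does not resolve the disagreement by itself; it only surfaces it so that the qualitative accounts, audits, and stakeholder input from points (2) to (4) can be brought in.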

Keywords: DSML 1: Concept: Basic principles of a new data science output observed and reported.


Conflict of interest statement

The authors declare no competing interests.

