2022 Aug 12;3(8):100568.
doi: 10.1016/j.patter.2022.100568.

Measuring disparate outcomes of content recommendation algorithms with distributional inequality metrics

Tomo Lazovich et al. Patterns (N Y).

Abstract

The harmful impacts of algorithmic decision systems have recently come into focus, with many examples of machine learning (ML) models amplifying societal biases. In this paper, we propose adapting income inequality metrics from economics to complement existing model-level fairness metrics, which focus on intergroup differences of model performance. In particular, we evaluate their ability to measure disparities between exposures that individuals receive in a production recommendation system, the Twitter algorithmic timeline. We define desirable criteria for metrics to be used in an operational setting by ML practitioners. We characterize engagements with content on Twitter using these metrics and use the results to evaluate the metrics with respect to our criteria. We also show that we can use these metrics to identify content suggestion algorithms that contribute more strongly to skewed outcomes between users. Overall, we conclude that these metrics can be a useful tool for auditing algorithms in production settings.

Keywords: AI ethics; attention inequality; inequality metrics; ranking and recommendation; responsible machine learning.

Conflict of interest statement

Dr. Rumman Chowdhury is a member of the Patterns advisory board, and all authors are affiliated with Twitter.

Figures

Figure 1
Lorenz curves: Lorenz curves for different types of engagement on Twitter. Lines closer to the dashed black line indicate more equal distributions of engagement. On the left, the more typical linear scale is shown. However, because the distributions are difficult to distinguish, a logarithmic scale is used for the y axis on the right. A small linear portion is included from 0 to 10^-6 in order to visualize the point at which the distribution transitions from zero to non-zero values.
Figure 2
Metric stability: Relative error on metrics as a function of overall skew of the power law distribution (Gini coefficient). At high Gini values, the 80/20 ratio has the highest relative error (is the least stable), while at low Gini values the Gini is the least stable metric.
Figure 3
Metric effect size: Relative change in metric value as a function of how much wealth is transferred from the richest to the poorest individual.
Figure 4
Inequality of engagements: Families of metrics, computed for the distribution of different engagement types. (A) The Gini and Atkinson indices. (B) The 80/20 share and percentile ratios. Note that these are shown only for impressions, as for all other types of engagements the share of the bottom 20% of users is zero. (C) The top 1% share and top 10% share for all engagement types. (D) The percentage of equivalence and the bottom percentage of users with share equal to the top 10% of users. Note that here the percentages are inverted (100 − metric value rather than the value itself) because these metrics are very close to 100% and are more easily visualized on a logarithmic scale when inverted.
Figure 5
Inequality by suggestion type: Measured values of entropy-based and tail-share metrics by suggestion type for the distribution of impressions. We again choose ε = 0.5 for the Atkinson index, as in Figure 4. The distribution of impressions from the in-network ranking algorithm is the least skewed, while miscellaneous out-of-network suggestions are the most skewed.
Figure 6
Gini of impressions and followers: A comparison of the Gini coefficient of the number of impressions and the statistics of the number of followers. Each point is one suggestion type, and the y axis shows the Gini index and the average of the number of followers for users who received impressions from that suggestion type.
Figure 7
Breakdown by popularity: Breakdown of the ranking and miscellaneous out-of-network suggestion types by number of followers. (A) The average number of impressions increases with the number of followers similarly for both algorithms, with the overall number of in-network impressions being larger. (B) For authors with lower numbers of followers, the distribution of impressions from out-of-network sources is significantly more skewed than for in-network sources. In the highest bin, the Gini indices of the two sources are close to each other, showing that the distributions of impressions are very similar for both suggestion types once an author has enough followers.
Figure 8
Example Lorenz curve: An example Lorenz curve with annotations that can be used to derive several related metrics. Capital letters (A and B) are areas, while the rest are lengths.
Figure 9
Follower multiplicity distribution: Distribution of number of followers for users in the dataset described in the results. The bins here correspond to the bins used to define each point in Figure 7.
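
The captions above refer to Lorenz curves, the Gini and Atkinson indices, and top-percentile shares. The sketch below shows one common way to compute these quantities from a list of per-user engagement counts. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the Gini index is taken as A / (A + B) with A the area between the equality line and the Lorenz curve and B the area under the curve (the usual textbook labeling, which may differ from the annotations in Figure 8), and the Atkinson index is restricted to 0 <= ε < 1 so that users with zero engagement are handled without special cases.

```python
import numpy as np


def lorenz_curve(values):
    """Return (population share, cumulative value share), both starting at 0."""
    x = np.sort(np.asarray(values, dtype=float))
    cum = np.concatenate(([0.0], np.cumsum(x)))
    pop = np.linspace(0.0, 1.0, len(x) + 1)
    return pop, cum / cum[-1]


def gini(values):
    """Gini index as A / (A + B); since A + B = 1/2, this equals 1 - 2B."""
    pop, share = lorenz_curve(values)
    # Area under the Lorenz curve (B), computed with the trapezoid rule.
    area_b = np.sum((share[1:] + share[:-1]) / 2.0 * np.diff(pop))
    return 1.0 - 2.0 * area_b


def top_share(values, fraction):
    """Share of the total held by the top `fraction` of individuals."""
    x = np.sort(np.asarray(values, dtype=float))[::-1]
    k = max(1, int(round(fraction * len(x))))
    return x[:k].sum() / x.sum()


def atkinson(values, epsilon=0.5):
    """Atkinson index for 0 <= epsilon < 1 (assumption: epsilon = 1 not handled)."""
    if not 0.0 <= epsilon < 1.0:
        raise ValueError("this sketch only supports 0 <= epsilon < 1")
    x = np.asarray(values, dtype=float)
    # Equally distributed equivalent: the per-user value that, if shared
    # equally, would give the same aggregate "welfare" as the actual values.
    ede = np.mean(x ** (1.0 - epsilon)) ** (1.0 / (1.0 - epsilon))
    return 1.0 - ede / x.mean()


# Hypothetical toy data: four users, only one of whom receives any impressions.
impressions = [0, 0, 0, 1]
print(gini(impressions), top_share(impressions, 0.25), atkinson(impressions))
```

For a perfectly equal distribution all three functions return their minimum (a Gini and Atkinson index of 0, and a top share equal to the population fraction). For the toy data, where one of four users holds everything, the Gini index is 0.75, the top 25% share is 1, and the Atkinson index with ε = 0.5 is 0.75.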

Comment in

  • Responsible and accountable data science.
    Wagner B, Müller-Birn C. Patterns (N Y). 2022 Nov 11;3(11):100629. doi: 10.1016/j.patter.2022.100629. eCollection 2022 Nov 11. PMID: 36419445. Free PMC article. No abstract available.
