Comparative Study
Proc Natl Acad Sci U S A. 2018 Apr 17;115(16):E3635-E3644.
doi: 10.1073/pnas.1720347115. Epub 2018 Apr 3.

Word embeddings quantify 100 years of gender and ethnic stereotypes

Nikhil Garg et al. Proc Natl Acad Sci U S A.

Abstract

Word embeddings are a powerful machine-learning framework that represents each English word by a vector. The geometric relationship between these vectors captures meaningful semantic relationships between the corresponding words. In this paper, we develop a framework to demonstrate how the temporal dynamics of the embedding help to quantify changes in stereotypes and attitudes toward women and ethnic minorities in the 20th and 21st centuries in the United States. We integrate word embeddings trained on 100 years of text data with the US Census to show that changes in the embedding track closely with demographic and occupation shifts over time. The embedding captures societal shifts (e.g., the women's movement in the 1960s and Asian immigration into the United States) and also illuminates how specific adjectives and occupations became more closely associated with certain populations over time. Our framework for temporal analysis of word embeddings opens up a fruitful intersection between machine learning and quantitative social science.

Keywords: ethnic stereotypes; gender stereotypes; word embedding.
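
To make the idea of an "embedding bias" concrete, the following is a minimal sketch. It assumes that bias is scored as a difference of Euclidean distances from a neutral word (e.g., an occupation) to the average vector of each group's representative words. The abstract does not spell out the metric, so the function names, the sign convention, and the 2-D toy vectors are all illustrative assumptions, not the authors' implementation.

```python
import math


def unit(v):
    """Scale a vector to unit length so only direction matters."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def group_vector(vectors):
    """Average the unit-normalized vectors of a group's representative words."""
    units = [unit(v) for v in vectors]
    dim = len(units[0])
    return [sum(u[i] for u in units) / len(units) for i in range(dim)]


def distance(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def relative_bias(neutral_vectors, group_a_vectors, group_b_vectors):
    """Mean of (distance to group A) - (distance to group B).

    Assumed sign convention: a positive score means the neutral words sit
    closer to group B than to group A.
    """
    a = group_vector(group_a_vectors)
    b = group_vector(group_b_vectors)
    diffs = [distance(unit(v), a) - distance(unit(v), b) for v in neutral_vectors]
    return sum(diffs) / len(diffs)


# Hypothetical 2-D "embeddings"; real embeddings have hundreds of dimensions.
men = [[1.0, 0.0], [0.9, 0.1]]
women = [[0.0, 1.0], [0.1, 0.9]]
occupation_near_women = [[0.2, 0.8]]
occupation_near_men = [[0.8, 0.2]]

score_w = relative_bias(occupation_near_women, men, women)
score_m = relative_bias(occupation_near_men, men, women)
print(score_w, score_m)
```

With this convention, a score computed per decade for a fixed word list gives a time series that could then be compared against census occupation percentages, which is the kind of comparison the abstract describes.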

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
Women’s occupation relative percentage vs. embedding bias in Google News vectors. More positive indicates more associated with women on both axes. P < 10^-10, r^2 = 0.499. The shaded region is the 95% bootstrapped confidence interval of the regression line. In this single embedding, then, the association in the embedding effectively captures the percentage of women in an occupation.
Fig. 2.
Average gender bias score over time in COHA embeddings in occupations vs. the average percentage of difference. More positive means a stronger association with women. In blue is relative bias toward women in the embeddings, and in green is the average percentage of difference of women in the same occupations. Each shaded region is the bootstrap SE interval.
Fig. 3.
Average ethnic (Asian vs. White) bias score over time for occupations in COHA (blue) vs. the average percentage of difference (green). Each shaded region is the bootstrap SE interval.
Fig. 4.
Pearson correlation in embedding bias scores for adjectives over time between embeddings for each decade. The phase shift in the 1960s–1970s corresponds to the US women’s movement.
Fig. 5.
Pearson correlation in embedding Asian bias scores for adjectives over time between embeddings for each decade.
Fig. 6.
Asian bias score over time for words related to outsiders in COHA data. The shaded region is the bootstrap SE interval.
Fig. 7.
Religious (Islam vs. Christianity) bias score over time for words related to terrorism in New York Times data. Note that embeddings are trained in 3-year windows, so, for example, 2000 contains data from 1999–2001. The shaded region is the bootstrap SE interval.
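
The captions above refer to two statistics: Pearson correlations between per-decade bias scores (Figs. 4 and 5) and bootstrap standard errors (Figs. 2, 3, 6, and 7). The sketch below shows one plausible way to compute both with the Python standard library. The decade scores are made-up numbers, and the helper names `pearson` and `bootstrap_se` are hypothetical; nothing here reproduces the paper's data or code.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def bootstrap_se(values, n_resamples=1000, seed=0):
    """Standard error of the mean, estimated by resampling with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(values)
    means = []
    for _ in range(n_resamples):
        sample = [values[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    grand = sum(means) / len(means)
    var = sum((m - grand) ** 2 for m in means) / (len(means) - 1)
    return math.sqrt(var)


# Hypothetical bias scores for the same four adjectives in three decades.
scores_by_decade = {
    1950: [0.10, -0.20, 0.05, 0.30],
    1960: [0.12, -0.18, 0.02, 0.28],
    1980: [-0.05, 0.10, 0.20, -0.10],
}

decades = sorted(scores_by_decade)
# Pairwise correlation "matrix" keyed by (decade, decade).
corr = {
    (a, b): pearson(scores_by_decade[a], scores_by_decade[b])
    for a in decades
    for b in decades
}
se_1950 = bootstrap_se(scores_by_decade[1950])
print(corr[(1950, 1960)], corr[(1950, 1980)], se_1950)
```

In a heat map of such a matrix, a block of high within-period correlations followed by a drop across a boundary is the kind of pattern the Fig. 4 caption calls a phase shift.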
