Behav Res Methods. 2024 Apr;56(4):3794-3813.
doi: 10.3758/s13428-024-02376-6. Epub 2024 May 9.

Taboo language across the globe: A multi-lab study

Simone Sulpizio et al. Behav Res Methods. 2024 Apr.

Abstract

The use of taboo words represents one of the most common and arguably universal linguistic behaviors, fulfilling a wide range of psychological and social functions. However, in the scientific literature, taboo language is poorly characterized, and how it is realized in different languages and populations remains largely unexplored. Here we provide a database of taboo words, collected from different linguistic communities (Study 1, N = 1046), along with their speaker-centered semantic characterization (Study 2, N = 455 for each of six rating dimensions), covering 13 languages and 17 countries from all five permanently inhabited continents. Our results show that, in all languages, taboo words are mainly characterized by extremely low valence and high arousal, and very low written frequency. However, a significant amount of cross-country variability in words' tabooness and offensiveness proves the importance of community-specific sociocultural knowledge in the study of taboo language.

Keywords: Best–worst scaling; Emotion; Semantics; Swearing; Taboo words.

Figures

Fig. 1
Instructions for Study 1
Fig. 2
Map of the countries involved in Study 1 and Study 2: the 18 labs and the respective 17 countries and 13 languages in which data were collected
Fig. 3
Number of items produced in Study 1 for each lab. a Total number of items (darker colors) and the subset of items produced by at least 3% of all participants (lighter colors). b Average number of items produced per participant
Fig. 4
Samples in which a word is among the 10 most frequently produced ones. The figure shows the samples in which a word (or a semantically closely related word) is among the 10 most frequently produced words in Study 1, alongside the number of samples for which the word has been produced (right) and number of these items produced per sample (top). This figure only includes words appearing in the top 10 in at least two samples. Note that the taboo words reported in the figure refer to sets of meaning-related words (not exact translations) that were created on the basis of our intuition and should only be considered as qualitative
Fig. 5
Correlations between rating dimensions and frequency measures. Pairwise Pearson correlations between all rating dimensions and the production frequency in Study 1 (Study 1 freq.) and the written corpus frequency (corpus freq.). The upper triangle shows correlation values for single samples (each sample represented by a colored circle); the lower triangle represents the means of these correlations with their standard deviation in parentheses. Mean correlations significantly different from zero (p < .001 in a t-test) are marked with ***
Fig. 6
Illustrations of the categorical regression analyses predicting taboo word status from semantic variables and written corpus frequency. a Distribution of valence (x-axis), arousal (y-axis), and written corpus frequency (point size) for taboo words and fillers (color-coded) in the combined dataset of all samples, alongside their classification accuracy (point type). Regression lines predicting arousal from valence are fitted with local polynomial regression (loess) fitting. b Accuracy rates (darker colors) and F1 scores (lighter colors) for the LOOCV analysis, predicting taboo word status in the left-out sample with a GLMM trained on all other samples (including as predictors valence, arousal, concreteness, AoA ratings, and written corpus frequency)
Fig. 7
Differences and agreements between different varieties of English. a Left-hand side: correlations between the values on each rating dimension, Study 1 production frequency, and written frequency, for the different variants of the same language (English), computed on the shared items between these variants (AU: Australia; CA: Canada; GB: Great Britain; SG: Singapore; US: United States of America); the short horizontal lines indicate the correlation value between a pair of variants, the box next to the line indicates the pair of variants for which the correlation was computed. Right-hand side: the number of these shared items between pairs of variants. Note that the number of shared items is exactly the same for SG–US and SG–GB; thus, the latter is not visible in the plot. b Left-hand side of each plot: offensiveness (left plot)/tabooness (right plot) ratings by language variant for all the items appearing in at least four out of the five variants of English. Right-hand side of each plot: production frequency in Study 1 and written corpus frequency for these items (mean values)
Fig. 8
Differences and agreements between different groups of Spanish speakers. a Left-hand side: correlations between the values on each rating dimension, Study 1 production frequency, and written frequency for the two different variants of Spanish, computed on the shared items between these variants. Right-hand side: the number of these shared items. b Left-hand side of each plot: offensiveness (left plot)/tabooness (right plot) ratings by language variant for all items that appear in both variants of Spanish. Right-hand side of each plot: production frequency in Study 1 (black bars) and written corpus frequency (grey bars) for these items (mean values)
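
The summary described in the caption of Fig. 5 (one Pearson correlation per sample, then the mean and standard deviation of those correlations, tested against zero) can be sketched roughly as follows. This is a minimal illustration, not the authors' analysis code: the function names, the sample labels, and all numeric values are made-up assumptions, and the sketch stops at the t statistic rather than computing a p-value.

```python
import math
from statistics import mean, stdev


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of ratings."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def summarize_correlations(samples):
    """Correlate two dimensions within each sample, then summarize across samples.

    `samples` maps a (hypothetical) sample label to a pair of rating lists.
    Returns the mean and SD of the per-sample correlations and the
    one-sample t statistic for the mean against zero (df = n - 1).
    """
    rs = [pearson(xs, ys) for xs, ys in samples.values()]
    m, sd = mean(rs), stdev(rs)
    t = m / (sd / math.sqrt(len(rs)))
    return {"mean_r": m, "sd_r": sd, "t": t, "df": len(rs) - 1}


# Toy values only (NOT data from the study): valence vs. tabooness per sample.
toy_samples = {
    "sample_a": ([1, 2, 3, 4], [4, 3, 2, 1]),
    "sample_b": ([1, 2, 3, 4], [4, 2, 3, 1]),
    "sample_c": ([1, 2, 3, 4], [3, 4, 1, 2]),
}

summary = summarize_correlations(toy_samples)
print(summary)
```

To decide significance, the t statistic would be compared against a t distribution with `df` degrees of freedom, which the sketch leaves out. Whether the original analysis transformed the correlations before averaging is not stated in the caption.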
