Behav Res Methods. 2024 Apr;56(4):3794-3813.

doi: 10.3758/s13428-024-02376-6. Epub 2024 May 9.

Taboo language across the globe: A multi-lab study

Simone Sulpizio^#¹,², Fritz Günther^#³, Linda Badan⁴, Benjamin Basclain⁵, Marc Brysbaert⁶, Yuen Lai Chan⁷, Laura Anna Ciaccio⁸, Carolin Dudschig⁹, Jon Andoni Duñabeitia¹⁰, Fabio Fasoli¹¹,¹², Ludovic Ferrand¹³, Dušica Filipović Đurđević¹⁴, Ernesto Guerra¹⁵, Geoff Hollis¹⁶, Remo Job¹⁷, Khanitin Jornkokgoud¹⁸, Hasibe Kahraman⁵, Naledi Kgolo-Lotshwao¹⁹, Sachiko Kinoshita⁵, Julija Kos²⁰, Leslie Lee²¹, Nala H Lee²¹, Ian Grant Mackenzie⁹, Milica Manojlović¹⁴, Christina Manouilidou²⁰, Mirko Martinic¹⁵, Maria Del Carmen Méndez²², Ksenija Mišić¹⁴, Natinee Na Chiangmai¹⁸, Alexandre Nikolaev²³, Marina Oganyan²⁴, Patrice Rusconi²⁵, Giuseppe Samo²⁶, Chi-Shing Tse⁷, Chris Westbury²⁷, Peera Wongupparaj²⁸, Melvin J Yap²⁹, Marco Marelli^#³⁰,³¹

Affiliations

¹ Department of Psychology, University of Milano-Bicocca, Piazza dell'Ateneo Nuovo 1, 20126, Milan, Italy. simone.sulpizio@unimib.it.
² Milan Center for Neuroscience (NeuroMI), University of Milano-Bicocca, Milan, Italy. simone.sulpizio@unimib.it.
³ Department of Psychology, Humboldt-Universität zu Berlin, Unter den Linden 6, 10117, Berlin, Germany. fritz.guenther@hu-berlin.de.
⁴ Department of Humanities, University of Trento, Trento, Italy.
⁵ School of Psychological Sciences, Macquarie University, Sydney, Australia.
⁶ Department of Experimental Psychology, Ghent University, Ghent, Belgium.
⁷ Department of Educational Psychology, The Chinese University of Hong Kong, Hong Kong, China.
⁸ Brain Language Laboratory, Department of Philosophy and Humanities, Freie Universität Berlin, Berlin, Germany.
⁹ Department of Psychology, University of Tübingen, Tübingen, Germany.
¹⁰ Centro de Investigación Nebrija en Cognición (CINC), Universidad Nebrija, Madrid, Spain.
¹¹ School of Psychology, University of Surrey, Guildford, UK.
¹² Centro de Investigação e Intervenção Social, Instituto Universitário de Lisboa, Lisbon, Portugal.
¹³ Laboratoire de Psychologie Sociale et Cognitive, CNRS, Université Clermont Auvergne, Clermont-Ferrand, France.
¹⁴ Department of Psychology, University of Belgrade, Belgrade, Serbia.
¹⁵ Center for Advanced Research in Education, Institute of Education, Universidad de Chile, Santiago, Chile.
¹⁶ Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada.
¹⁷ Department of Psychology and Cognitive Science, University of Trento, Trento, Italy.
¹⁸ Cognitive Science and Innovation Research Unit (CSIRU), College of Research Methodology and Cognitive Science, Burapha University, Chonburi, Thailand.
¹⁹ Faculty of Humanities, University of Botswana, Gaborone, Botswana.
²⁰ Department of Comparative and General Linguistics, Faculty of Arts, University of Ljubljana, Ljubljana, Slovenia.
²¹ Department of English, Linguistics, & Theatre Studies, National University of Singapore, Singapore, Singapore.
²² Facultad de Filosofia y Letras I, Universidad de Alicante, Alicante, Spain.
²³ School of Humanities, Foreign Languages and Translation Studies, University of Eastern Finland, Joensuu, Finland.
²⁴ Department of Linguistics, University of Washington, Seattle, WA, USA.
²⁵ Department of Cognitive Sciences, Psychology, Education and Cultural Studies, University of Messina, Messina, Italy.
²⁶ Department of Linguistics, Beijing Language and Culture University, Beijing, China.
²⁷ Department of Psychology, University of Alberta, Edmonton, Canada.
²⁸ Department of Psychology, Faculty of Humanities and Social Sciences, Burapha University, Chonburi, Thailand.
²⁹ Department of Psychology, National University of Singapore, Singapore, Singapore.
³⁰ Department of Psychology, University of Milano-Bicocca, Piazza dell'Ateneo Nuovo 1, 20126, Milan, Italy. marco.marelli@unimib.it.
³¹ Milan Center for Neuroscience (NeuroMI), University of Milano-Bicocca, Milan, Italy. marco.marelli@unimib.it.

^# Contributed equally.

PMID: 38724878
PMCID: PMC11133054
DOI: 10.3758/s13428-024-02376-6

Simone Sulpizio et al. Behav Res Methods. 2024 Apr.

Abstract

The use of taboo words represents one of the most common and arguably universal linguistic behaviors, fulfilling a wide range of psychological and social functions. However, in the scientific literature, taboo language is poorly characterized, and how it is realized in different languages and populations remains largely unexplored. Here we provide a database of taboo words, collected from different linguistic communities (Study 1, N = 1046), along with their speaker-centered semantic characterization (Study 2, N = 455 for each of six rating dimensions), covering 13 languages and 17 countries from all five permanently inhabited continents. Our results show that, in all languages, taboo words are mainly characterized by extremely low valence and high arousal, and very low written frequency. However, a significant amount of cross-country variability in words' tabooness and offensiveness proves the importance of community-specific sociocultural knowledge in the study of taboo language.

Keywords: Best–worst scaling; Emotion; Semantics; Swearing; Taboo words.

Figures

**Fig. 2**
Map of the countries involved in Study 1 and Study 2: the 18 labs and the respective 17 countries and 13 languages in which data were collected

**Fig. 3**
Number of items produced in Study 1 for each lab. a Total number of items (darker colors) and the subset of items produced by at least 3% of all participants (lighter colors). b Average number of items produced per participant
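
The two per-lab quantities in Fig. 3 can be illustrated with a short sketch. It assumes that each participant's Study 1 responses are available as a list of already normalized words, that a word repeated by the same participant counts once, and that the 3% threshold is applied within a lab's own participants. `production_summary` and the toy responses are hypothetical; this is not the authors' pipeline or data.

```python
from collections import Counter


def production_summary(responses, threshold=0.03):
    """Summarize one lab's Study 1 productions (hypothetical helper).

    responses: mapping participant id -> list of produced words.
    Returns (number of distinct items,
             set of items produced by at least `threshold` of the lab's participants,
             average number of distinct items per participant).
    """
    n_participants = len(responses)
    # Count each word once per participant, so counts are numbers of producers.
    producers = Counter(word for words in responses.values() for word in set(words))
    frequent = {word for word, c in producers.items() if c / n_participants >= threshold}
    avg_items = sum(len(set(words)) for words in responses.values()) / n_participants
    return len(producers), frequent, avg_items


if __name__ == "__main__":
    # Toy responses with placeholder items. A higher threshold is used because
    # with only four participants every item would pass a 3% cutoff.
    toy = {"p1": ["x", "y"], "p2": ["x", "z"], "p3": ["x"], "p4": ["y", "w", "w"]}
    print(production_summary(toy, threshold=0.5))
```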

**Fig. 4**
Samples in which a word is among the 10 most frequently produced ones. The figure shows the samples in which a word (or a semantically closely related word) is among the 10 most frequently produced words in Study 1, alongside the number of samples for which the word has been produced (right) and number of these items produced per sample (top). This figure only includes words appearing in the top 10 in at least two samples. Note that the taboo words reported in the figure refer to sets of meaning-related words (not exact translations) that were created on the basis of our intuition and should only be considered as qualitative

**Fig. 5**
Correlations between rating dimensions and frequency measures. Pairwise Pearson correlations between all rating dimensions and the production frequency in Study 1 (*Study 1 freq.*) and the written corpus frequency (*corpus freq.*). The upper triangle shows correlation values for single samples (each sample represented by a colored circle); the lower triangle represents the means of these correlations with their standard deviation in parentheses. Mean correlations significantly different from zero (p < .001 in a t-test) are marked with ***

**Fig. 6**
Illustrations of the categorical regression analyses predicting taboo word status from semantic variables and written corpus frequency. a Distribution of valence (x-axis), arousal (y-axis), and written corpus frequency (point size) for taboo words and fillers (color-coded) in the combined dataset of all samples, alongside their classification accuracy (point type). Regression lines predicting arousal from valence are fitted with local polynomial regression (loess) fitting. b Accuracy rates (darker colors) and F1 scores (lighter colors) for the LOOCV analysis, predicting taboo word status in the left-out sample with a GLMM trained on all other samples (including as predictors valence, arousal, concreteness, AoA ratings, and written corpus frequency)
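
The leave-one-sample-out evaluation behind Fig. 6b can be sketched as below. The caption describes a GLMM with valence, arousal, concreteness, AoA, and written frequency as predictors; to stay dependency-free, this sketch swaps in a plain logistic regression (no random effects) fitted by gradient descent and uses only two toy predictors. All function names, hyperparameters, and data are hypothetical. What carries over is the scheme itself: each sample is held out once, the model is trained on the remaining samples, and accuracy and F1 are computed on the held-out sample.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Plain logistic regression by batch gradient descent.
    Stand-in for the GLMM: it has no random effects."""
    n_feat = len(X[0])
    w, b = [0.0] * n_feat, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_feat, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict(w, b, x):
    # Predict "taboo" (1) when the linear score is non-negative, i.e. p >= 0.5.
    return 1 if b + sum(wj * xj for wj, xj in zip(w, x)) >= 0 else 0


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def leave_one_sample_out(data):
    """data: sample name -> (feature rows, labels), with 1 = taboo word, 0 = filler.
    Fit on all other samples, then score the held-out one."""
    results = {}
    for held_out, (X_test, y_test) in data.items():
        X_train = [x for name, (X, _) in data.items() if name != held_out for x in X]
        y_train = [t for name, (_, y) in data.items() if name != held_out for t in y]
        w, b = fit_logistic(X_train, y_train)
        y_pred = [predict(w, b, x) for x in X_test]
        results[held_out] = (accuracy(y_test, y_pred), f1_score(y_test, y_pred))
    return results


# Toy (valence, arousal) values for three hypothetical samples, roughly z-scored.
toy_data = {
    "sample_1": ([(-2.0, 2.0), (-1.5, 1.8), (2.0, -2.0), (1.6, -1.7)], [1, 1, 0, 0]),
    "sample_2": ([(-2.2, 1.5), (-1.8, 2.2), (1.9, -1.6), (2.1, -2.3)], [1, 1, 0, 0]),
    "sample_3": ([(-1.7, 2.4), (-2.4, 1.6), (1.5, -2.1), (2.3, -1.9)], [1, 1, 0, 0]),
}

if __name__ == "__main__":
    for sample, (acc, f1) in leave_one_sample_out(toy_data).items():
        print(sample, acc, round(f1, 2))
```

In practice a mixed-model library, or `sklearn.model_selection.LeaveOneGroupOut` for the splitting, would replace the hand-written pieces. F1 is worth reporting next to accuracy because accuracy alone can look high when one class dominates.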

**Fig. 7**
Differences and agreements between different varieties of English. a Left-hand side: correlations between the values on each rating dimension, Study 1 production frequency, and written frequency, for the different variants of the same language (English), computed on the shared items between these variants (AU: Australia; CA: Canada; GB: Great Britain; SG: Singapore; US: United States of America); the short horizontal lines indicate the correlation value between a pair of variants, the box next to the line indicates the pair of variants for which the correlation was computed. Right-hand side: the number of these shared items between pairs of variants. Note that the number of shared items is exactly the same for SG–US and SG–GB; thus, the latter is not visible in the plot. b Left-hand side of each plot: offensiveness (left plot)/tabooness (right plot) ratings by language variant for all the items appearing in at least four out of the five variants of English. Right-hand side of each plot: production frequency in Study 1 and written corpus frequency for these items (mean values)

**Fig. 8**
Differences and agreements between different groups of Spanish speakers. a Left-hand side: correlations between the values on each rating dimension, Study 1 production frequency, and written frequency for the two different variants of Spanish, computed on the shared items between these variants. Right-hand side: the number of these shared items. b Left-hand side of each plot: offensiveness (left plot)/tabooness (right plot) ratings by language variant for all items that appear in both variants of Spanish. Right-hand side of each plot: production frequency in Study 1 (black bars) and written corpus frequency (grey bars) for these items (mean values)
