2022 Dec 15;8:e1128.
doi: 10.7717/peerj-cs.1128. eCollection 2022.

Addressing religious hate online: from taxonomy creation to automated detection

Alan Ramponi et al. PeerJ Comput Sci.

Abstract

Abusive language in online social media is a pervasive and harmful phenomenon that calls for automated computational approaches to contain it. Previous studies have introduced corpora and natural language processing approaches for specific kinds of online abuse, mainly focusing on misogyny and racism. A currently underexplored area in this context is religious hate, for which efforts in data and methods to date have been rather scattered. This is exacerbated by the differing annotation schemes that available datasets use, which inevitably lead to poor repurposing of data in wider contexts. Furthermore, religious hate is strongly dependent on country-specific factors, including the presence and visibility of religious minorities, societal issues, historical background, and current political decisions. Motivated by the lack of annotated data specifically tailored to religion and the poor interoperability of current datasets, in this article we propose a fine-grained labeling scheme for religious hate speech detection. The scheme builds on a broader, highly interoperable taxonomy of abusive language and covers the three main monotheistic religions: Judaism, Christianity, and Islam. Moreover, we introduce a Twitter dataset in two languages, English and Italian, annotated following the proposed scheme. We experiment with several classification algorithms on the annotated dataset, from traditional machine learning classifiers to recent transformer-based language models, assessing the difficulty of two tasks: abusive language detection and religious hate speech detection. Finally, we investigate the cross-lingual transferability of multilingual models on these tasks, shedding light on the viability of repurposing our dataset for religious hate speech detection in low-resource languages. We release the annotated data and publicly distribute the code for our classification experiments at https://github.com/dhfbk/religious-hate-speech.
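The experiments summarized above range from traditional machine learning classifiers to transformer-based models; the authors' actual code is in the linked repository. As a purely illustrative sketch of the simpler end of that spectrum, the following implements a minimal bag-of-words multinomial Naive Bayes classifier for the binary abusive/non-abusive task, trained on invented toy examples (not drawn from the paper's dataset):

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """Train a multinomial Naive Bayes model on (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        label_counts[label] += 1
        for w in text.lower().split():
            word_counts[label][w] += 1
            vocab.add(w)
    return label_counts, word_counts, vocab

def predict_nb(model, text):
    """Return the most probable label, using Laplace (add-one) smoothing."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_lp = None, float("-inf")
    for label in label_counts:
        # Log prior plus smoothed log likelihood of each token.
        lp = math.log(label_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in text.lower().split():
            lp += math.log((word_counts[label][w] + 1) / denom)
        if lp > best_lp:
            best_label, best_lp = label, lp
    return best_label

# Invented toy training examples for the binary abusive-language task.
train = [
    ("i hate all of them they are vermin", "abusive"),
    ("those people are disgusting and should leave", "abusive"),
    ("had a lovely interfaith dinner tonight", "not_abusive"),
    ("the new mosque downtown is beautiful", "not_abusive"),
]
model = train_nb(train)
print(predict_nb(model, "they are disgusting vermin"))  # prints "abusive"
```

In practice such lexical baselines are outperformed by the transformer-based models the abstract mentions, which capture context rather than word counts alone; the baseline mainly serves to calibrate task difficulty.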

Keywords: Abusive language detection; Natural language processing; Religious hate speech detection.


Conflict of interest statement

The authors declare there are no competing interests.

Figures

Figure 1. Abusive language annotation taxonomy with a focus on religious hate.
