Addressing religious hate online: from taxonomy creation to automated detection
- PMID: 37346317
- PMCID: PMC10280248
- DOI: 10.7717/peerj-cs.1128
Abstract
Abusive language in online social media is a pervasive and harmful phenomenon that calls for automatic computational approaches to be successfully contained. Previous studies have introduced corpora and natural language processing approaches for specific kinds of online abuse, mainly focusing on misogyny and racism. A currently underexplored area in this context is religious hate, for which efforts in data and methods to date have been rather scattered. This is exacerbated by the different annotation schemes that available datasets use, which inevitably leads to poor repurposing of data in wider contexts. Furthermore, religious hate is highly dependent on country-specific factors, including the presence and visibility of religious minorities, societal issues, historical background, and current political decisions. Motivated by the lack of annotated data specifically tailored to religion and the poor interoperability of current datasets, in this article we propose a fine-grained labeling scheme for religious hate speech detection. The scheme builds on a wider and highly interoperable taxonomy of abusive language, and covers the three main monotheistic religions: Judaism, Christianity and Islam. Moreover, we introduce a Twitter dataset in two languages, English and Italian, that has been annotated following the proposed annotation scheme. We experiment with several classification algorithms on the annotated dataset, from traditional machine learning classifiers to recent transformer-based language models, assessing the difficulty of two tasks: abusive language detection and religious hate speech detection. Finally, we investigate the cross-lingual transferability of multilingual models on the tasks, shedding light on the viability of repurposing our dataset for religious hate speech detection in low-resource languages. We release the annotated data and publicly distribute the code for our classification experiments at https://github.com/dhfbk/religious-hate-speech.
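As a rough illustration of how one fine-grained annotation could feed the two tasks named in the abstract, the sketch below derives a binary label for each task from a single record. The field names (`abusive`, `target_religion`), the lowercase religion identifiers, and both helper functions are hypothetical and are not taken from the released dataset or code; the actual label set in the repository may differ.

```python
from dataclasses import dataclass
from typing import Optional

# The three religions covered by the scheme, per the abstract.
# The lowercase identifiers are an assumption made for this sketch.
RELIGIONS = ("judaism", "christianity", "islam")


@dataclass(frozen=True)
class Annotation:
    """Hypothetical annotated post; field names are illustrative only."""
    text: str
    abusive: bool
    target_religion: Optional[str] = None  # None when no religion is targeted

    def __post_init__(self) -> None:
        # Reject religion values outside the assumed label set.
        if self.target_religion is not None and self.target_religion not in RELIGIONS:
            raise ValueError(f"unknown religion: {self.target_religion!r}")


def abusive_label(a: Annotation) -> int:
    """Task 1 (abusive language detection): 1 if the post is abusive, else 0."""
    return int(a.abusive)


def religious_hate_label(a: Annotation) -> int:
    """Task 2 (religious hate speech detection): 1 only if the post is
    abusive and targets one of the covered religions (an assumed reading)."""
    return int(a.abusive and a.target_religion is not None)


if __name__ == "__main__":
    sample = Annotation("placeholder text", abusive=True, target_religion="islam")
    print(abusive_label(sample), religious_hate_label(sample))
```

Keeping one record per post and deriving each task's label from it is one way to let a single annotation pass serve both classification tasks; whether the authors structure their data this way is not stated in the abstract.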
Keywords: Abusive language detection; Natural language processing; Religious hate speech detection.
©2022 Ramponi et al.
Conflict of interest statement
The authors declare there are no competing interests.