Health-Related Content in Transformer-Based Deep Neural Network Language Models: Exploring Cross-Linguistic Syntactic Bias
- PMID: 35773848
- DOI: 10.3233/SHTI220702
Abstract
This paper explores a methodology for bias quantification in transformer-based deep neural network language models for Chinese, English, and French. When queried with health-related mythbusters on COVID-19, we observe a bias that is not semantic or encyclopaedic in nature, but rather syntactic, as predicted by theoretical insights on structural complexity. Our results highlight the need to create health-communication corpora as training sets for deep learning.
Keywords: COVID-19; Corpora; Knowledge Reproduction; Language Models; Natural Language Processing.
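The paper does not include code, so the following is only a minimal sketch of how health-related mythbusters could be posed as cloze queries to a multilingual masked language model and the model's preferences compared across languages. The model name, prompts, and scoring are illustrative assumptions, not the authors' published protocol.

```python
# Illustrative sketch: probe a multilingual masked LM with COVID-19
# mythbuster-style statements and inspect which fillers it prefers.
# Model choice and prompts are assumptions for demonstration only.
from transformers import pipeline

# A masked language model covering Chinese, English, and French.
fill_mask = pipeline("fill-mask", model="bert-base-multilingual-cased")

# Mythbuster-style statements with a masked polarity word (illustrative).
prompts = {
    "en": "Drinking alcohol [MASK] protect you against COVID-19.",
    "fr": "Boire de l'alcool ne vous protège [MASK] contre la COVID-19.",
}

for lang, prompt in prompts.items():
    # The top candidate fillers and their scores show what the model
    # "prefers" to assert, independently of factual correctness.
    for candidate in fill_mask(prompt, top_k=3):
        print(lang, candidate["token_str"], round(candidate["score"], 3))
```

Comparing how such scores shift with the syntactic form of the query (e.g. negated versus affirmative phrasings) is one plausible way to separate syntactic bias from encyclopaedic knowledge, in the spirit of the abstract.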
