Computational Models of Readers' Apperceptive Mass
- PMID: 35280232
- PMCID: PMC8905622
- DOI: 10.3389/frai.2022.718690
Abstract
Recent progress in machine-learning-based distributed semantic models (DSMs) offers new ways to simulate the apperceptive mass (AM; Kintsch, 1980) of reader groups or individual readers and to predict their performance in reading-related tasks. The AM integrates the mental lexicon with world knowledge acquired, for example, through reading books. Following pioneering work by Denhière and Lemaire (2004), we computed DSMs based on a representative corpus of German children and youth literature (Jacobs et al., 2020) as null models of the part of the AM that represents distributional semantic input, for readers of three reading ages (grades 1-2, 3-4, and 5-6). After a series of DSM quality tests, we quantitatively evaluated the models' performance in various tasks designed to simulate the different reader groups' hypothetical semantic and syntactic skills. In a final study, we compared the models' performance with that of adult and child readers in two rating tasks. Overall, the results show that performance in practically all tasks improves with increasing reading age. The approach taken in these studies reveals both the limits of DSMs for simulating human AM and their potential for applications in scientific studies of literature, research in education, and developmental science.
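To make the modeling step concrete, the following is a minimal sketch of how one DSM per reading-age group might be trained on the corresponding corpus slice and probed for semantic neighborhoods. It assumes a gensim word2vec pipeline; the corpus file names, preprocessing, and hyperparameters are illustrative assumptions, not the settings reported in the paper.

```python
# Minimal sketch: train one DSM per reading-age group on the corresponding
# corpus slice, then probe word similarity. File names, tokenization, and
# hyperparameters are illustrative assumptions, not the authors' settings.
from gensim.models import Word2Vec
from gensim.utils import simple_preprocess

def load_sentences(path):
    """Yield tokenized, lowercased sentences, one per line of the corpus file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            tokens = simple_preprocess(line)
            if tokens:
                yield tokens

# Hypothetical corpus slices for the three reading-age groups.
corpora = {
    "grades_1_2": "corpus_grades_1_2.txt",
    "grades_3_4": "corpus_grades_3_4.txt",
    "grades_5_6": "corpus_grades_5_6.txt",
}

models = {}
for group, path in corpora.items():
    sentences = list(load_sentences(path))
    # Skip-gram model; vector_size, window, min_count, and epochs are guesses.
    models[group] = Word2Vec(
        sentences, vector_size=300, window=5, min_count=5, sg=1, epochs=10
    )

# Probe the semantic neighborhood of a word across reading ages.
for group, model in models.items():
    if "hund" in model.wv:  # German "dog"; simple_preprocess lowercases tokens
        print(group, model.wv.most_similar("hund", topn=5))
```

Comparing the nearest-neighbor lists across the three models gives a first, informal sense of how the simulated distributional part of the AM differentiates with reading age; the paper's own quality tests and rating-task comparisons go well beyond this.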
Keywords: SentiArt; apperceptive mass; childLex; digital humanities; distributed semantic models; literary reading; machine learning.
Copyright © 2022 Jacobs and Kinder.
Conflict of interest statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.