Front Artif Intell. 2022 Feb 22;5:718690. doi: 10.3389/frai.2022.718690. eCollection 2022.

Computational Models of Readers' Apperceptive Mass

Arthur M Jacobs et al. Front Artif Intell. 2022.

Abstract

Recent progress in machine-learning-based distributed semantic models (DSMs) offers new ways to simulate the apperceptive mass (AM; Kintsch, 1980) of reader groups or individual readers and to predict their performance in reading-related tasks. The AM integrates the mental lexicon with world knowledge, as acquired, for example, via reading books. Following pioneering work by Denhière and Lemaire (2004), here we computed DSMs based on a representative corpus of German children and youth literature (Jacobs et al., 2020) as null models of the part of the AM that represents distributional semantic input, for readers of different reading ages (grades 1-2, 3-4, and 5-6). After a series of DSM quality tests, we evaluated the performance of these models quantitatively in various tasks to simulate the different reader groups' hypothetical semantic and syntactic skills. In a final study, we compared the models' performance with that of adult and child readers in two rating tasks. Overall, the results show that performance in practically all tasks improves with increasing reading age. The approach taken in these studies reveals both the limits of DSMs for simulating human AM and their potential for applications in scientific studies of literature, research in education, and developmental science.
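
As a concrete illustration, a DSM of the kind evaluated here can be trained with standard tooling. The following Python sketch uses gensim's word2vec on a tokenized corpus file; the file name and all hyperparameters are illustrative assumptions, not the paper's published pipeline.

    # Minimal sketch of training a DSM on a children's literature (CL) corpus.
    # Corpus path and hyperparameters are assumptions for illustration only.
    from gensim.models import Word2Vec

    # Assumed input format: one tokenized sentence per line, space-separated.
    with open("cl_corpus_grades_1_2.txt", encoding="utf-8") as f:
        sentences = [line.split() for line in f]

    model = Word2Vec(sentences, vector_size=300, window=5, min_count=5, sg=1)

    # Inspect the semantic neighborhood of a target word.
    print(model.wv.most_similar("Frau", topn=10))

Nearest-neighbor queries of this kind underlie both the DSM quality tests and the similarity tasks illustrated in the figures below.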

Keywords: SentiArt; apperceptive mass; childLex; digital humanities; distributed semantic models; literary reading; machine learning.


Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
(A) Twenty most frequent names and places in the Bible (German Luther Bibel). (B) Twenty most frequent actions of Jesus in the Bible. (C) Appearance density of 25 major characters in the Bible, ordered by verse number (cf. https://pmbaumgartner.github.io/blog/holy-nlp/). (D) Interaction network of nine major characters in the Bible (interaction frequency is represented by line width: bold > dashed > dotted). (E) Emotional figure profiles for Jesus and Judas computed with SentiArt (Jacobs, 2019).
Figure 2
(A) Violin plot of stepwise distances for the three CL corpora. (B) Principal Component Analysis (first two components only) of stepwise distances for three representative books from each CL subcorpus. Distances between points represent semantic variance, the focus being on distances between consecutive text chunks.
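
The stepwise distances shown in Figure 2 can be approximated as follows, assuming each text chunk is represented by the mean of its word vectors; the chunking scheme and the averaging are illustrative assumptions, and the model object is the one from the sketch above.

    import numpy as np

    def chunk_vector(tokens, model):
        # Mean of the in-vocabulary word vectors of one text chunk.
        vecs = [model.wv[t] for t in tokens if t in model.wv]
        return np.mean(vecs, axis=0)

    def stepwise_distances(chunks, model):
        # Cosine distance between each pair of consecutive chunks.
        vecs = [chunk_vector(c, model) for c in chunks]
        dists = []
        for a, b in zip(vecs, vecs[1:]):
            cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
            dists.append(1.0 - cos)
        return dists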
Figure 3
t-SNE representations of the semantic space of the sdewac model for exemplary concrete (A), abstract (B), and emotion (C) concepts. The words most similar to the target words (e.g., to woman/Frau or man/Mann) are plotted in the same color.
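
Projections like those in Figures 3 and 4 can be reproduced in outline with scikit-learn's t-SNE, given any trained model object such as the one from the first sketch (Figure 3 itself uses the sdewac model); target words, neighborhood size, and plot styling are illustrative assumptions.

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.manifold import TSNE

    targets = ["Frau", "Mann"]  # exemplary concrete concepts
    words, colors = [], []
    for i, t in enumerate(targets):
        # Collect each target together with its 15 nearest neighbors.
        neighbors = [w for w, _ in model.wv.most_similar(t, topn=15)]
        for w in [t] + neighbors:
            words.append(w)
            colors.append(i)

    vectors = np.array([model.wv[w] for w in words])
    coords = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(vectors)

    # One color per target word's neighborhood, as in the figure.
    plt.scatter(coords[:, 0], coords[:, 1], c=colors, cmap="coolwarm")
    for (x, y), w in zip(coords, words):
        plt.annotate(w, (x, y), fontsize=8)
    plt.show()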
Figure 4
(A–I) t-SNE representations of the semantic space of the three CL models for exemplary concrete, abstract, and emotion concepts.
Figure 5
Performance of the four models (% accuracy) in predicting human rating data for a word similarity task (blue) and a valence decision task (orange).
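
For the valence decision task, a SentiArt-style estimate (Jacobs, 2019) scores a word by its mean cosine similarity to positive label words minus its mean similarity to negative label words; the label lists below are illustrative assumptions, not the published ones.

    # SentiArt-style valence sketch; label lists are assumed, not published.
    POSITIVE = ["Glück", "Freude", "Liebe"]
    NEGATIVE = ["Angst", "Hass", "Trauer"]

    def valence(word, model):
        pos = sum(model.wv.similarity(word, p) for p in POSITIVE) / len(POSITIVE)
        neg = sum(model.wv.similarity(word, n) for n in NEGATIVE) / len(NEGATIVE)
        return pos - neg

    # A word would be classified as positive if valence(word, model) > 0.
    print(valence("Sonne", model))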

References

    1. Alghanmi I., Espinosa Anke L., Schockaert S. (2020). “Combining BERT with static word embeddings for categorizing social media,” in Proceedings of the Sixth Workshop on Noisy Usergenerated Text (W-NUT 2020) (Association for Computational Linguistics: ), 28–33.
    1. Andrews M., Vigliocco G., Vinson D. (2009). Integrating experiential and distributional data to learn semantic representations. Psychol. Rev. 116, 463–498. 10.1037/a0016261 - DOI - PubMed
    1. Baroni M., Bernardini S., Ferraresi A., Zanchetta E. (2009). The WaCky wide web: a collection of very large linguistically processed web-crawled corpora. Lang. Resour. Eval. 43, 209–226. 10.1007/s10579-009-9081-4 - DOI
    1. Baroni M., Dinu G., Kruszewski G. (2014). “Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors,” in Proceedings of ACL (Baltimore, MD: ).
    1. Baroni M., Lenci A. (2010). Distributional memory: A general framework for corpus-based semantics. Comput. Linguist. 36, 673–721. 10.1162/coli_a_00016 - DOI
