Front Psychol. 2022 Jul 18:13:786229.
doi: 10.3389/fpsyg.2022.786229. eCollection 2022.

How to Develop Reliable Instruments to Measure the Cultural Evolution of Preferences and Feelings in History?

Mauricio de Jesus Dias Martins et al. Front Psychol.

Abstract

While we cannot directly measure the psychological preferences of individuals, and the moral, emotional, and cognitive tendencies of people from the past, we can use cultural artifacts as a window to the zeitgeist of societies in particular historical periods. At present, an increasing number of digitized texts spanning several centuries is available for computerized analysis. In addition, developments from historical economics have enabled increasingly precise estimations of sociodemographic realities from the past. Crossing these datasets offers a powerful tool to test how the environment changes psychology and vice versa. However, designing the appropriate proxies of relevant psychological constructs is not trivial. The gold standard to measure psychological constructs in modern texts, Linguistic Inquiry and Word Count (LIWC), has been validated by psychometric experimentation with modern participants. However, as a tool to investigate the psychology of the past, the LIWC is limited in two main aspects: (1) it does not cover the entire range of relevant psychological dimensions and (2) the meaning, spelling, and pragmatic use of certain words depend on the historical period from which the fiction work is sampled. These LIWC limitations make the design of custom tools inevitable. However, without psychometric validation, there is uncertainty regarding what exactly is being measured. To overcome these pitfalls, we suggest several internal and external validation procedures, to be conducted prior to diachronic analyses. First, the semantic adequacy of search terms in bags-of-words approaches should be verified by training semantic vector spaces with the historical text corpus using tools like word2vec. Second, we propose factor analyses to evaluate the internal consistency between distinct bags-of-words proxying the same underlying psychological construct. Third, these proxies can be externally validated using prior knowledge of the differences between genres or other literary dimensions. Finally, while LIWC is limited in the analysis of historical documents, it can be used as a sanity check for external validation of custom measures. This procedure allows a robust estimation of psychological constructs and how they change throughout history. Together with historical economics, it also increases our power in testing the relationship between environmental change and the expression of psychological traits from the past.

Keywords: LIWC; NLP; factor analyses; historical economics; text analysis; word2vec.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
Pipeline for diachronic analysis of historical texts (detailed explanation of each step in the text).
(1) Proxies and controls. Diachronic analyses require the selection of appropriate proxy measures of target psychological dimensions (A1 and A2) and of control conditions (B1 and B2). For instance, the contrast Cooperation vs. Dominance can be proxied as Prosociality vs. Authoritarianism (Attitudes) or Trustworthiness vs. Strength (Traits). Deriving more than one proxy is crucial for subsequent internal validation (see step 6) and generalisability.
(2) Bags of seeds. To derive meaningful bags-of-words for each dimension (A1, A2, B1, and B2), it is necessary to find seed words for subsequent exploratory semantic analysis (step 4). A possible approach is to extract central words in existing psychometric questionnaires. For instance, the seed words “Care,” “Support,” and “Assistance” are central in questionnaires measuring individuals' prosociality (Baumsteiger and Siegel, 2019). An alternative or complementary approach is to use dictionary tools such as WordNet (Princeton University; WordNet Interface, n.d.) to generate a list of synonyms and hyponyms.
(3) Historical semantic map. The crucial step in generating historically adequate bags-of-words is to build a semantic vector map of the historical corpus. This enables the exploration of the particular meanings associated with each word in the historical context in step 4.
(4) Bags-of-words. For each bag-of-seeds (A1, A2, B1, and B2), each seed word is expanded into a set of semantically similar words (within the particular historical context) using word2vec (Mikolov et al., 2013). (expansion) The seed word and semantically related terms can be added into a bag-of-words (e.g., “spleen,” “resentment,” “jealousie” are related to “anger” in the early modern period). (elimination) The meaning of the seed word can be deemed unspecific and not added to the bag-of-words (e.g., the word “might”, a synonym of strength, is used more often in the context of “may”/“should” than of “strength” and should be eliminated).
(5) Frequency analysis. For each text, compute the total frequency of items in each bag-of-words (A1, A2, B1, and B2).
(6) Internal validity. To evaluate the coherence between several proxies of the same psychological dimension (A1 and A2) vs. proxies of the control measure (B1 and B2), we can use factor analyses (or other correlation procedures). If the factor analysis does not generate a good separation of the psychological dimensions A and B, it is difficult to determine whether the bags-of-words A1 and A2 are adequate as proxies of A.
(7) Forming ratios AvB. In diachronic analysis, it is not sufficient to track the dynamics of a psychological variable of interest (A) alone; rather, we should track how it varies in relation to a control variable (B), e.g., using a normalized ratio AvB = (A−B)/(A+B). Using more than one ratio (A1 vs. B1 and A2 vs. B2) can improve generalizability of the results.
(8) External validity. The final step before diachronic analysis is to check for ecological validity. Does the ratio AvB correlate meaningfully with proxies in NLP tools validated for modern speech (e.g., cooperation and social orientation in LIWC)? Does it correctly distinguish between text genres known to vary in particular dimensions (e.g., tragedies are more violent than comedies)?
(9) Diachronic analysis. We can test: (left) the temporal relationship between the ratio AvB and socioeconomic variables using cross-correlation and lag analyses; (right) the influence of historical events on psychology by comparing ratio means (or growth rates) pre and post event.
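Steps 5 and 7 of the pipeline can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tokenizer, the two bags (`TRUST_BAG`, `STRENGTH_BAG`), the sample sentence, and the choice to normalize counts by text length are all assumptions made for illustration. Only the formula AvB = (A−B)/(A+B) comes from the caption.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_frequency(tokens, bag):
    """Share of tokens that belong to the bag-of-words (step 5).

    Dividing by text length is an assumption of this sketch, made so
    that texts of different sizes stay comparable.
    """
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    return sum(counts[word] for word in bag) / len(tokens)


def normalized_ratio(a, b):
    """Normalized ratio AvB = (A - B) / (A + B) (step 7).

    Returns None when neither bag occurs, since the ratio is undefined.
    """
    total = a + b
    if total == 0:
        return None
    return (a - b) / total


# Hypothetical bags; real bags would come from steps 2 to 4.
TRUST_BAG = {"trust", "faithful", "honest"}
STRENGTH_BAG = {"strength", "force", "power"}

text = "An honest and faithful friend is worth more than power."
tokens = tokenize(text)
a = bag_frequency(tokens, TRUST_BAG)
b = bag_frequency(tokens, STRENGTH_BAG)
ratio = normalized_ratio(a, b)
print(f"A={a:.2f} B={b:.2f} AvB={ratio:.2f}")
```

The ratio is bounded between −1 (only B terms occur) and +1 (only A terms occur), which makes values comparable across texts and periods.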
Figure 2
Factor analysis including six variables. Three variables are potentially related to cooperation (prosociality, sympathy, and trustworthiness) and three variables are potentially related to dominance (authority, anger, and strength). The analysis shows that cooperation-related variables load higher on Factor 2, while variables related to dominance load higher on Factor 1 (Martins and Baumard, 2020).
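The Figure 1 caption notes that other correlation procedures can stand in for factor analysis when checking internal validity. The sketch below takes that simpler route and is not a factor analysis: it compares Pearson correlations within a dimension against correlations across dimensions. The per-text scores are invented for illustration and do not come from the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Made-up bag-of-words frequencies for six texts (illustration only).
scores = {
    "prosociality":    [0.8, 0.6, 0.7, 0.2, 0.3, 0.1],
    "trustworthiness": [0.7, 0.7, 0.6, 0.3, 0.2, 0.2],
    "authority":       [0.2, 0.3, 0.1, 0.7, 0.8, 0.6],
    "strength":        [0.1, 0.2, 0.3, 0.8, 0.6, 0.7],
}

# Proxies of the same dimension should correlate with each other...
within_cooperation = pearson(scores["prosociality"], scores["trustworthiness"])
within_dominance = pearson(scores["authority"], scores["strength"])
# ...more strongly than proxies of different dimensions.
across = pearson(scores["prosociality"], scores["authority"])

print(f"within cooperation: {within_cooperation:.2f}")
print(f"within dominance:   {within_dominance:.2f}")
print(f"across dimensions:  {across:.2f}")
```

If within-dimension correlations are not clearly higher than across-dimension ones, the bags are unlikely to separate into two factors either.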
Figure 3
Example of external validation techniques. (left) Correct distinction between different types of text (the mean ratio Trust/Strength is significantly different between comedies and tragedies; see Martins and Baumard for details). (right) Correlation with indirect proxies from Linguistic Inquiry and Word Count (LIWC).
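A genre comparison of this kind can be sketched as a two-sample test on per-text ratios. The example below computes Welch's t statistic with the standard library. The ratio values are invented, so neither their size nor the direction of the difference says anything about the published results, and the source does not state which test the authors used.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t statistic for two independent samples.

    Uses sample variances and does not assume equal variances.
    """
    standard_error = sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / standard_error


# Hypothetical per-text AvB ratios for two genres (illustration only).
comedies = [0.30, 0.25, 0.40, 0.35, 0.20]
tragedies = [-0.10, 0.05, -0.20, 0.00, -0.15]

t_stat = welch_t(comedies, tragedies)
print(f"mean difference: {mean(comedies) - mean(tragedies):.2f}")
print(f"Welch t statistic: {t_stat:.2f}")
```

A p-value would additionally require the Welch-Satterthwaite degrees of freedom and a t distribution, which a statistics package would normally provide.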

References

    1. Acerbi A., Lampos V., Garnett P., Bentley R. A. (2013). The expression of emotions in 20th century books. PLoS ONE 8, e59030. doi: 10.1371/journal.pone.0059030
    1. Barron A. T. J., Huang J., Spang R. L., DeDeo S. (2018). Individuals, institutions, and innovation in the debates of the French Revolution. Proc. Natl. Acad. Sci. U. S. A. 115, 4607. doi: 10.1073/pnas.1717729115
    1. Baumard N. (2019). Psychological origins of the industrial revolution. Revolution. 42, 1–63. Available online at: https://haushofer.ne.su.se/publications/Haushofer_Baumard_Commentary_web...
    1. Baumsteiger R., Siegel J. T. (2019). Measuring prosociality: the development of a prosocial behavioral intentions scale. J. Pers. Assess. 101, 305. doi: 10.1080/00223891.2017.1411918
    1. Bourke P. (1996). Cross Correlation: AutoCorrelation - 2D Pattern Identification.