Perspect Psychol Sci. 2022 May;17(3):805-826.
doi: 10.1177/17456916211004899. Epub 2021 Oct 4.

From Text to Thought: How Analyzing Language Can Advance Psychological Science

Joshua Conrad Jackson et al. Perspect Psychol Sci. 2022 May.

Abstract

Humans have been using language for millennia but have only just begun to scratch the surface of what natural language can reveal about the mind. Here we propose that language offers a unique window into psychology. After briefly summarizing the legacy of language analyses in psychological science, we show how methodological advances have made these analyses more feasible and insightful than ever before. In particular, we describe how two forms of language analysis (natural-language processing and comparative linguistics) are contributing to how we understand topics as diverse as emotion, creativity, and religion and overcoming obstacles related to statistical power and culturally diverse samples. We summarize resources for learning both of these methods and highlight the best way to combine language analysis with more traditional psychological paradigms. Applying language analysis to large-scale and cross-cultural datasets promises to provide major breakthroughs in psychological science.

Keywords: comparative linguistics; creativity; cultural evolution; emotion; historical linguistics; natural-language processing; psycholinguistics; religion.

Conflict of interest statement

Declaration of Conflicting Interests: The author(s) declared that there were no conflicts of interest with respect to the authorship or the publication of this article.

Figures

Fig. 1.
Words from tweets about climate change (left) and COVID-19 (right). These word clouds come from an algorithm called term frequency-inverse document frequency (TF-IDF), which is designed to highlight words that best distinguish between two corpora. This text was preprocessed using lemmatization and stop-word removal before visualization. Code for generating these plots is available in the supplementary materials on OSF (https://osf.io/hvcg3/).
Fig. 2.
The global distribution of individualism and collectivism. Filled nodes represent individualist cultures (low collectivism; scores fall below the midpoint of the 1-to-100 scale from https://www.hofstede-insights.com/product/compare-countries/) and open nodes represent collectivist cultures (high collectivism; scores fall above the midpoint of the 1-to-100 scale). This distribution is represented on a language-based phylogeny. Cultures connected by solid lines are part of the same language family (language family data are from Bromham et al., 2018). The circled letters represent the following language families: I = Indo-European, Au = Austronesian, U = Uralic, S = Sino-Tibetan, Af = Afro-Asiatic, O = other.
Fig. 3.
A bibliometric analysis of eight forms of language analysis. Each node is a method, and links between nodes represent first authors who have published using both methods. Colors are communities of clustering nodes from the community-detection algorithm infomap. This algorithm separated comparative-linguistics methods (in gray) and NLP methods (in orange), which have little cross-over but high within-cluster interconnectedness (i.e., researchers who use phylogenetic mapping also study borrowing but do not study word embeddings). Data come from Table S1 in the supplementary materials on OSF (https://osf.io/hvcg3/).
Fig. 4.
A flowchart of different language-analysis methods and the kinds of questions they are best suited to answer. Orange boxes represent methods from comparative linguistics, and gray boxes represent methods from NLP. Black boxes approximate the questions that may guide researchers toward these methods. Concepts are defined here as the meaning associated with words. This is meant as a general guide for researchers interested in language analysis, and there is some overlap in classifications. For example, word embeddings can show how language conveys moods and attitudes, and colexification can sometimes uncover evolutionary dynamics.
Fig. 5.
The colexification structure of emotion concepts for all languages (top left) and for five individual language families in Jackson and colleagues' (2019) analysis of emotion. Nodes are emotion concepts, and links between concepts represent the likelihood that these concepts will be colexified in a language. Color indicates semantic community, which refers to clusters of emotions that are similar in meaning. From Jackson, J. C., Watts, J., Henry, T. R., List, J. M., Forkel, R., Mucha, P. J., Greenhill, S., Gray, R. D., & Lindquist, K. A. (2019). Emotion semantics show both cultural variation and universal structure. Science, 366(6472), 1517–1522. https://doi.org/10.1126/science.aaw8160. Reprinted with permission from AAAS.
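
The Fig. 1 caption describes TF-IDF as a way to highlight the words that best distinguish two corpora. The following is a minimal sketch of that idea, not the authors' code (which is on OSF). It assumes each corpus is treated as a single document, that tokens have already been lemmatized and stripped of stop words, and that the inverse document frequency is unsmoothed; library implementations often use smoothed variants. The function name `tf_idf` and the toy tokens are hypothetical.

```python
import math
from collections import Counter


def tf_idf(corpora):
    """Return {corpus_name: {term: score}}, treating each corpus as one document.

    Assumption: unsmoothed IDF, log(N / df), so a term that appears in
    every corpus scores exactly zero.
    """
    counts = {name: Counter(tokens) for name, tokens in corpora.items()}
    n_docs = len(counts)

    # Document frequency: the number of corpora in which each term appears.
    doc_freq = Counter()
    for term_counts in counts.values():
        doc_freq.update(term_counts.keys())

    scores = {}
    for name, term_counts in counts.items():
        total = sum(term_counts.values())
        scores[name] = {
            # Term frequency (share of the corpus) times inverse document frequency.
            term: (n / total) * math.log(n_docs / doc_freq[term])
            for term, n in term_counts.items()
        }
    return scores


# Hypothetical, already-preprocessed tokens standing in for two tweet corpora.
corpora = {
    "climate": ["climate", "warm", "carbon", "people", "climate"],
    "covid": ["virus", "mask", "people", "vaccine", "virus"],
}
scores = tf_idf(corpora)
top_climate = max(scores["climate"], key=scores["climate"].get)
top_covid = max(scores["covid"], key=scores["covid"].get)
```

In this toy input, "people" occurs in both corpora and therefore receives a score of zero, while a frequent word unique to one corpus ranks highest for that corpus. That is the property a word cloud like Fig. 1 relies on.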

References

    1. Allport G. W., Vernon P. E. (1930). The field of personality. Psychological Bulletin, 27(10), 677–730.
    2. Althoff T., Danescu-Niculescu-Mizil C., Jurafsky D. (2014). How to ask for a favor: A case study on the success of altruistic requests. arXiv. https://arxiv.org/abs/1405.3282
    3. American Psychiatric Association. (2013). Diagnostic and statistical manual of mental disorders (5th ed.). doi: 10.1176/appi.books.978089042559
    4. Atkinson Q. D., Coomber T., Passmore S., Greenhill S. J., Kushnick G. (2016). Cultural and environmental predictors of pre-European deforestation on Pacific Islands. PLOS ONE, 11(5), Article e0156340. doi: 10.1371/journal.pone.0156340
    5. Back M. D., Küfner A. C., Egloff B. (2010). The emotional timeline of September 11, 2001. Psychological Science, 21(10), 1417–1419. doi: 10.1177/0956797610382124
