Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
Review
. 2025 Feb;32(1):203-225.
doi: 10.3758/s13423-024-02551-y. Epub 2024 Aug 22.

The principal components of meaning, revisited

Affiliations
Review

The principal components of meaning, revisited

Chris Westbury et al. Psychon Bull Rev. 2025 Feb.

Abstract

Osgood, Suci, and Tannebaum were the first to attempt to identify the principal components of semantics using dimensional reduction of a high-dimensional model of semantics constructed from human judgments of word relatedness. Modern word-embedding models analyze patterns of words to construct higher dimensional models of semantics that can be similarly subjected to dimensional reduction. Hollis and Westbury characterized the first eight principal components (PCs) of a word-embedding model by correlating them with several well-known lexical measures, such as logged word frequency, age of acquisition, valence, arousal, dominance, and concreteness. The results show some clear differentiation of interpretation between the PCs. Here, we extend this work by analyzing a larger word-embedding matrix using semantic measures initially derived from subjective inspection of the PCs. We then use quantitative analysis to confirm the utility of these subjective measures for predicting PC values and cross-validate them on two word-embedding matrices developed on distinct corpora. Several semantic and word class measures are strongly predictive of early PC values, including first-person and second-person verbs, personal relevance of abstract and concrete words, affect terms, and names of places and people. The predictors of the lowest magnitude PCs generalized well to word-embedding matrices constructed from separate corpora, including matrices constructed using different word-embedding methods. The predictive categories we describe are consistent with Wittgenstein's argument that an autonomous level of social interaction grounds linguistic meaning.

Keywords: Lexical co-occurrence; Principal components analysis; Semantics; Word meaning; Word-embedding models; Word2vec.

PubMed Disclaimer

Conflict of interest statement

Declarations. Ethics approval/Consent to participate: Since this work relies only on computational modeling, no ethics approval was sought or required. Consent for publication: All authors have approved the publication of this manuscript in its present form. Data availability: All data and R files required for replicating the analyses and plots in this paper are available ( https://osf.io/3zrek/ ). Conflicts of interest: The authors declare that we have no conflicts of interest or competing interests. Open practices statement: The data and materials for all experiments are available online ( https://osf.io/3zrek ). None of the experiments was preregistered.

References

    1. Baayen, R. H. (2010). Demythologizing the word frequency effect: A discriminative learning perspective. The Mental Lexicon, 5(3), 436–461. - DOI
    1. Balota, D. A., Yap, M. J., Cortese, M. J., Hutchison, K. A., Kessler, B., Loftis, B., Neely, J. H., Nelson, D. L., Simpson, G. B., & Treiman, R. (2007). The English Lexicon Project. Behavior Research Methods, 39, 445–459. - DOI - PubMed
    1. Bojanowski, P., Grave, E., Joulin, A., & Mikolov, T. (2017). Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5, 135–146. - DOI
    1. Davies, M. (2010). The Corpus of Contemporary American English as the first reliable monitor corpus of English. Literary and Linguistic Computing, 25(4), 447–464. - DOI
    1. Evangelopoulos, N., Zhang, X., & Prybutok, V. R. (2012). Latent semantic analysis: Five methodological recommendations. European Journal of Information Systems, 21, 70–86. - DOI

LinkOut - more resources