The Italian Crowdsourcing Project: Visual word recognition times for 130,495 Italian words

Simona Amenta¹, Andrea Gregor de Varda², Pawel Mandera³, Emmanuel Keuleers⁴, Marc Brysbaert⁵, Marco Marelli²

Affiliations

¹ Department of Psychology, University of Milano-Bicocca, P.zza dell'Ateneo Nuovo, 1, 20126, Milano, Italy. simona.amenta@unimib.it.
² Department of Psychology, University of Milano-Bicocca, P.zza dell'Ateneo Nuovo, 1, 20126, Milano, Italy.
³ Lingvist Technologies, Tallinn, Estonia.
⁴ Department of Cognitive Science and Artificial Intelligence, University of Tilburg, Tilburg, The Netherlands.
⁵ Department of Experimental Psychology, Ghent University, Ghent, Belgium.

PMID: 39733067
DOI: 10.3758/s13428-024-02548-4

Simona Amenta et al. Behav Res Methods. 2024 Dec 28;57(1):26. doi: 10.3758/s13428-024-02548-4.

Abstract

Despite being widely spoken and extensively studied by language and cognitive scientists, Italian lacks large resources of language processing data. The Italian Crowdsourcing Project (ICP) is a dataset of word recognition times and accuracy including responses to 130,465 words, which makes it the largest dataset of its kind item-wise. The data were collected in an online word knowledge task in which over 156,000 native speakers of Italian took part. We validated the ICP dataset by (1) showing that ICP reaction times correlate strongly (r = .78) with lexical decision latencies collected in a traditional lab experiment, (2) showing that the effects of major psycholinguistic variables (e.g., frequency and length) can be replicated in this dataset, and (3) replicating the effect of word prevalence, which we compute here for the first time for Italian. Given the inclusion of many inflectional forms of verbs, adjectives, and nouns, we further showcase the potential of this dataset by exploring two phenomena (inflectional entropy in verb paradigms and the clitic effect in isolated word recognition) that build on the peculiar properties of Italian. In this paper we present the ICP resource and release response times, accuracy, and prevalence estimates for all the words included.

Keywords: Crowdsourcing; Lexical decision; Megastudy; Prevalence; Word recognition.

Conflict of interest statement

Declarations. Ethics approval: Approval was obtained from the ethics committee of the University of Milano-Bicocca (see corresponding sections in the manuscript). The procedures used in this study adhere to the tenets of the Declaration of Helsinki. Consent to participate: Informed consent was obtained from all individual participants included in the study. Consent for publication: No data allowing the identification of participants were collected, included in this paper or in any of the available materials. Participants agreed to the publication of their behavioral data in anonymous and/or aggregated form. Competing interests: Marco Marelli and Simona Amenta serve as Associate Editor and Consulting Editor, respectively, on the Editorial Board of this Journal. The authors have no competing interests to declare that are relevant to the content of this article.

Grants and funding

3G011617W/Fonds Wetenschappelijk Onderzoek
