PLoS One. 2017 Feb 23;12(2):e0172493.
doi: 10.1371/journal.pone.0172493. eCollection 2017.

GreekLex 2: A comprehensive lexical database with part-of-speech, syllabic, phonological, and stress information

Antonios Kyparissiadis et al. PLoS One. 2017.

Abstract

Databases containing the lexical properties of a given orthography are crucial for psycholinguistic research. In the last ten years, a number of lexical databases have been developed for Greek; however, they lack important part-of-speech information. Furthermore, alternative procedures are needed for calculating syllabic measurements and stress information, as is the combination of several metrics to investigate the linguistic properties of the Greek language. To address these issues, we present a new extensive lexical database of Modern Greek (GreekLex 2) with part-of-speech information for each word, accurate syllabification, orthographic information predictive of stress, several measures of word similarity, and phonetic information. The addition of detailed statistical information about Greek part-of-speech, syllabification, and stress neighbourhood allowed novel analyses of stress distribution within different grammatical categories and syllabic lengths. The results showed that the statistical preponderance of stress on the penultimate syllable that is reported for the Greek language depends on grammatical category. Additionally, the analyses showed that more than 90% of the tokens in the database would be stressed correctly by relying solely on stress neighbourhood information. The database and the scripts for orthographic and phonological syllabification, as well as phonetic transcription, are available at http://www.psychology.nottingham.ac.uk/greeklex/.
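The abstract reports the share of tokens that stress neighbourhood alone would stress correctly, but does not define the neighbourhood here. The sketch below assumes a hypothetical definition purely for illustration: a word's stress neighbours are the other words sharing its last three letters (`ENDING_LENGTH` is an assumed parameter), and the predicted stress is the frequency-weighted majority among them. The toy lexicon and its frequencies are invented; this shows how a token-level accuracy figure could be computed, not the GreekLex 2 procedure.

```python
from collections import Counter

# Hypothetical toy lexicon: (word without accent marks, stressed syllable counted
# from the end [1 = final, 2 = penultimate, 3 = antepenultimate], token frequency).
# Accents are stripped so that the ending itself does not give away the stress.
# Frequencies are invented for illustration only.
LEXICON = [
    ("ανθρωπος", 3, 120),
    ("τοπος", 2, 60),
    ("δρομος", 2, 80),
    ("κοσμος", 2, 150),
    ("γιατρος", 1, 30),
    ("καιρος", 1, 50),
    ("ουρανος", 1, 40),
]

# Assumption (hypothetical): stress neighbours share the last ENDING_LENGTH letters.
ENDING_LENGTH = 3


def predict_stress(word, lexicon, ending_length=ENDING_LENGTH):
    """Return the frequency-weighted majority stress position among the word's
    assumed stress neighbours, or None if it has no neighbours.

    The word itself is excluded (leave-one-out). On a tie, Counter.most_common
    returns the position that was encountered first.
    """
    ending = word[-ending_length:]
    votes = Counter()
    for other, position, freq in lexicon:
        if other != word and other.endswith(ending):
            votes[position] += freq
    if not votes:
        return None
    return votes.most_common(1)[0][0]


def token_accuracy(lexicon, ending_length=ENDING_LENGTH):
    """Share of tokens (frequency-weighted) whose stress is predicted correctly."""
    correct = total = 0
    for word, position, freq in lexicon:
        total += freq
        if predict_stress(word, lexicon, ending_length) == position:
            correct += freq
    return correct / total


print(token_accuracy(LEXICON))
```

In this toy set the two words in -πος mislead each other (άνθρωπος is antepenultimate, τόπος penultimate) and ουρανος has no neighbour at all, so only the -μος and -ρος words count as correct.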

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Fig 1. Consonant-type classification.
Consonant types were classified according to the Manner of Articulation, Place of Articulation, and Voicing scales.
Fig 2. Distribution of stress position by number of syllables.
Distribution of stress position by types (2A) and tokens (2B). Syllabic lengths above 7 (up to 11) are not shown in the graph, as together they account for less than 1% of types and 0.1% of tokens in the whole set.
Fig 3. Distribution of stress position by part-of-speech category.
Counts for disyllables (3A), trisyllables (3B), and all polysyllables (3C). Only adjectives, nouns and verbs were considered for these calculations.
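Figs 2 and 3 contrast type counts (each word counted once) with token counts (each word weighted by its frequency) and group stress position either by syllable count or by part of speech. A minimal sketch of that bookkeeping follows, assuming a hypothetical in-memory list of entries; the words are real Greek words with their actual stress, but the frequencies are invented and the tuple layout is an assumption, not the GreekLex 2 file format.

```python
from collections import Counter, defaultdict
from operator import itemgetter

# Hypothetical toy entries: (word, part of speech, syllable count, stressed
# syllable counted from the end [1 = final, 2 = penultimate, 3 = antepenultimate],
# token frequency). Frequencies are invented for illustration only.
ENTRIES = [
    ("άνθρωπος", "noun", 3, 3, 120),
    ("δρόμος", "noun", 2, 2, 80),
    ("κόσμος", "noun", 2, 2, 150),
    ("γιατρός", "noun", 2, 1, 30),
    ("ουρανός", "noun", 3, 1, 40),
    ("γράφω", "verb", 2, 2, 90),
    ("αγοράζω", "verb", 4, 2, 25),
    ("καλός", "adjective", 2, 1, 70),
    ("όμορφος", "adjective", 3, 3, 45),
]

BY_SYLLABLES = itemgetter(2)
BY_POS = itemgetter(1)


def stress_distribution(entries, group_key, by_tokens=False):
    """Proportion of each stress position within each group.

    Type counts add 1 per word; token counts add the word's frequency.
    """
    counts = defaultdict(Counter)
    for entry in entries:
        position, freq = entry[3], entry[4]
        counts[group_key(entry)][position] += (freq if by_tokens else 1)
    distribution = {}
    for group, ctr in counts.items():
        total = sum(ctr.values())
        distribution[group] = {pos: n / total for pos, n in sorted(ctr.items())}
    return distribution


print(stress_distribution(ENTRIES, BY_SYLLABLES))                  # types (cf. Fig 2A)
print(stress_distribution(ENTRIES, BY_SYLLABLES, by_tokens=True))  # tokens (cf. Fig 2B)
print(stress_distribution(ENTRIES, BY_POS))                        # by category (cf. Fig 3)
```

Switching between types and tokens changes only the increment, so the same grouping code serves both panels; a high-frequency word such as κόσμος shifts the token proportions without affecting the type proportions.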
Fig 4. Frequency of N (Neighbourhood) counts.
Frequency counts for binned OLD20 Levenshtein distance values (based on [9]) and for Coltheart’s N orthographic similarity values (based on [8]). OLD20 values are binned to the nearest integer (e.g. a value of 2 represents all values from 1.5 to 2.49). Coltheart’s N values above 10 are not shown in the graph, as together they account for less than 0.5% of the whole set.
Fig 5. Orthographic similarity as a function of length.
Distributions of mean OLD20 Levenshtein Distance (based on [9]) and Coltheart’s N orthographic similarity (based on [8]) as a function of word length measured in letters.
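Figs 4 and 5 rely on two orthographic similarity measures. The sketch below uses their standard definitions: Coltheart’s N counts the other words of the same length that differ by exactly one letter substitution, and OLD20 is the mean Levenshtein distance from a word to its 20 closest words in the lexicon. It also shows the round-half-up binning described for Fig 4. The toy letter strings are hypothetical, and averaging over all available words when the lexicon holds fewer than 20 others is a choice made for this sketch.

```python
import heapq
import math


def levenshtein(a, b):
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def coltheart_n(word, lexicon):
    """Number of other same-length words differing from `word` in exactly one position."""
    return sum(
        1
        for other in lexicon
        if other != word
        and len(other) == len(word)
        and sum(x != y for x, y in zip(word, other)) == 1
    )


def old_n(word, lexicon, n=20):
    """Mean Levenshtein distance to the n closest other words (n=20 gives OLD20).

    If fewer than n other words exist, this sketch averages over all of them.
    """
    distances = [levenshtein(word, other) for other in lexicon if other != word]
    closest = heapq.nsmallest(n, distances)
    return sum(closest) / len(closest)


def bin_to_nearest_integer(value):
    """Round half up, so values from 1.5 to 2.49 fall in bin 2.

    Python's built-in round() rounds half to even (round(2.5) == 2), hence floor(x + 0.5).
    """
    return math.floor(value + 0.5)


# Hypothetical toy lexicon of letter strings.
TOY = ["γατα", "κατα", "πατα", "γατι", "σπιτι"]
print(coltheart_n("γατα", TOY), old_n("γατα", TOY), bin_to_nearest_integer(old_n("γατα", TOY)))
```

Coltheart’s N only sees same-length substitution neighbours, while OLD20 also reflects insertions and deletions, which is one reason the two measures behave differently as word length grows in Fig 5.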

References

    1. Ktori M, van Heuven WJB, Pitchford NJ. GreekLex: a lexical database of Modern Greek. Behav Res Methods. 2008; 40(3): 773–783.
    2. Protopapas A, Tzakosta M, Chalamandaris A, Tsiakoulis P. IPLR: An online resource for Greek word-level and sublexical information. Lang Resour Eval. 2012; 46(3): 449–459.
    3. Dimitropoulou M, Duñabeitia JA, Avilés A, Corral J, Carreiras M. Subtitle-based word frequencies as the best estimate of reading behavior: The case of Greek. Front Psychol. 2010; 1(218): 1–12.
    4. Gimenes M, New B. Worldlex: Twitter and blog word frequencies for 66 languages. Behav Res Methods. 2015; 48(3): 963–972.
    5. Hatzigeorgiu N, Gavrilidou M, Piperidis S, Carayannis G, Papakostopoulou A, Spiliotopoulou A, et al. Design and implementation of the online ILSP Greek Corpus. Paper presented at LREC 2000: the Second International Conference on Language Resources and Evaluation, Athens, Greece; 2000. Retrieved from http://www.lrec-conf.org/proceedings/lrec2000/pdf/336.pdf
