Sci Data. 2025 Sep 2;12(1):1541.
doi: 10.1038/s41597-025-05445-3.

The Indo-European Cognate Relationships dataset

Cormac Anderson, Matthew Scarborough, Lechosław Jocz, Martin Joachim Kümmel, Thomas Jügel, Britta Irslinger, Roland Pooth, Henrik Liljegren, Richard F Strand, Geoffrey Haig, Ulrich Geupel, Martin Macak, Ronald I Kim, Erik Anonby, Tijmen Pronk, Oleg Belyaev, Tonya Kim Dewey-Findell, Matthew Boutilier, Cassandra Freiberg, Robert Tegethoff, Matilde Serangeli, Krzysztof Stroński, Alexander Falileyev, Nikos Liosis, Kim Schulte, Ganesh Kumar Gupta, Raheleh Izadifar, Patrycja Markus, Nicholas Williams, Simone Loi, Nicholas Sims-Williams, Martin Findell, Shirin Adibifar, Giovanni Abete, Petar Atanasov, Esther Baiwir, Maria-Reina Bastardas, Adam Benkato, Lisa Shugert Bevevino, Éva Buchi, Giorgio Cadorini, Chundra Cathcart, Loïc Cheveau, Charalambos Christodoulou, Jérémie Delorme, Steven N Dworkin, Deniz Ekici, Shervin Farridnejad, Mojtaba Gheitasi, Harald Hammarström, Steve Hewitt, Afsar Ali Khan, Muhammad Kamal Khan, Liudmila Khokhlova, Deborah Kim, Christopher Lewin, Borana Lushaj, Parvin Mahmoudveysi, Masoud Mahommadirad, Sam Mersch, Baydaa Mustafa, Fatemeh Nemati, Maryam Nourzaei, Peadar Ó Muircheartaigh, Virginia Oogjen, Muhammed Ourang, Heather Pagan, Timothy S Palmer, Steve Pepper, Mandar Purandare, Khwaja Rehman, Guto Rhys, Unn Røyneland, Muhammad Zaman Sagar, Jade Jørgen Sandstedt, Lars Steensland, Mortaza Taheri-Ardali, Mahnaz Talebi-Dastenaei, Sabine Tittel, Tiago Tresoldi, Michiel de Vaan, Annemarie Verkerk, Arjen Versloot, Paul Videsott, Nikola Vuletić, Manuel Widmer, Arash Zeini, Hans-Jörg Bibiko, Fiona Runge, Russell D Gray, Paul Heggarty

Cormac Anderson et al. Sci Data. 2025.

Abstract

The Indo-European Cognate Relationships (IE-CoR) dataset is an open-access relational dataset showing how related, inherited words ('cognates') pattern across 160 languages of the Indo-European family. IE-CoR is intended as a benchmark dataset for computational research into the evolution of the Indo-European languages. It is structured around 170 reference meanings in core lexicon, and contains 25731 lexeme entries, analysed into 4981 cognate sets. Novel, dedicated structures are used to code all known cases of horizontal transfer. All 13 main documented clades of Indo-European, and their main subclades, are well represented. Time calibration data for each language are also included, as are relevant geographical and social metadata. Data collection was performed by an expert consortium of 89 linguists drawing on 355 cited sources. The dataset is extendable to further languages and meanings and follows the Cross-Linguistic Data Format (CLDF) protocols for linguistic data. It is designed to be interoperable with other cross-linguistic datasets and catalogues, and provides a reference framework for similar initiatives for other language families.
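The lexeme-to-cognate-set structure described in the abstract can be illustrated with a minimal Python sketch. It is not the IE-CoR schema or its data: the `Lexeme` class, the cognate set labels `fire-A` and `fire-B`, and the four toy rows are hypothetical, chosen only to show how lexemes grouped into cognate sets can be turned into the kind of presence/absence matrix that computational analyses often take as input.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Lexeme:
    """One word form for one meaning in one language (illustrative only)."""
    language: str
    meaning: str
    form: str
    cognate_set: str  # hypothetical label, not an IE-CoR identifier


# Toy rows for illustration; these are not taken from the dataset.
LEXEMES = [
    Lexeme("English", "FIRE", "fire", "fire-A"),
    Lexeme("German", "FIRE", "Feuer", "fire-A"),
    Lexeme("French", "FIRE", "feu", "fire-B"),
    Lexeme("Spanish", "FIRE", "fuego", "fire-B"),
]


def group_by_cognate_set(lexemes):
    """Map each (meaning, cognate set) pair to the sorted languages attesting it."""
    groups = defaultdict(list)
    for lex in lexemes:
        groups[(lex.meaning, lex.cognate_set)].append(lex.language)
    return {key: sorted(langs) for key, langs in groups.items()}


def binary_matrix(lexemes):
    """Return (columns, rows): one 0/1 row per language, one column per cognate set."""
    columns = sorted({(lex.meaning, lex.cognate_set) for lex in lexemes})
    languages = sorted({lex.language for lex in lexemes})
    present = {(lex.language, lex.meaning, lex.cognate_set) for lex in lexemes}
    rows = {
        lang: [int((lang, meaning, cset) in present) for meaning, cset in columns]
        for lang in languages
    }
    return columns, rows


if __name__ == "__main__":
    columns, rows = binary_matrix(LEXEMES)
    print(columns)
    for lang, row in rows.items():
        print(lang, row)
```

Each column of the matrix stands for one cognate set within one meaning, so a language that has replaced an inherited word shows up as a 0 in the old set and a 1 in another.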


Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

Fig. 1. Language sample in IE-CoR 1.2. Colours represent main clades.
Fig. 2. Schematic, simplified overview of the relationships between fields in the main tables of the IE-CoR dataset.
Fig. 3. Illustration of cognate sets and lexemes across the Indo-European language family in the IE-CoR dataset, for the example meaning FIRE. An interactive version is available at iecor.clld.org/parameters/fire.
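As a rough illustration of the kind of table relationships that Fig. 2 summarises, the sketch below joins a forms table to a cognates table using only the Python standard library. The column names (`ID`, `Language_ID`, `Parameter_ID`, `Form`, `Form_ID`, `Cognateset_ID`) follow the default CLDF Wordlist conventions; whether IE-CoR's own tables use exactly these columns is an assumption here, and the inline rows are invented placeholders rather than data from the dataset.

```python
import csv
import io

# Placeholder tables in CLDF Wordlist style; rows and IDs are hypothetical.
FORMS_CSV = """ID,Language_ID,Parameter_ID,Form
f1,english,fire,fire
f2,german,fire,Feuer
f3,french,fire,feu
"""

COGNATES_CSV = """ID,Form_ID,Cognateset_ID
c1,f1,cs1
c2,f2,cs1
c3,f3,cs2
"""


def read_table(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def join_forms_to_cognatesets(forms, cognates):
    """Attach a Cognateset_ID to each form row; None if the form is unassigned."""
    set_by_form = {row["Form_ID"]: row["Cognateset_ID"] for row in cognates}
    return [
        {**form, "Cognateset_ID": set_by_form.get(form["ID"])}
        for form in forms
    ]


if __name__ == "__main__":
    joined = join_forms_to_cognatesets(
        read_table(FORMS_CSV), read_table(COGNATES_CSV)
    )
    for row in joined:
        print(row["Language_ID"], row["Form"], row["Cognateset_ID"])
```

For real use, the same join would read the CSV files shipped with the dataset instead of the inline strings, and would need to handle forms linked to more than one cognate set, which this sketch does not.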

