Am J Phys Anthropol. 2015 Aug;157(4):630-40.

doi: 10.1002/ajpa.22758. Epub 2015 Jun 8.

Across language families: Genome diversity mirrors linguistic variation within Europe

Giuseppe Longobardi¹ ², Silvia Ghirotto³, Cristina Guardiano⁴, Francesca Tassi³, Andrea Benazzo³, Andrea Ceolin¹, Guido Barbujani³

Affiliations

¹ Department of Language and Linguistic Science, University of York, York, UK.
² Department of Humanities, University of Trieste, Trieste, Italy.
³ Department of Life Sciences and Biotechnology, University of Ferrara, Ferrara, Italy.
⁴ Department of Communication and Economics, University of Modena-Reggio Emilia, Modena, Italy.

PMID: 26059462
PMCID: PMC5095809
DOI: 10.1002/ajpa.22758

Giuseppe Longobardi et al. Am J Phys Anthropol. 2015 Aug.

Abstract

Objectives: The notion that patterns of linguistic and biological variation may cast light on each other and on population histories dates back to Darwin's times; yet, turning this intuition into a proper research program has met with serious methodological difficulties, especially affecting language comparisons. This article takes advantage of two new tools of comparative linguistics: a refined list of Indo-European cognate words, and a novel method of language comparison estimating linguistic diversity from a universal inventory of grammatical polymorphisms, and hence enabling comparison even across different families. We corroborated the method and used it to compare patterns of linguistic and genomic variation in Europe.

Materials and methods: Two sets of linguistic distances, lexical and syntactic, were inferred from these data and compared with measures of geographic and genomic distance through a series of matrix correlation tests. Linguistic and genomic trees were also estimated and compared. A method (TreeMix) was used to infer migration episodes after the main population splits.
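A matrix correlation test of the kind mentioned above can be illustrated with a Mantel-style permutation test. The sketch below is a minimal, standard-library illustration and not the authors' pipeline: the function names (`upper_triangle`, `pearson`, `mantel`), the one-sided test, and the default number of permutations are all assumptions made for the example. It correlates the off-diagonal entries of two distance matrices and estimates significance by jointly permuting the rows and columns of one of them.

```python
import random
from math import sqrt


def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square symmetric matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-zero variance assumed)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mantel(a, b, permutations=999, seed=0):
    """One-sided Mantel test between distance matrices a and b.

    Returns (observed correlation, permutation p-value).
    """
    rng = random.Random(seed)
    n = len(a)
    flat_b = upper_triangle(b)
    observed = pearson(upper_triangle(a), flat_b)
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        # Relabel the populations of matrix a: rows and columns move together.
        rng.shuffle(order)
        permuted = [[a[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(upper_triangle(permuted), flat_b) >= observed:
            hits += 1
    # The observed arrangement counts as one of the permutations.
    return observed, (hits + 1) / (permutations + 1)
```

With real data, `a` and `b` would be, for example, a syntactic and a genomic distance matrix listed in the same population order. Permuting whole rows and columns, rather than individual cells, preserves the dependence among distances that share a population.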

Results: We observed significant correlations between genomic and linguistic diversity, the latter inferred from data on both Indo-European and non-Indo-European languages. Contrary to previous observations, on the European scale, language proved a better predictor of genomic differences than geography. Inferred episodes of genetic admixture following the main population splits found convincing correlates also in the linguistic realm.

Discussion: These results pave the way for previously unfeasible cross-disciplinary analyses at the worldwide scale, encompassing populations of distant language families.

Keywords: genome-wide diversity; human evolutionary history; parametric comparison method; single-nucleotide polymorphisms.

Figures

**Figure 1**
Geographic distribution of the samples considered in this study. Indo‐European‐speaking populations in blue, populations speaking Finno‐Ugric languages (Hungarian, Finnish) and the linguistic isolate (Basque) in red.

**Figure 2**
UPGMA trees summarizing population relationships. Distances inferred from: (A) lexical and (B) syntactic comparisons among 12 Indo‐European‐speaking European populations; (C) syntactic comparisons among 15 European languages, and (D) F_ST distances among 15 populations sharing 177,949 SNPs. Lexical distances were estimated from lists of cognate words, amounting to over 6,000 roots (http://ielex.mpi.nl/); syntactic distances were measured over 56 parameters of nominal phrases (http://dx.doi.org/10.1075/jhl.3.1.07lon.additional). In (D), numbers indicate the support of the branching after 100 bootstrap replicates. The matrix perturbation techniques usable to test the robustness of trees (bootstrapping and jackknifing) provide stable topologies, but owing to the small number of characters involved they are only relatively reliable (cf. Longobardi et al., 2013 for more details). Therefore, bootstrapping scores have been only reported here for the genetic tree D.
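For readers unfamiliar with UPGMA, the following is a minimal standard-library sketch of the clustering step that turns a distance matrix into a tree topology. It is an illustration only: the function name `upgma`, the nested-tuple output, and the omission of branch lengths are choices made for this example, and nothing here reproduces the software actually used for Figure 2.

```python
def upgma(labels, dist):
    """Build a UPGMA (average-linkage) tree topology as nested tuples.

    labels: list of taxon names; dist: symmetric matrix of pairwise distances.
    """
    n = len(labels)
    # Each cluster id maps to (subtree, number of leaves in the subtree).
    clusters = {i: (labels[i], 1) for i in range(n)}
    # Pairwise distances keyed by (smaller id, larger id).
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        # Merge the closest pair of clusters.
        (i, j), _ = min(d.items(), key=lambda item: item[1])
        tree_i, size_i = clusters.pop(i)
        tree_j, size_j = clusters.pop(j)
        del d[(i, j)]
        # Distance from the merged cluster to every other cluster is the
        # size-weighted mean of the two old distances.
        for k in clusters:
            d_ik = d.pop((min(i, k), max(i, k)))
            d_jk = d.pop((min(j, k), max(j, k)))
            d[(k, next_id)] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        clusters[next_id] = ((tree_i, tree_j), size_i + size_j)
        next_id += 1
    return next(iter(clusters.values()))[0]
```

Applied to a lexical, syntactic, or F_ST distance matrix, the function returns the order in which populations are joined. Weighting by cluster size is what distinguishes UPGMA from its unweighted relative WPGMA.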

**Figure 3**
Projection on two dimensions of the main components (PCA) of linguistic (A) and individual genomic (B) variation. The linguistic PCA was performed using the *R FactoMineR* program, with neutralized parameter values coded as “NA,” whereas the genomic PCA was calculated with the *R SNPRelate* package (Lê et al., 2008). Note that the linguistic scatter diagram accounts for a fraction of the total variance that is >25‐fold as large as that accounted for by the genomic scatter diagram.

**Figure 4**
Unsupervised ancestry‐inference analysis based on the software ADMIXTURE. Each individual genotype is represented by a column in the area representing the appropriate population, and colors correspond to the fraction of the genotype that can be attributed to each of the K groups (2 ≤ K ≤ 5) assumed to have contributed to the populations' ancestry.

**Figure 5**
Maximum‐likelihood population trees. The algorithm chosen, TreeMix (28), estimates phylogenetic relationships with (A) three, (B) one, and (C) two superimposed migration events after the main population splits.

Cited by

How humans transmit language: horizontal transmission matches word frequencies among peers on Twitter.
Bryden J, Wright SP, Jansen VAA. J R Soc Interface. 2018 Feb;15(139):20170738. doi: 10.1098/rsif.2017.0738. PMID: 29436508. Free PMC article.
Formal Syntax and Deep History.
Ceolin A, Guardiano C, Irimia MA, Longobardi G. Front Psychol. 2020 Dec 18;11:488871. doi: 10.3389/fpsyg.2020.488871. eCollection 2020. PMID: 33391062. Free PMC article.
Genetic Reconstruction and Forensic Analysis of Chinese Shandong and Yunnan Han Populations by Co-Analyzing Y Chromosomal STRs and SNPs.
Yin C, Su K, He Z, Zhai D, Guo K, Chen X, Jin L, Li S. Genes (Basel). 2020 Jul 3;11(7):743. doi: 10.3390/genes11070743. PMID: 32635262. Free PMC article.
A multicenter case-control study of the effect of e-nos VNTR polymorphism on upper gastrointestinal hemorrhage in NSAID users.
Mallah N, Zapata-Cachafeiro M, Aguirre C, Ibarra-García E, Palacios-Zabalza I, Macías García F, Iglesias García J, Piñeiro-Lamas M, Ibáñez L, Vidal X, Vendrell L, Martin-Arias L, Gil MS, Velasco-González V, Salgado-Barreira Á, Figueiras A. Sci Rep. 2021 Oct 7;11(1):19923. doi: 10.1038/s41598-021-99402-w. PMID: 34620931. Free PMC article.
Synergism interaction between genetic polymorphisms in drug metabolizing enzymes and NSAIDs on upper gastrointestinal haemorrhage: a multicenter case-control study.
Mallah N, Zapata-Cachafeiro M, Aguirre C, Ibarra-García E, Palacios-Zabalza I, Macías-García F, Piñeiro-Lamas M, Ibáñez L, Vidal X, Vendrell L, Martin-Arias L, Sáinz-Gil M, Velasco-González V, Bacariza-Cortiñas M, Salgado A, Estany-Gestal A, Figueiras A; EMPHOGEN Group. Ann Med. 2022 Dec;54(1):379-392. doi: 10.1080/07853890.2021.2016940. PMID: 35114859. Free PMC article.

References

1. 1000 Genomes Project Consortium. 2012. An integrated map of genetic variation from 1,092 human genomes. Nature 491:56–65.
2. Alexander DH, Novembre J, Lange K. 2009. Fast model‐based estimation of ancestry in unrelated individuals. Genome Res 19:1655–1664.
3. Alonso S, Flores C, Cabrera V, Alonso A, Martín P, Albarrán C, Izagirre N, de la Rúa C, García O. 2005. The place of the Basques in the European Y‐chromosome diversity landscape. Eur J Hum Genet 13:1293–1302.
4. Baker M. 2001. The atoms of language. New York: Basic Books.
5. Barbujani G, Colonna V. 2010. Human genome diversity: frequently asked questions. Trends Genet 26:285–295.
