[Preprint]. 2025 Mar 10:2025.03.07.641231.
doi: 10.1101/2025.03.07.641231.

Rapidly evolved genomic regions shape individual language abilities in present-day humans


Lucas G Casten et al. bioRxiv. 2025.

Abstract

Minor genetic changes have produced profound differences in cognitive abilities between humans and our closest relatives, particularly in language. Despite decades of research, ranging from single-gene studies to broader evolutionary analyses[1, 2, 3, 4, 5], key questions about the genomic foundations of human language have persisted, including which sequences are involved, how they evolved, and whether similar changes occur in other vocal-learning species. Here we provide the first evidence directly linking rapidly evolved genomic regions to language abilities in contemporary humans. Through extensive analysis of 65 million years of evolutionary events in over 30,000 individuals, we demonstrate that Human Ancestor Quickly Evolved Regions (HAQERs)[5], sequences that rapidly accumulated mutations after the human-chimpanzee split, specifically influence language but not general cognition. These regions evolved to shape language development by altering the binding of Forkhead domain transcription factors, including FOXP2. Strikingly, language-associated HAQER variants are more prevalent in Neanderthals than in modern humans, have been stable throughout recent human history, and show evidence of convergent evolution across other mammalian vocal learners. An unexpected pattern of balancing selection acting on these apparently beneficial alleles is explained by their pleiotropic effects on prenatal brain development, which contribute to birth complications and reflect an evolutionary trade-off between language capability and reproductive fitness. By developing the Evolution Stratified-Polygenic Score (ES-PGS) analysis, we show that language capabilities likely emerged before the human-Neanderthal split, far earlier than previously thought[3, 6, 7]. Our findings establish the first direct link between ancient genomic divergence and present-day variation in language abilities, while revealing how evolutionary constraints continue to shape human cognitive development.
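The abstract does not spell out how the ES-PGS is constructed. As a rough illustration only, the sketch below assumes that a standard additive polygenic score (a sum of per-variant effect sizes times allele dosages) is split into one component from variants inside an evolutionary annotation such as HAQERs and a background component from all other variants. The function name, its inputs, and the choice to treat a missing genotype as zero dosage are hypothetical simplifications, not the authors' pipeline.

```python
from typing import Dict, Set, Tuple


def stratified_pgs(
    dosages: Dict[str, float],
    weights: Dict[str, float],
    annotation: Set[str],
) -> Tuple[float, float]:
    """Split an additive polygenic score into annotation and background parts.

    dosages:    effect-allele count (0, 1 or 2) per variant ID for one person
    weights:    per-variant effect size for the effect allele (e.g. a GWAS beta)
    annotation: variant IDs inside the evolutionary annotation (e.g. HAQERs)
    """
    annot_score = 0.0
    background_score = 0.0
    for variant, beta in weights.items():
        # Hypothetical simplification: a variant with no genotype contributes zero.
        contribution = beta * dosages.get(variant, 0.0)
        if variant in annotation:
            annot_score += contribution
        else:
            background_score += contribution
    return annot_score, background_score


# Toy example with made-up variant IDs and weights.
weights = {"rs1": 0.2, "rs2": -0.1, "rs3": 0.05}
dosages = {"rs1": 2, "rs2": 1, "rs3": 0}
haqer_part, background_part = stratified_pgs(dosages, weights, {"rs1"})
```

Keeping the two parts separate is what allows a downstream model to estimate the annotation's effect while controlling for the rest of the genome.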


Figures

Figure 1: Overview of this study
Figure 2: Factor loadings and genetic associations
a Loadings of cognitive and language assessments onto the seven language factors. g0 = kindergarten (age 5–6), g2 = 2nd grade (age 7–8), g4 = 4th grade (age 9–10). b Pearson correlations between language factors (upper triangle) and distribution of each factor (diagonal). c Interpretations of the language factors based on their loadings. d Pearson correlations between each factor and genome-wide PGS. ** indicates FDR-adjusted p-value < 0.05; * indicates unadjusted p-value < 0.05.
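Figure 2d distinguishes FDR-adjusted from unadjusted p-values, but the caption does not name the FDR procedure. Assuming the common Benjamini-Hochberg method, a minimal standard-library sketch of the adjustment and of the caption's star convention could look like this; `benjamini_hochberg` and `significance_mark` are illustrative names, not the study's code.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significance_mark(p: float, q: float, alpha: float = 0.05) -> str:
    """'**' if the FDR-adjusted q < alpha, '*' if only the raw p < alpha."""
    if q < alpha:
        return "**"
    if p < alpha:
        return "*"
    return ""


raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
marks = [significance_mark(p, q) for p, q in zip(raw, adj)]
```

Here the 0.04 and 0.03 results stay nominally significant but lose their adjusted significance, which is exactly the distinction the single and double stars encode.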
Figure 3: HAQERs are associated with language ability and not nonverbal IQ
a Comparison of the effects of evolutionary events on core language ability in EpiSLI and SPARK. Points represent the β estimated by the ES-PGS models for each evolutionary annotation, and ranges represent the 95% confidence interval. Solid points indicate p-value < 0.05. b Scatterplot of core language scores (F1) against HAQER CP-PGS after adjusting for the background PGS in the EpiSLI sample. c Scatterplot of nonverbal IQ scores (F3) against HAQER CP-PGS after adjusting for the background PGS in the EpiSLI sample. d-f Distributions of rare reversion counts from the SPARK whole-genome sequencing data within 10 kb of the following regions: HAQERs (d), HARs (e), and random sequence (RAND, f). g Effects of rare reversions across HAQERs, HARs, and RAND on language-related phenotypes in SPARK autism cases. Line ranges represent 95% confidence intervals from logistic and linear regression models. A positive β indicates delayed developmental age or a higher likelihood of the diagnosis as reversions increase.
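Panels b and c plot a score "after adjusting for the background PGS". One common way to make that adjustment, assumed here rather than taken from the paper, is to residualize both the phenotype and the HAQER score on the background score. By the Frisch-Waugh-Lovell theorem, the least-squares slope between the two residual vectors equals the HAQER coefficient in a joint linear regression that includes both scores. The helper names and toy data are hypothetical.

```python
from statistics import fmean
from typing import List, Sequence


def residualize(y: Sequence[float], x: Sequence[float]) -> List[float]:
    """Residuals of y after ordinary least squares on x (with an intercept)."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return [yi - my - slope * (xi - mx) for xi, yi in zip(x, y)]


def adjusted_effect(
    phenotype: Sequence[float],
    annot_pgs: Sequence[float],
    background_pgs: Sequence[float],
) -> float:
    """Coefficient of annot_pgs in phenotype ~ annot_pgs + background_pgs.

    Frisch-Waugh-Lovell: regress both phenotype and annot_pgs on the background
    score, then take the OLS slope between the residual vectors (residuals have
    mean zero, so no intercept is needed at this step).
    """
    ry = residualize(phenotype, background_pgs)
    rx = residualize(annot_pgs, background_pgs)
    return sum(a * b for a, b in zip(rx, ry)) / sum(a * a for a in rx)


# Toy data built so that phenotype = 1 + 2 * annot + 3 * background exactly.
annot = [0.0, 1.0, 2.0, 3.0, 4.0]
background = [1.0, 0.0, 2.0, 1.0, 3.0]
phenotype = [1 + 2 * a + 3 * b for a, b in zip(annot, background)]
effect = adjusted_effect(phenotype, annot, background)  # close to 2.0
```

Because the background score's contribution is removed from both variables, the recovered slope reflects only the annotation-specific signal, which is the comparison panels b and c are drawing between language and nonverbal IQ.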
Figure 4: HAQERs show coupled evolutionary and functional effects on language-relevant transcription factor binding sites
a-c Relationship between selection for transcription factor motif integrity (x-axis) and motif association with language ability (y-axis) in (a) HAQERs, (b) HARs, and (c) random genomic regions. Each point represents one transcription factor motif. Error bars indicate ±1 standard error. The purple line (gray for non-significant fits) shows the York regression fit with its 95% confidence interval (shaded); the regression coefficient (β), chi-squared statistic (χ²), and p-values are shown. d Detailed view of motif effects in HAQERs colored by transcription factor family. Solid points indicate motifs with p < 0.05 for both positive selection and positive language association. Colored polygons show convex hulls for each transcription factor family. Several key Forkhead family members are labeled. Dashed lines at x = 0 and y = 0 define quadrants. e Enrichment analysis of transcription factor families for concordant positive selection and language effects, shown as log2 odds ratios. Error bars indicate 95% confidence intervals. Solid points indicate p < 0.05. f HAQER sequence similarity scores in vocal-learning (blue) versus non-vocal-learning (orange) mammals. Violin plots show score distributions, with individual species indicated by points. The phylogenetic logistic regression coefficient (β) and p-value are shown.
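Panel e reports enrichment as log2 odds ratios with 95% confidence intervals, but the caption does not say how the intervals were computed. The sketch below assumes a 2-by-2 table of motif counts (in the family or not, concordant or not) with a Wald interval on the log scale and a 0.5 correction when any cell is empty; an exact method such as Fisher's test would give somewhat different intervals. The function name and toy counts are illustrative.

```python
import math
from typing import Tuple


def log2_odds_ratio(
    a: float, b: float, c: float, d: float, z: float = 1.96
) -> Tuple[float, float, float]:
    """Log2 odds ratio with a Wald confidence interval for a 2-by-2 table.

    a: motifs in the family that are concordant (positive selection and language effect)
    b: motifs in the family that are not concordant
    c: motifs in other families that are concordant
    d: motifs in other families that are not concordant
    Returns (estimate, lower, upper), all on the log2 scale.
    """
    if 0 in (a, b, c, d):
        # Haldane-Anscombe correction: avoids division by zero and log(0).
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    ln_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of the natural-log OR
    to_log2 = 1 / math.log(2)
    return ln_or * to_log2, (ln_or - z * se) * to_log2, (ln_or + z * se) * to_log2


est, lo, hi = log2_odds_ratio(8, 2, 10, 40)  # OR = (8 * 40) / (2 * 10) = 16
```

The interval is built on the natural-log scale, where the Wald approximation is reasonable, and only then rescaled to log2 for plotting.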
Figure 5: Selective pressures acting on language alleles in HAQERs
a Polygenic selection of HAQER and background CP-PGS, assessed by correlating sample age with CP-PGS in ancient West Eurasians from the AADR. b-c Distribution of HAQER CP-PGS (b) or background CP-PGS (c) in modern Europeans (N = 503 individuals from the 1000 Genomes dataset), with black dotted lines indicating the PGS in the four Neanderthal and Denisovan genomes. d Comparison of F-statistics across HAQERs, HARs, and random sequences (RAND). F-statistics measure heterozygosity enrichment, with lower values indicating more heterozygosity than expected. “***” indicates statistical significance (p-value < 0.001) based on t-test comparisons between each pair of regions. e Site frequency spectrum (SFS) comparison between HAQERs, HARs, and random sequences (RAND). The x-axis represents minor allele frequency bins; the y-axis is the log2(ratio) comparing HAQERs to the other sequence types. Positive log2(ratios) indicate that HAQERs have proportionally more variants in that allele frequency bin than the other sequence type. f Correlation between core language ability (F1, x-axis) and F-statistics in HAQERs (y-axis).
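Panels d and e rest on two standard population-genetic summaries. Assuming the F-statistic here is the usual per-site inbreeding coefficient, F = 1 - H_obs/H_exp with H_exp = 2p(1 - p) under Hardy-Weinberg equilibrium, an excess of heterozygotes drives F below zero, so lower values indicate more heterozygosity than expected, as the caption states. The remaining functions compare binned minor-allele-frequency spectra as log2 ratios of per-bin proportions, in the spirit of panel e. Function names and bin edges are illustrative assumptions.

```python
import math
from typing import List, Sequence


def inbreeding_f(genotypes: Sequence[int]) -> float:
    """Per-site F = 1 - observed / expected heterozygosity.

    genotypes: alternate-allele counts (0, 1 or 2), one per diploid individual.
    Negative values mean more heterozygotes than Hardy-Weinberg predicts.
    """
    n = len(genotypes)
    p = sum(genotypes) / (2 * n)  # alternate-allele frequency
    h_exp = 2 * p * (1 - p)       # expected heterozygosity under Hardy-Weinberg
    if h_exp == 0:
        raise ValueError("F is undefined at a monomorphic site")
    h_obs = sum(1 for g in genotypes if g == 1) / n
    return 1 - h_obs / h_exp


def sfs_proportions(mafs: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Share of variants in each minor-allele-frequency bin; the last bin is closed."""
    counts = [0] * (len(edges) - 1)
    last = len(counts) - 1
    for maf in mafs:
        for i in range(len(counts)):
            upper_ok = maf < edges[i + 1] or (i == last and maf == edges[-1])
            if edges[i] <= maf and upper_ok:
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts]


def log2_sfs_ratio(props_a: Sequence[float], props_b: Sequence[float]) -> List[float]:
    """log2(share in A / share in B) per bin; positive means A is enriched there.

    A bin that is empty in either set would make this undefined.
    """
    return [math.log2(pa / pb) for pa, pb in zip(props_a, props_b)]


print(inbreeding_f([1, 1, 1, 1]), inbreeding_f([0, 1, 1, 2]))  # -1.0 0.0
```

A site where every individual is heterozygous gives F = -1, while genotypes in exact Hardy-Weinberg proportions give F = 0; balancing selection is expected to push averages toward the negative side.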

References

    1. Lai C. S. L., Fisher S. E., Hurst J. A., Vargha-Khadem F., and Monaco A. P., “A forkhead-domain gene is mutated in a severe speech and language disorder,” Nature, vol. 413, pp. 519–523, Oct. 2001.
    2. Vernes S. C., Newbury D. F., Abrahams B. S., Winchester L., Nicod J., Groszer M., Alarcón M., Oliver P. L., Davies K. E., Geschwind D. H., Monaco A. P., and Fisher S. E., “A functional genetic link between distinct developmental language disorders,” New England Journal of Medicine, vol. 359, pp. 2337–2345, Nov. 2008.
    3. Tajima Y., Vargas C. D. M., Ito K., Wang W., Luo J.-D., Xing J., Kuru N., Machado L. C., Siepel A., Carroll T. S., Jarvis E. D., and Darnell R. B., “A humanized NOVA1 splicing factor alters mouse vocal communications,” Nature Communications, vol. 16, Feb. 2025.
    4. Pollard K. S., Salama S. R., King B., Kern A. D., Dreszer T., Katzman S., Siepel A., Pedersen J. S., Bejerano G., Baertsch R., Rosenbloom K. R., Kent J., and Haussler D., “Forces shaping the fastest evolving regions in the human genome,” PLoS Genetics, vol. 2, p. e168, Oct. 2006.
    5. Mangan R. J., Alsina F. C., Mosti F., Sotelo-Fonseca J. E., Snellings D. A., Au E. H., Carvalho J., Sathyan L., Johnson G. D., Reddy T. E., Silver D. L., and Lowe C. B., “Adaptive sequence divergence forged new neurodevelopmental enhancers in humans,” Cell, vol. 185, pp. 4587–4603.e23, Nov. 2022.
