[Preprint]. 2025 Nov 6:2025.11.05.686799.
doi: 10.1101/2025.11.05.686799.

Revisiting the Evolution of Lactase Persistence: Insights from South Asian Genomes

Elise Kerdoncuff et al. bioRxiv.

Abstract

Lactase persistence (LP), the ability to digest lactose from milk into adulthood, is a classic example of natural selection in humans. Multiple mutations upstream of the LCT gene are associated with LP and have previously been shown to be under selection in Europeans and Africans. South Asia is the world's largest producer of dairy, and milk and dairy products are widely consumed throughout the subcontinent. However, the origin, evolutionary history and selective pressures associated with LP in South Asia remain elusive. We assembled genome-wide data from ~8,000 present-day and ancient genomes from India, Pakistan, and Bangladesh, spanning diverse timescales (~3300 BCE to 1650 CE), geographic regions, and ethnolinguistic and subsistence groups. We find that the Eurasian LP-associated variant, -13.910:C>T, is widespread across South Asia, exhibiting clinal variation along north-south and east-west gradients. Ancient DNA analysis reveals that this variant first appeared in South Asia during the historical and medieval periods through Steppe pastoralist-related gene flow. Interestingly, unlike in other worldwide populations, the LP prevalence in most contemporary South Asians is almost entirely explained by Steppe ancestry, not selection. The notable exceptions are the only two pastoralist groups, Toda in South India and Gujjar in Pakistan, which have unexpectedly high frequencies of -13.910*T, comparable to estimates in Northern Europeans. By performing local ancestry inference, we find significant enrichment for Steppe pastoralist ancestry around the LCT locus in these two geographically distant pastoralist groups, indicative of strong selection. Together, these findings highlight the complex role of ancestry and natural selection in shaping the prevalence of lactase persistence on the subcontinent.

Figures

Figure 1. Dataset presentation and evolutionary history of South Asia
A. Map of South Asia Data Sampling. Each geographic region of India, all of Pakistan, and all of Bangladesh are colored to reflect whole genome sequences that are sampled based on population density from LASI-DAD and medical cohorts from GAsP2 (the legend for each region is in the upper left). Whole genome sequences of endogamous groups from GAsP1 are labeled, plotted with different symbols to match the PCA in (B), and scaled according to sample size (n = 5–34). Pastoralist groups are shown as solid triangles, while non-pastoralists are shown with a different symbol per region as in (B). Ancient DNA samples from the Swat Valley in Pakistan are represented by vertical bars that vary in length according to the age of the samples (~1491 BCE - 1650 CE).
B. Principal Components Analysis of Present-Day and Ancient South Asians. Principal Components Analysis was done on unrelated present-day South Asian individuals (India, Pakistan, and Bangladesh) to show how South Asians relate to other global groups (East Asians and West Eurasians). All individuals are plotted in color according to their geographic region (teal for West Eurasians, pink for East Asians, and South Asians following the same color scheme as in (A)). Endogamous groups are plotted in the color of their geographic region and with the same symbol as on the map in (A), with pastoralists represented as solid triangles. Ancient South Asians are projected in black, with a different symbol for each time period.
C. Ancestry Modeling of Present-Day South Asians. For each region in South Asia (Pakistan, regions of India, and Bangladesh), the average proportions of Iranian farmer-related (orange), Steppe pastoralist-related (green), SAHG-related (navy), and East Asian-related (pink) ancestries are shown in a bar summing to 1 (100%). Along the y-axis, the groups are arranged geographically from north to south and west to east.
Figure 2. Allele frequency of −13.910:C>T across space and time in South Asia
A. Allele Frequencies across South Asia. The allele frequency (red scale) was calculated across Pakistan, India, and Bangladesh. The allele frequency is highest in Pakistan and North India, and decreases in South India, East India, and Bangladesh.
B. Allele Frequency in Endogamous Groups. The allele frequency was calculated for each endogamous group from GAsP1 across India and Pakistan. The allele frequency of endogamous groups varies with geography: groups in South India have low or zero allele frequencies, and groups in North India and Pakistan have higher allele frequencies. The pastoralist groups Toda (South India) and Gujjar (Pakistan) are notable outliers. Groups are plotted by color and symbol as in Figure 1A and Figure 1B, representing geographic regions.
C. Allele Frequency in Ancient South Asians. The allele frequency was calculated for ancient South Asians from Pakistan (~1491 BCE - 1650 CE), using individuals that had at least 3 reads at −13.910:C>T. Each ancient group is shown with the same symbol as in Figure 1B. We note that the allele is absent in samples from the Bronze and Iron Ages, but increases in frequency in the Historic and Medieval periods.
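As an illustration of the calculation behind these panels, the following is a minimal sketch of a per-group allele frequency with a minimum read-depth filter like the one mentioned for the ancient samples. The record layout (`reads`, `t_copies`), the function names, and the diploid genotype assumption are hypothetical and are not taken from the preprint, which may use a different estimator (for example, pseudo-haploid calls for ancient DNA).

```python
# Sketch only: the field names and the diploid assumption are illustrative,
# not the authors' pipeline.

def filter_by_depth(records, min_reads=3):
    """Keep individuals with at least `min_reads` reads covering the site."""
    return [r for r in records if r["reads"] >= min_reads]

def allele_frequency(records):
    """Frequency of the T allele, assuming diploid calls (0, 1, or 2 copies)."""
    if not records:
        return None  # no usable individuals in this group
    return sum(r["t_copies"] for r in records) / (2 * len(records))

# Toy data, invented for illustration.
example = [
    {"id": "s1", "reads": 5, "t_copies": 1},
    {"id": "s2", "reads": 2, "t_copies": 2},   # dropped: fewer than 3 reads
    {"id": "s3", "reads": 3, "t_copies": 0},
    {"id": "s4", "reads": 10, "t_copies": 2},
]
kept = filter_by_depth(example)
freq = allele_frequency(kept)
```

With the toy data, three individuals pass the filter and carry three T alleles among six chromosomes.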
Figure 3. The correlation between ancestry proportions and allele frequency of −13.910:C>T in South Asians
All South Asian individuals were grouped into bins according to their Steppe pastoralist-related, Iranian farmer-related, and SAHG-related ancestry proportions estimated from qpAdm, so that each bin contained the same number of individuals. The median ancestry proportion of each bin is reported on the x-axis. Each bin’s average allele frequency was calculated and plotted along the y-axis. The regression line shows the relationship between each bin’s median ancestry proportion and its mean allele frequency. The relationship between allele frequency and Steppe pastoralist-related ancestry is shown in light green, Iranian farmer-related ancestry in dark orange, and SAHG-related ancestry in navy blue. Endogamous groups are projected on each regression line according to their mean ancestry proportion and mean allele frequency. Groups are colored according to geographic region as in Figure 1A and B; the pastoralist groups Toda and Gujjar are shown as solid triangles, and the non-pastoralist groups as dots.
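The binning procedure described in this caption can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' code: each individual is a hypothetical `(ancestry_proportion, t_copies)` pair with a diploid genotype, bins are made equal in size by sorting and slicing, and the regression is an ordinary least-squares fit.

```python
from statistics import mean, median

# Sketch only: the data layout and function names are hypothetical.

def equal_count_bins(individuals, n_bins):
    """Sort by ancestry proportion and split into n_bins groups of (nearly)
    equal size. Assumes n_bins <= len(individuals)."""
    ordered = sorted(individuals, key=lambda ind: ind[0])
    size, extra = divmod(len(ordered), n_bins)
    bins, start = [], 0
    for i in range(n_bins):
        end = start + size + (1 if i < extra else 0)
        bins.append(ordered[start:end])
        start = end
    return bins

def summarize(bins):
    """Return (median ancestry, mean allele frequency) for each bin."""
    points = []
    for b in bins:
        med = median(a for a, _ in b)
        freq = sum(c for _, c in b) / (2 * len(b))  # diploid assumption
        points.append((med, freq))
    return points

def fit_line(points):
    """Ordinary least-squares slope and intercept through the bin summaries."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Toy data, invented for illustration: (ancestry proportion, copies of T).
individuals = [(0.0, 0), (0.1, 0), (0.2, 1), (0.3, 1), (0.4, 2), (0.5, 2)]
points = summarize(equal_count_bins(individuals, 3))
slope, intercept = fit_line(points)
```

Using the bin median on the x-axis makes each point robust to a few individuals with extreme ancestry estimates, while equal-count bins give every point a comparable sampling error.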
Figure 4. Evidence for Selection at the −13.910:C>T locus for Pastoralist groups
A. Local Steppe*T ancestry and genome-wide Steppe pastoralist–related ancestry across regions and endogamous groups. Each dot represents the mean ancestry proportion for a group, scaled by sample size; triangles indicate pastoralist groups. Error bars show one standard deviation. These values were used in the local ancestry deviation test; stars indicate significant deviations from the genome-wide Steppe ancestry.
B. IBD sharing rate Z-scores on chromosome 2 (main panel) and in the region surrounding the −13.910:C>T variant on chromosome 2 (inset), shown for GBR (blue), Gujjar (maroon), and Toda (green). The position of the −13.910:C>T variant is marked by a red vertical line (chr2:135,851,076; hg38).
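The idea behind a local ancestry deviation test can be illustrated with a simple z-score that compares the ancestry proportion at one locus with the distribution of proportions across the rest of the genome. The exact statistic used in the preprint is not described in this caption, so the window-based layout, the function name, and the z-score itself are assumptions made for this sketch.

```python
from statistics import mean, stdev

# Sketch only: a z-score is one common way to express such a deviation;
# it is not necessarily the test used by the authors.

def local_ancestry_z(genome_wide_windows, locus_proportion):
    """Standardized deviation of local ancestry at a locus from the
    genome-wide distribution of per-window ancestry proportions."""
    mu = mean(genome_wide_windows)
    sd = stdev(genome_wide_windows)  # sample standard deviation
    return (locus_proportion - mu) / sd

# Toy data, invented for illustration: Steppe-related ancestry per window
# in one group, and the proportion observed around the locus of interest.
windows = [0.1, 0.2, 0.3]
z = local_ancestry_z(windows, 0.5)
```

A large positive value means the locus carries more of the ancestry than expected from the genome-wide background, which is the pattern the caption reports for the two pastoralist groups.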

