Gigascience. 2025 Jan 6;14:giaf099.
doi: 10.1093/gigascience/giaf099.

A comprehensive water buffalo pangenome reveals extensive structural variation linked to population-specific signatures of selection

Fazeela Arshad et al. Gigascience. 2025.

Abstract

Background: Water buffalo is a cornerstone livestock species in many low- and middle-income countries, yet major gaps persist in its genomic characterization, complicated by the divergent karyotypes of its two subspecies (swamp and river). Such genomic complexity makes water buffalo a particularly good candidate for graph genomics, which can capture variation missed by linear reference approaches. However, the utility of this approach for improving water buffalo has been largely unexplored.

Results: We present a comprehensive pangenome that integrates 4 newly generated, highly contiguous assemblies of Pakistani river buffalo with 8 publicly available assemblies from both subspecies. This doubles the number of accessible high-quality river buffalo genomes and provides the most contiguous assemblies for the subspecies to date. Using the pangenome to assay variation across 711 global samples, we uncovered extensive genomic diversity, including thousands of large structural variants absent from the reference genome, spanning over 140 Mb of additional sequence. We demonstrate the utility of these data by identifying putative functional indels and structural variants linked to selective sweeps in key genes involved in productivity and immune response across 26 populations.

Conclusions: This study represents one of the first successful applications of graph genomics in water buffalo and offers valuable insights into how integrating assemblies can transform analyses of water buffalo and other species with complex evolutionary histories. We anticipate that these assemblies, as well as the pangenome and putative functional structural variants we have released, will accelerate efforts to unlock water buffalo's genetic potential, improving productivity and resilience in this economically important species.

Keywords: Azikheli; Nili-Ravi; Pakistani river buffalo; genome assembly; pangenome; structural variation; water buffalo.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Figure 1:
(A) Scatterplot of publicly available and newly generated genome assemblies, illustrating contig N50 on the x-axis and estimated genome size on the y-axis. Red bubbles represent river buffalo genomes, while blue bubbles denote swamp buffalo genome assemblies. The newly generated assemblies are highlighted as the top 4 red-labeled bubbles, surpassing the contiguity of the Mediterranean river genome (also labeled). (B) Phylogenetic tree showing the evolutionary relationships among the 12 water buffalo genome assemblies used in this study, including the newly generated haplotype-resolved assemblies at the top and the cattle reference assembly (at the bottom) added as an outgroup. The tree was constructed from a distance matrix of mash distances with 100 bootstrap replicates. Bootstrap values are displayed at the nodes. (C) Proportion of variants in each class in the pangenome graph. The graph includes insertions and deletions (indels) <50 bp and structural variants (SVs) ≥50 bp. (D) Upset plot of sets of SVs found across different assemblies. Each column represents a set of SVs with the points indicating in which assemblies the SVs were found. The bar graph along the top displays the number of SVs in the corresponding set. Only the 40 sets with the most SVs are shown.
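
Panel A uses contig N50 as its contiguity measure. As a minimal sketch of how that statistic is commonly defined (not the authors' pipeline; the function name and the toy contig lengths are illustrative assumptions), N50 is the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly length:

```python
def contig_n50(lengths):
    """Return the N50 of a list of contig lengths (in bp).

    Contigs are sorted from longest to shortest and accumulated until
    the running total reaches at least half of the assembly length;
    the length of the contig that crosses that threshold is the N50.
    """
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to stay in integer arithmetic.
        if 2 * running >= total:
            return length


# Hypothetical toy assembly, lengths in bp (not data from the study).
toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(contig_n50(toy_contigs))
```

A higher N50 means that half of the assembly sits in longer contigs, which is why the caption reads it as a measure of contiguity.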
Figure 2:
(A) Geographic distribution of global water buffalo populations used in the study. The size of each pie corresponds to the relative sample size, while red and blue colors represent river and swamp buffalo subspecies, respectively. (B) Admixture plot for different K values ranging from K = 2 to K = 6.
Figure 3:
(A) Agreement in genotype calls from GATK and PanGenie across 81 river buffalo samples. In each plot, each column corresponds to a sample, and the y-axis indicates the proportion of variants called by both variant callers (red) or only by PanGenie (blue) or GATK (green). Intensity of color indicates the variants’ sequence context (found in repetitive or nonrepetitive sequence contexts). Panels are further broken down by variant type (SNV or non-SNV) and the approximate coverage of the samples (10× or 30× sequencing coverage). (B) The allele frequencies among the samples of variants specifically called only by either GATK or PanGenie in the high-coverage (30×) samples. The results are broken down according to whether variants are found in repetitive regions.
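
The per-sample proportions plotted in panel A can be illustrated with a small set-based sketch. It assumes each call is reduced to a hashable key such as (chromosome, position, ref, alt); the study's actual comparison of genotypes may be stricter, and the function name and example calls below are hypothetical:

```python
def caller_agreement(calls_a, calls_b):
    """Proportion of variants called by both callers or by only one.

    Each argument is an iterable of hashable variant keys, for example
    (chrom, pos, ref, alt). Proportions are taken over the union of the
    two call sets, so the three values sum to 1.
    """
    a, b = set(calls_a), set(calls_b)
    union = a | b
    if not union:
        raise ValueError("both call sets are empty")
    n = len(union)
    return {
        "both": len(a & b) / n,
        "only_a": len(a - b) / n,
        "only_b": len(b - a) / n,
    }


# Hypothetical calls for one sample (not data from the study).
gatk_calls = {("chr1", 100, "A", "T"), ("chr1", 200, "G", "C"), ("chr2", 50, "T", "TA")}
pangenie_calls = {("chr1", 100, "A", "T"), ("chr1", 200, "G", "C"), ("chr3", 75, "C", "G")}
print(caller_agreement(gatk_calls, pangenie_calls))
```

Computing the same summary separately for SNVs and non-SNVs, and for repetitive and nonrepetitive contexts, would give the breakdown the caption describes.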
Figure 4:
Enrichment analysis of the genes under peaks identified in the iHS analysis. The x-axis of the first panel shows human GWAS traits enriched among the genes falling under iHS peaks (as identified by FUMA), and the x-axis of the second panel shows breeds in which the corresponding selective sweeps are observed. The y-axis lists the genes within the respective gene sets and peaks, with the boxes indicating their association with specific traits and selective sweeps in various breeds. Enrichment P values and adjusted P values (expressed as −log₁₀ values) are shown at the top left to indicate the enrichment of terms (only the 8 most significant terms are shown). The bar graphs on the left show the total number of breeds in which a putative selective sweep peak that intersected the corresponding gene was observed.
Figure 5:
(A) Enrichment of high-impact SVs and indels in selective sweep peaks. The y-axis shows the proportion of variants in each category (genome-wide or selective sweep peaks) that are in each impact class (HIGH, MODERATE, and LOW). Due to the disproportionate size of their bars, the MODIFIER class is not shown. Two-sided Fisher exact P values are shown above the bars of the difference between categories of the proportions of variants in the corresponding impact class. (B) Example colocalization of selective sweep peaks observed across populations and metrics at the FIG4 locus. Called peaks are indicated by red (iHS) or blue (nSL) points, with the respective buffalo population indicated above the plot.
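
Panel A reports two-sided Fisher exact P values. The sketch below shows one common way to compute such a P value for a 2×2 table using only the Python standard library: it sums the hypergeometric probabilities of all tables with the same margins that are no more likely than the observed one. This is an illustration of the test, not the authors' code, and the counts in the example are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]].

    For instance, a and b could count HIGH-impact and other variants
    inside sweep peaks, and c and d the same classes genome-wide.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance so ties survive floating-point rounding.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical counts (not data from the study).
print(fisher_exact_two_sided(3, 1, 1, 3))
```

In practice a library routine such as `scipy.stats.fisher_exact` would usually be preferred; the hand-rolled version is shown only to make the calculation explicit.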
