Genome Res. 2024 Mar 20;34(2):300-309.
doi: 10.1101/gr.278267.123.

Pangenome-genotyped structural variation improves molecular phenotype mapping in cattle

Alexander S Leonard et al. Genome Res.

Abstract

Expression and splicing quantitative trait loci (e/sQTL) are large contributors to phenotypic variability. Achieving sufficient statistical power for e/sQTL mapping requires large cohorts with both genotypes and molecular phenotypes, so genomic variation is often called from short-read alignments, which cannot comprehensively resolve structural variation. Here we build a pangenome from 16 HiFi haplotype-resolved cattle assemblies to identify small and structural variation and genotype them with PanGenie in 307 short-read samples. We find high (>90%) concordance of PanGenie-genotyped and DeepVariant-called small variation and confidently genotype close to 21 million small and 43,000 structural variants in the larger population. We validate 85% of these structural variants (with MAF > 0.1) directly with a subset of 25 short-read samples that also have medium coverage HiFi reads. We then conduct e/sQTL mapping with this comprehensive variant set in a subset of 117 cattle that have testis transcriptome data, and find 92 structural variants as causal candidates for eQTL and 73 for sQTL. We find that roughly half of the top associated structural variants affecting expression or splicing are transposable elements, such as SV-eQTL for STN1 and MYH7 and SV-sQTL for CEP89 and ASAH2. Extensive linkage disequilibrium between small and structural variation results in only 28 additional eQTL and 17 sQTL discovered when including SVs, although many top associated SVs are compelling candidates.

Figures

Figure 1.
Concordance of variants genotyped by PanGenie. (A) SV overlap between PanGenie and Sniffles for the eight individuals used to create the pangenome variant panel. (B) SV size distribution for the groups in A. The gray dashed lines indicate 15 kb, the average read length for the HiFi reads used by Sniffles. (C) Small variant overlap between PanGenie-genotyped variants and DeepVariant-called variants for the 307 short-read samples. (D) Precision and recall for the 307 samples from C. The gray lines are the F-score boundaries for the indicated values. (E) Fraction of all SVs tagged by small variants at different thresholds of $r^2$ within a linkage window of 1000 kb across the 307 samples. (F) Average and median number of variants that tag each SV across different $r^2$ thresholds.
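As a rough illustration of the precision, recall, and F-score quantities in panel D, the sketch below computes them from the overlap of two variant call sets. It is only a minimal example under stated assumptions: the variant keys are toy data, the function name `precision_recall_f1` is hypothetical, and treating the DeepVariant calls as the truth set is an assumption made here for illustration, not something the caption specifies.

```python
def precision_recall_f1(query: set, truth: set) -> tuple[float, float, float]:
    """Compare a query call set against a truth call set by exact key match."""
    true_positives = len(query & truth)
    precision = true_positives / len(query) if query else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy variants keyed as (chromosome, position, ref, alt); purely illustrative.
pangenie_calls = {
    ("1", 100, "A", "G"),
    ("1", 250, "C", "T"),
    ("2", 40, "G", "GA"),
    ("3", 7, "T", "C"),
}
# Assumption: DeepVariant calls play the role of the truth set here.
deepvariant_calls = {
    ("1", 100, "A", "G"),
    ("1", 250, "C", "T"),
    ("2", 40, "G", "GA"),
    ("2", 90, "A", "C"),
    ("5", 5, "A", "T"),
}

precision, recall, f1 = precision_recall_f1(pangenie_calls, deepvariant_calls)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

Exact key matching is a simplification; real comparisons of indels and SVs typically need normalization or tolerance in position and size.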
Figure 2.
Comparison of variant calling with a small long-read cohort. (A) SV intersection between PanGenie (called from eight individuals with haplotype-resolved assemblies) and Sniffles (called from 25 HiFi read samples). (B) SV saturation for 25 HiFi read samples. Markers indicate the mean value of unique SVs over 10 random shuffles of sample order, and error bars represent the standard deviation. The dotted line is a fitted curve of the form $f(x) = ax^b + c$, predicting saturation at approximately 175,000 SVs. (C) SV overlap for different allele frequency (based on the 25 samples) bins. (D) Small variant accuracy of HiFi-based and short-read-based calls, taking the short-read data as truth, stratified by autosomes and sex chromosomes for SNPs and indels. Large markers indicate the mean over the 25 samples. (E) Small variant intersections between HiFi-based and short-read-based calls in genomic regions identified as centromeric satellites, low mappability, tandem repeats, repetitive, and “normal” (all other regions). A large proportion of variants called in the challenging regions were unique to HiFi-based alignment and calling.
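The saturation analysis in panel B can be sketched as follows: samples are added in a random order, the cumulative number of distinct SVs is recorded after each addition, and the mean and standard deviation are taken over repeated shuffles. This is a minimal sketch, assuming each sample is represented by a set of SV identifiers; the function name `saturation_curve`, the seed, and the toy data are hypothetical and not taken from the paper.

```python
import random
from statistics import mean, stdev


def saturation_curve(sample_svs: dict, n_shuffles: int = 10, seed: int = 0) -> list:
    """Return (n_samples, mean_unique_svs, sd_unique_svs) per cumulative sample count."""
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    names = list(sample_svs)
    counts = [[] for _ in names]  # counts[k] holds totals after k + 1 samples
    for _ in range(n_shuffles):
        order = names[:]
        rng.shuffle(order)
        seen = set()
        for k, name in enumerate(order):
            seen |= sample_svs[name]  # accumulate distinct SVs
            counts[k].append(len(seen))
    return [(k + 1, mean(c), stdev(c)) for k, c in enumerate(counts)]


# Toy SV identifiers per sample; purely illustrative.
toy_samples = {
    "s1": {"sv1", "sv2", "sv3"},
    "s2": {"sv2", "sv3", "sv4"},
    "s3": {"sv1", "sv5"},
    "s4": {"sv3", "sv4", "sv5", "sv6"},
}

for n, avg, sd in saturation_curve(toy_samples):
    print(n, round(avg, 2), round(sd, 2))
```

The resulting means could then be passed to a nonlinear least-squares routine to fit $f(x) = ax^b + c$; that step is omitted here to keep the example dependency-free.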
Figure 3.
cis-QTL mapping. (A) Twenty-five independent eGene signals with red diamonds denoting SVs as uniquely top hits. Other SVs are shown as yellow diamonds, and small variations are shown as teal circles. (B) Fifty-eight independent eGene signals with SVs as top hits in LD, with small variants denoted as orange diamonds and with yellow diamonds and teal circles as described in A. (C) eGenes that are present in only the PanGenie+ data set or the short-read-only DeepVariant data set. The dashed line indicates equal significance thresholds between the two conditional analyses.
Figure 4.
Nominal eQTL association significance (left) and normalized TPM values for the expressed gene (right) for STN1 (A) and CEP15 (B). The red diamond represents the top-associated SV. Linkage disequilibrium (LD) between the SV and all other variants within the cis-window is indicated with the color gradient.
Figure 5.
cis-sQTL mapping. (A) Fifteen independent sGene cluster signals with SVs as the unique-top variant and (B) 58 SVs as top variants in LD with small variants, with the color and marker meanings as described in Figure 4. (C) Nominal association significance for CEP89, where the two red diamonds indicate the same variant affecting two separate junction splicings within the sQTL cluster. (D) Percentage spliced in (PSI) across the two significantly associated junctions (indicated by number from C) within the sQTL cluster.
