Nat Commun. 2025 Oct 6;16(1):8538.
doi: 10.1038/s41467-025-64046-1.

De novo annotation reveals transcriptomic complexity across the hexaploid wheat pan-genome

Benjamen White et al. Nat Commun.

Abstract

Wheat is the most widely cultivated crop in the world, with over 215 million hectares grown annually. The 10+ Wheat Genomes Project recently sequenced and assembled to chromosome-level the genomes of nine wheat cultivars, uncovering genetic diversity and selection within the pan-genome of wheat. Here, we provide a wheat pan-transcriptome with de novo annotation and differential expression analysis for these wheat cultivars across multiple tissues. Using the de novo annotations we identify cultivar-specific genes and define the core and dispensable genomes. Expression analysis across cultivars and tissues reveals conservation in expression between a large core set of homeologous genes, in addition to widespread changes in subgenome homeolog expression bias between cultivars and cultivar-specific expression profiles. We utilise both the newly constructed gene-based wheat pan-genome and pan-transcriptome, demonstrating variation in the prolamin superfamily and immune-reactive proteins across cultivars.
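The core and dispensable split mentioned in the abstract can be illustrated with a minimal sketch. The thresholds used here are assumptions for illustration (core: present in every cultivar; cloud: present in exactly one; shell: everything in between), and the function name, orthogroup IDs and cultivar labels are hypothetical; the paper's own definitions may differ.

```python
from typing import Dict, Set


def classify_orthogroups(presence: Dict[str, Set[str]], n_cultivars: int) -> Dict[str, str]:
    """Label each orthogroup as core, shell or cloud.

    presence maps an orthogroup ID to the set of cultivars in which at
    least one member gene was annotated. The cut-offs are assumptions:
    core = all cultivars, cloud = exactly one, shell = anything else.
    """
    labels = {}
    for orthogroup, cultivars in presence.items():
        n_present = len(cultivars)
        if n_present == 0:
            raise ValueError(f"{orthogroup} is not present in any cultivar")
        if n_present == n_cultivars:
            labels[orthogroup] = "core"
        elif n_present == 1:
            labels[orthogroup] = "cloud"
        else:
            labels[orthogroup] = "shell"
    return labels


# Hypothetical toy input with three cultivars.
example = {
    "OG_all": {"cv1", "cv2", "cv3"},
    "OG_some": {"cv1", "cv2"},
    "OG_one": {"cv3"},
}
labels = classify_orthogroups(example, n_cultivars=3)
```

In this toy input, `OG_all` is labelled core, `OG_some` shell and `OG_one` cloud; shell and cloud together would make up the dispensable genome under these assumed cut-offs.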

Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

Fig. 1. Study design, de novo gene annotations and orthologous framework.
A Overview of transcriptome data generated for this study of wheat cultivars. 1 and 2: whole aerial organs sampled at dawn and dusk, 3: root, 4: complete spike at heading (GS59), 5: flag leaf 7 days post anthesis (GS71), 6: whole grains 15 days post anthesis (GS77). B De novo gene prediction results for each cultivar (left side, ‘genes’, separated for A, B and D subgenome) as well as summary of the BUSCO completeness assessment of gene models (right side, ‘BUSCO’). BUSCO genes found in two copies/duplicates are referred to as ‘exact_dupl’ and BUSCO genes found in more than three copies as ‘above_3’. C GENESPACE construction and visualisation of orthologous genes within the wheat cultivars, using de novo predicted gene models. Source data are provided as a Source Data file.
Fig. 2. The wheat core-, shell- and cloud- genome and homoeologous expression patterns.
A UpSet plot showing intersects of orthogroup conservation between cultivars and the relation to their breeding programmes and sowing season. Locations are at the country/state level as cultivars are representative of national breeding programmes. B A representation of CDC Stanley chromosome 3B showing the positions of Canadian-specific genes (top bar), heatmaps showing coverage scores between genes in CDC Stanley and CDC Landmark (middle bar) and coverage scores between CDC Stanley and Norin 61 (bottom bar). Coverage scores are calculated using k-mers from each CDC Stanley gene to search the genome of the other cultivar and range from 0 to 1, with values closer to 1 indicating greater similarity. Regions of greater difference are represented in the heatmaps as darker bands. The plot shows a detailed view of the 0–50 Mb region of chromosome 3B (indicated by a red box). The mean of the coverage score between CDC Stanley genes in this region and genes in the non-Canadian lines is plotted. A cluster of four Canadian-specific genes (marked by a red dashed line) lies in a region which is noticeably different between CDC Stanley and the non-Canadian lines, potentially representing an introgression. C Percentage of genes belonging to the core-, shell- and cloud- orthologous groups across cultivars. D Violin plots of core, shell and cloud log2 average gene expression across all combined cultivars and tissues, for each subgenome. Internal box plots display the median (centre line), with boxes representing the 25th to 75th percentiles (interquartile range) and whiskers extending to 1.5× the interquartile range. Outliers are not displayed. Pairwise comparisons between categories (core vs shell vs cloud) were performed using two-sided Dunn’s test for multiple comparisons following a Kruskal–Wallis test. Bonferroni correction was applied to adjust p-values for multiple testing. Exact p-values are shown above each comparison. Higher mean expression was observed in core genes across all subgenomes. E Ternary plots of stable (left) and dynamic (right) 30-let (definition in main text) expression, where a homeolog is present on each subgenome, combined across all tissues and cultivars, showing more balanced expression overall in stable 30-lets and unbalanced expression in dynamic 30-lets. Source data are either provided in an online repository (10.5281/zenodo.16964999) or as a Source Data file (Fig. 2C).
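The coverage score described for panel B (k-mers from a gene searched against another cultivar's genome, giving a value between 0 and 1) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the k-mer length, exact matching, searching both strands, and the function names are all hypothetical choices, and building an in-memory k-mer set would not scale to a full wheat genome.

```python
def kmers(seq: str, k: int) -> set:
    """Return the set of distinct k-mers in seq (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA sequence made of A, C, G and T."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def coverage_score(gene: str, genome: str, k: int = 5) -> float:
    """Fraction of the gene's distinct k-mers found in the genome (either strand).

    Assumption: exact k-mer matches only; k=5 is a toy value chosen so the
    example below works on short strings.
    """
    gene_kmers = kmers(gene, k)
    if not gene_kmers:
        return 0.0
    genome_kmers = kmers(genome, k) | kmers(reverse_complement(genome), k)
    found = sum(1 for kmer in gene_kmers if kmer in genome_kmers)
    return found / len(gene_kmers)


# Toy sequences: the first gene is fully contained in the genome, the second is absent.
score_present = coverage_score("ACGTACGTAC", "TTACGTACGTACTT", k=5)
score_absent = coverage_score("AAAAAAA", "CCCCCCCC", k=5)
```

A score near 1 means almost every k-mer of the gene is found in the other genome, while a score near 0 means the gene is largely absent or highly diverged, which corresponds to the darker bands in the heatmaps.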
Fig. 3. Components of the cultivar-specific networks with functional annotation and cultivar-specific differences.
A Hierarchical clustering of 68 module eigengenes from four cultivar networks identifying six metamodules (m1-m6). Each branch corresponds to a separate cultivar network module (ARI: ArinaLrFor, JAG: Jagger, JUL: Julius, NOR: Norin 61). B GO terms of biological processes associated with genes in conserved metamodule one. Only terms with p-adj < 0.05 and >10 significant genes are shown. Bubble colour indicates the −log2 p-value significance from Fisher’s exact test and size indicates the frequency of the GO term in the underlying EBI Gene Ontology Annotation database (larger bubbles indicate more general terms). C Network fragment from Julius module significantly enriched for cloud genes. Labelled nodes refer to cloud genes annotated as histones. The top five highly connected genes for each cloud gene are coloured according to core or shell genome membership. Node size is scaled to the log2 average expression +1 of each gene across tissues and edge width reflects the weight of the connection between nodes. D Expression of two divergent 30-let triads (L: HOG0029794, R: HOG0020263) with similarly divergent subgenome patterns of expression between Jagger and Julius (HOG0029794) and ArinaLrFor and Julius (HOG0020263). Annotated as F-box transcription factor and LRR protein, respectively. Tissues D: dawn, F: flag leaf, G: grain, R: root, S: spike, V: dusk. Source data are provided as a Source Data file.
Fig. 4. Gene and expression variation in the prolamin family across the wheat pan-cultivars.
A Prolamin superfamily gene expression across cultivars. B Coeliac disease epitope expression across cultivars. Epitope expression profiles were calculated as the sum of gene expression profiles with the highlighted HLA-DQ epitopes for each subgenome. C Relative proportion of cumulative expression profiles of transcription factor families showing strong co-expression patterns (Pearson correlation values > 0.8) with the epitope-coding prolamin genes. Results show significant differences in the expression of NAC, AP2/EREBP and MYB transcription factor genes, which are major regulators of storage protein gene expression. D Representation of the variation graph for the region of 6D containing the alpha-gliadin locus (Supplementary Fig. 11B). Horizontal coloured lines depict paths through the graph for each cultivar; Norin 61 (6D: 26,703,647-27,222,360 bp), CDC Stanley (6D: 28,164,601-28,660,350 bp) and Mace (6D: 26,808,846-27,298,593 bp), with SY Mattis (6D: 26,645,382-27,096,594 bp) and Julius (6D: 26,983,100-27,437,565 bp) sharing a single path. Rectangular blocks (a-p) represent individual genes at corresponding locations across cultivars (green: in common to all four cultivars, blue: occurring in one cultivar and purple: occurring in two cultivars). Gene d is present as a single copy in Norin 61, and duplicated in CDC Stanley, SY Mattis, Julius and Mace. This duplication is represented as a loop in the path through the graph for these cultivars (Supplementary Fig. 12). Source data are either provided as a Source Data file or in an online repository (Fig. 4D; 10.5281/zenodo.16964999).
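The co-expression filter mentioned for panel C (Pearson correlation above 0.8 with the epitope-coding prolamin genes) can be sketched as below. This is a minimal illustration, assuming each gene has one expression value per sample; the function names, gene labels and toy values are hypothetical and are not data from the study.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def coexpressed(target: Sequence[float],
                candidates: Dict[str, Sequence[float]],
                threshold: float = 0.8) -> List[str]:
    """Names of candidate genes whose correlation with target exceeds threshold."""
    return [name for name, profile in candidates.items()
            if pearson(target, profile) > threshold]


# Hypothetical expression profiles across four samples.
prolamin_profile = [1.0, 2.0, 3.0, 4.0]
tf_profiles = {
    "TF_up": [2.0, 4.0, 6.0, 8.0],    # rises with the target
    "TF_down": [4.0, 3.0, 2.0, 1.0],  # falls as the target rises
    "TF_flat": [1.0, 1.0, 1.0, 1.0],  # constant expression
}
hits = coexpressed(prolamin_profile, tf_profiles)
```

Only `TF_up` passes the 0.8 threshold here. Grouping such hits by transcription factor family and summing their expression would give cumulative profiles of the kind summarised in panel C.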
