Nat Commun. 2025 Oct 6;16(1):8538.

doi: 10.1038/s41467-025-64046-1.

De novo annotation reveals transcriptomic complexity across the hexaploid wheat pan-genome

Benjamen White^#¹, Thomas Lux^#², Rachel Rusholme-Pilcher^#¹, Angéla Juhász^#³, Gemy Kaithakottil¹, Susan Duncan¹,⁴, James Simmonds⁴, Hannah Rees¹, Jonathan Wright¹, Joshua Colmer¹, Sabrina Ward¹, Ryan Joynson¹,⁵, Benedict Coombes¹, Naomi Irish¹, Suzanne Henderson¹, Tom Barker¹, Helen Chapman¹, Leah Catchpole¹, Karim Gharbi¹, Utpal Bose³,⁶, Moeko Okada⁷,⁸,⁹, Hirokazu Handa¹⁰, Shuhei Nasuda¹¹, Kentaro K Shimizu⁷,⁸, Heidrun Gundlach², Daniel Lang²,¹², Guy Naamati¹³, Erik J Legg¹⁴, Arvind K Bharti¹⁴, Michelle L Colgrave³,⁶, Wilfried Haerty¹, Cristobal Uauy⁴, David Swarbreck¹, Philippa Borrill⁴, Jesse A Poland¹⁵, Simon G Krattinger¹⁵, Nils Stein¹⁶,¹⁷, Klaus F X Mayer²,¹⁸, Curtis Pozniak¹⁹; 10+ Wheat Genome Project; Manuel Spannagl²⁰,²¹, Anthony Hall²²,²³

Collaborators

10+ Wheat Genome Project:
Sean Walkowiak, Valentyna Klymiuk, Brook Byrns, Kirby Nilsen, Jennifer Ens, Krystalee Wiebe, Amidou N'Diaye, Pierre J Hucl, Curtis J Pozniak, Bin Xiao Fu, Liangliang Gao, Emily Delorean, Dal-Hoe Koo, Allen K Fritz, Jesse Poland, Cecile Monat, Axel Himmelbach, Anne Fiebig, Sudharsan Padmarasu, Uwe Scholz, Martin Mascher, Georg Haberer, Mulualem T Kassa, Pierre Fobert, Sateesh Kagale, Jemima Brinton, Ricardo H Ramirez-Gonzalez, Michael Bevan, Neil McKenzie, Burkhard Steuernagel, Markus C Kolodziej, Simon G Krattinger, Beat Keller, Thomas Wicker, Dinushika Thambugala, Curt A McCartney, Venkat Bandi, Jorge Nunez Siri, Carl Gutwin, Catharine Aquino, Masaomi Hatakeyama, Dario Copetti, Gwyneth Halstead-Nussloch, Timothy Paape, Rie Shimizu-Inatsugi, Kentaro K Shimizu, Tomohiro Ban, Kanako Kawaura, Toshiaki Tameshige, Hiroyuki Tsuji, Luca Venturini, Matthew Clark, Bernardo Clavijo, Christine Fosker, Gonzalo Garcia Accinelli, Darren Heavens, Ksenia Krasileva, Keith A Gardner, Nick Fradgley, Lawrence Percival-Alwyn, James Cockram, Juan Gutierrez-Gonzalez, Gary Muehlbauer, Chu Shin Koh, Andrew G Sharpe, Jasline Deek, Alejandro C Costamagna, Hiroyuki Kanamori, Fuminori Kobayashi, Tsuyoshi Tanaka, Jianzhong Wu, Hirokazu Handa, Tony Kuo, Jun Sese, Kazuki Murata, Yusuke Nabeka, Shuhei Nasuda, Philomin Juliana, Ravi Singh, Hikmet Budak, Ian Small, Joanna Melonek, Sylvie Cloutier, Gabriel Keeble-Gagnère, Josquin Tibbets, Erik Legg, Arvind Bharti, Peter Langridge, Ken Chalmers, Assaf Distelfeld

Affiliations

¹ Earlham Institute, Norwich Research Park, Norwich, UK.
² PGSB Plant Genome and Systems Biology, Helmholtz Center Munich, German Research Center for Environmental Health, Neuherberg, Germany.
³ Australian Research Council Centre of Excellence for Innovations in Peptide and Protein Science, School of Science, Edith Cowan University, Joondalup, WA, Australia.
⁴ John Innes Centre, Norwich Research Park, Norwich, UK.
⁵ Limagrain Europe, Clermont-Ferrand, Auvergne-Rhône-Alpes, France.
⁶ CSIRO Agriculture and Food, St Lucia, QLD, Australia.
⁷ Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland.
⁸ Kihara Institute for Biological Research, Yokohama City University, Yokohama, Japan.
⁹ Graduate School of Science and Technology, Niigata University, Niigata, Japan.
¹⁰ Graduate School of Life and Environmental Sciences, Kyoto Prefectural University, Kyoto, Japan.
¹¹ Graduate School of Agriculture, Kyoto University, Kyoto, Japan.
¹² Bundeswehr Institute of Microbiology, Munich, Germany.
¹³ EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire, UK.
¹⁴ Syngenta Crop Protection, Research Triangle Park, Durham, NC, USA.
¹⁵ Plant Science Program, Biological and Environmental Science and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia.
¹⁶ Crop Plant Genetics, Institute of Agricultural and Nutritional Sciences, Martin Luther University of Halle-Wittenberg, Halle (Saale), Germany.
¹⁷ Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany.
¹⁸ School of Life Sciences, Technical University Munich, Freising, Germany.
¹⁹ Crop Development Centre, The University of Saskatchewan, Saskatoon, SK, Canada.
²⁰ PGSB Plant Genome and Systems Biology, Helmholtz Center Munich, German Research Center for Environmental Health, Neuherberg, Germany. manuel.spannagl@helmholtz-muenchen.de.
²¹ Centre for Crop & Food Innovation, Food Futures Institute, Murdoch University, Murdoch, WA, Australia. manuel.spannagl@helmholtz-muenchen.de.
²² Earlham Institute, Norwich Research Park, Norwich, UK. anthony.hall@earlham.ac.uk.
²³ School of Biological Sciences, University of East Anglia, Norwich, UK. anthony.hall@earlham.ac.uk.

^# Contributed equally.

PMID: 41053037
PMCID: PMC12501010
DOI: 10.1038/s41467-025-64046-1

Benjamen White et al. Nat Commun. 2025.

Abstract

Wheat is the most widely cultivated crop in the world, with over 215 million hectares grown annually. The 10+ Wheat Genomes Project recently sequenced and assembled to chromosome-level the genomes of nine wheat cultivars, uncovering genetic diversity and selection within the pan-genome of wheat. Here, we provide a wheat pan-transcriptome with de novo annotation and differential expression analysis for these wheat cultivars across multiple tissues. Using the de novo annotations we identify cultivar-specific genes and define the core and dispensable genomes. Expression analysis across cultivars and tissues reveals conservation in expression between a large core set of homeologous genes, in addition to widespread changes in subgenome homeolog expression bias between cultivars and cultivar-specific expression profiles. We utilise both the newly constructed gene-based wheat pan-genome and pan-transcriptome, demonstrating variation in the prolamin superfamily and immune-reactive proteins across cultivars.

Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

**Fig. 1. Study design, de novo gene annotations and orthologous framework.**
A Overview of transcriptome data generated for this study of wheat cultivars. 1 and 2: whole aerial organs sampled at dawn and dusk, 3: root, 4: complete spike at heading (GS59), 5: flag leaf 7 days post anthesis (GS71), 6: whole grains 15 days post anthesis (GS77). B De novo gene prediction results for each cultivar (left side, ‘genes’, separated for A, B and D subgenome) as well as summary of the BUSCO completeness assessment of gene models (right side, ‘BUSCO’). BUSCO genes found in two copies/duplicates are referred to as ‘exact_dupl’ and BUSCO genes found in more than three copies as ‘above_3’. C GENESPACE construction and visualisation of orthologous genes within the wheat cultivars, using de novo predicted gene models. Source data are provided as a Source Data file.

**Fig. 2. The wheat core-, shell- and cloud- genome and homoeologous expression patterns.**
A UpSet plot showing intersects of orthogroup conservation between cultivars and the relation to their breeding programmes and sowing season. Locations are at the country/state level as cultivars are representative of national breeding programmes. B A representation of CDC Stanley chromosome 3B showing the positions of Canadian-specific genes (top bar), heatmaps showing coverage scores between genes in CDC Stanley and CDC Landmark (middle bar) and coverage scores between CDC Stanley and Norin 61 (bottom bar). Coverage scores are calculated using kmers from each CDC Stanley gene to search the genome of the other cultivar and range from 0 to 1 with values closer to 1 indicating greater similarity. Regions of greater difference are represented in the heatmaps as darker bands. The plot shows a detailed view of the 0–50 Mb region of chromosome 3B (indicated by a red box). The mean of the coverage score between CDC Stanley genes in this region and genes in the non-Canadian lines is plotted. A cluster of four Canadian-specific genes (marked by a red dashed line) lies in a region which is noticeably different between CDC Stanley and the non-Canadian lines potentially representing an introgression. C Percentage of genes belonging to the core-, shell- and cloud- orthologous groups across cultivars. D Violin plots of core, shell and cloud log₂ average gene expression across all combined cultivars and tissues, for each subgenome. Internal box plots display the median (centre line), with boxes representing the 25th to 75th percentiles (interquartile range) and whiskers extending to 1.5× the interquartile range. Outliers are not displayed. Pairwise comparisons between categories (core vs shell vs cloud) were performed using two-sided Dunn’s test for multiple comparisons following a Kruskal–Wallis test. Bonferroni correction was applied to adjust p-values for multiple testing. Exact p-values are shown above each comparison. 
Higher mean expression was observed in core genes across all subgenomes. E Ternary plots of stable (left) and dynamic (right) 30-let expression (definition in main text), for cases where a homeolog is present on each subgenome, combining all tissues in all cultivars; stable 30-lets show more balanced expression overall and dynamic 30-lets show unbalanced expression. Source data are either provided in an online repository (10.5281/zenodo.16964999) or as a Source Data file (Fig. 2C).
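
The coverage score described for panel B can be illustrated with a minimal sketch. It assumes the score is the fraction of a gene's distinct k-mers that also occur in the other cultivar's genome; the caption gives only the 0 to 1 range, so this definition, the k-mer length, the toy sequences and the function names `kmers` and `coverage_score` are all assumptions for illustration, not the authors' implementation. Reverse-complement matching is omitted for brevity.

```python
from __future__ import annotations


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings of seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def coverage_score(gene_seq: str, target_kmers: set[str], k: int) -> float:
    """Fraction of the gene's distinct k-mers found in the target k-mer set.

    Assumed definition: 1.0 means every k-mer of the gene occurs in the
    target genome, 0.0 means none does (or the gene is shorter than k).
    """
    gene_kmers = kmers(gene_seq, k)
    if not gene_kmers:
        return 0.0
    return len(gene_kmers & target_kmers) / len(gene_kmers)


# Toy data (hypothetical sequences; a real analysis would use a much larger k).
k = 4
target = kmers("ACGTACGTTTGACCA", k)   # k-mers of the "other cultivar" genome
identical_gene = "ACGTACGT"            # fully contained in the target
diverged_gene = "ACGTGGGG"             # shares only its first k-mer

print(coverage_score(identical_gene, target, k))
print(coverage_score(diverged_gene, target, k))
```

Under this assumed definition, a run of consecutive genes with low scores against one cultivar but high scores against another is the kind of pattern the caption describes as a darker band in the heatmap.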

**Fig. 3. Components of the cultivar-specific networks with functional annotation and cultivar specific differences.**
A Hierarchical clustering of 68 module eigengenes from four cultivar networks identifying six metamodules (m1-m6). Each branch corresponds to a separate cultivar network module (ARI: ArinaLrFor, JAG: Jagger, JUL: Julius, NOR: Norin 61). B GO terms of biological processes associated with genes in conserved metamodule one. Only terms with p-adj < 0.05 and >10 significant genes are shown. Bubble colour indicates the −log2 p-value significance from Fisher’s exact test and size indicates the frequency of the GO term in the underlying EBI Gene Ontology Annotation database (larger bubbles indicate more general terms). C Network fragment from Julius module significantly enriched for cloud genes. Labelled nodes refer to cloud genes annotated as histones. The top five highly connected genes for each cloud gene are coloured according to core or shell genome membership. Node size is scaled to the log2 average expression +1 of each gene across tissues and edge width reflects the weight of the connection between nodes. D Expression of two divergent 30-let triads (L: HOG0029794, R: HOG0020263) with similarly divergent subgenome patterns of expression between Jagger and Julius (HOG0029794) and ArinaLrFor and Julius (HOG0020263). Annotated as F-box transcription factor and LRR protein, respectively. Tissues D: dawn, F: flag leaf, G: grain, R: root, S: spike, V: dusk. Source data are provided as a Source Data file.

**Fig. 4. Gene and expression variation in the prolamin family across the wheat pan-cultivars.**
A Prolamin superfamily gene expression across cultivars. B Coeliac disease epitope expression across cultivars. Epitope expression profiles were calculated, for each subgenome, as the sum of the expression profiles of genes carrying the highlighted HLA-DQ epitopes. C Relative proportion of cumulative expression profiles of transcription factor families showing strong co-expression patterns (Pearson correlation values > 0.8) with the epitope-coding prolamin genes. Results show significant differences in the expression of NAC, AP2/EREBP and MYB transcription factor genes, which are major regulators of storage protein gene expression. D Representation of the variation graph for the region of 6D containing the alpha-gliadin locus (Supplementary Fig. 11B). Horizontal coloured lines depict paths through the graph for each cultivar; Norin 61 (6D: 26,703,647-27,222,360 bp), CDC Stanley (6D: 28,164,601-28,660,350 bp) and Mace (6D: 26,808,846-27,298,593 bp), with SY Mattis (6D: 26,645,382-27,096,594 bp) and Julius (6D: 26,983,100-27,437,565 bp) sharing a single path. Rectangular blocks (a-p) represent individual genes at corresponding locations across cultivars (green: in common to all four cultivars, blue: occurring in one cultivar and purple: occurring in two cultivars). Gene d is present as a single copy in Norin 61, and duplicated in CDC Stanley, SY Mattis, Julius and Mace. This duplication is represented as a loop in the path through the graph for these cultivars (Supplementary Fig. 12). Source data are either provided as a Source Data file or in an online repository (Fig. 4D; 10.5281/zenodo.16964999).
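
The two calculations named in panels B and C can be sketched as follows: an epitope expression profile built as the element-wise sum of gene expression profiles, and a Pearson correlation filter at the 0.8 threshold given in the caption. This is a minimal illustration only; the gene and transcription factor names, the expression values and the helper functions `epitope_profile` and `pearson` are hypothetical and are not taken from the study's data or code.

```python
from math import sqrt


def epitope_profile(gene_profiles):
    """Element-wise sum of per-gene expression profiles (one value per sample)."""
    return [sum(values) for values in zip(*gene_profiles)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant profile has no defined correlation; treat as 0
    return cov / (sd_x * sd_y)


# Hypothetical expression of two epitope-carrying genes on one subgenome,
# measured in four samples.
epitope_genes = [
    [1.0, 2.0, 3.0, 4.0],
    [0.0, 1.0, 1.0, 2.0],
]
profile = epitope_profile(epitope_genes)  # [1.0, 3.0, 4.0, 6.0]

# Hypothetical transcription factor profiles across the same four samples.
tf_profiles = {
    "NAC_example": [2.0, 6.0, 8.0, 12.0],
    "MYB_example": [5.0, 1.0, 4.0, 2.0],
}

# Keep transcription factors whose correlation with the epitope profile
# exceeds the 0.8 threshold quoted in the caption.
co_expressed = [
    name for name, tf in tf_profiles.items() if pearson(tf, profile) > 0.8
]
print(co_expressed)
```

In this toy input the first transcription factor is exactly proportional to the summed epitope profile and passes the filter, while the second is negatively correlated and is dropped.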
