Nat Genet. 2025 May;57(5):1287-1297.
doi: 10.1038/s41588-025-02183-5. Epub 2025 Apr 28.

Oryza genome evolution through a tetraploid lens


Alice Fornasiero et al. Nat Genet. 2025 May.

Abstract

Oryza is a remarkable genus comprising 27 species and 11 genome types, with ~3.4-fold variation in genome size, and it possesses a virtually untapped reservoir of genes that can be used for crop improvement and neodomestication. Here we present 11 chromosome-level assemblies (nine tetraploid, two diploid) in the context of ~15 million years of evolution and show that the core Oryza (sub)genome is only ~200 Mb and largely syntenic, whereas the remaining nuclear fractions (~80–600 Mb) are intermingled, plastic and rapidly evolving. For the halophyte Oryza coarctata, we found that despite detectable gene fractionation in the subgenomes, homoeologous genes were expressed at higher levels from one subgenome or the other in a mosaic pattern, demonstrating subgenome equivalence. The integration of these 11 new reference genomes with previously published genome datasets provides a nearly complete view of how evolution has shaped genome diversification across the genus.


Conflict of interest statement

Competing interests: V.L. and P.P. are affiliated with Corteva Agriscience. The other authors declare no competing interests.

Figures

Fig. 1. Overview of the syntenic landscape and large-scale structural rearrangements of 12 Oryza species (21 (sub)genomes) with the outgroup L. perrieri.
Riparian plot showing macro-syntenic regions and large-scale structural rearrangements (large duplications and translocations) across the chromosomes of 12 Oryza species (21 (sub)genomes) and outgroup species L. perrieri. Genome types are shown according to the phylogenetic order in the genus, from the top (O. sativa (AA)) to the bottom (O. meyeriana (GG)). Each chromosome is colored as follows: Chr1, orange; Chr2, beige; Chr3, celeste; Chr4, steel blue; Chr5, navy blue; Chr6, deep purple; Chr7, plum; Chr8, magenta; Chr9, raspberry; Chr10, ruby; Chr11, coral; Chr12, salmon. Chromosomes are scaled by assembly length.
Fig. 2. (Sub)genome size and TE content of the wild Oryza species.
a, Correlation between (sub)genome size (Mb) and TE content (Mb) in the nine tetraploid and two diploid wild Oryza species. The significance of the linear correlation (Pearson’s correlation, reported as R²) was assessed by a two-sided t-test. b, Abundance of the main classes of TEs (Mb). DNA transposons are shown as follows: hAT (DTA), pink; CACTA (DTC), red; Harbinger (DTH), orange; Mutator (DTM), yellow; Mariner (DTT), ochre. LTR retrotransposons are shown as follows: LTR Copia, dark blue; LTR Gypsy, light blue; LTR unknown, steel blue. Unspecified TEs are shown in white; non-TE content is shown in gray. (Sub)genomes of the species are ordered by genome type (BB, CC, DD, EE, KK, LL, HH, JJ, GG).
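The correlation test in panel a can be illustrated with a short sketch. It computes Pearson's r, its square R² and the statistic t = r·√(n − 2)/√(1 − r²), which is compared against a t distribution with n − 2 degrees of freedom for a two-sided test. The function name `pearson_r2_t` and the toy numbers are hypothetical and are not the study's data or code; in practice a two-sided P value could be obtained directly with `scipy.stats.pearsonr`.

```python
import math


def pearson_r2_t(x, y):
    """Return Pearson's r, R^2, the t statistic for H0: rho = 0, and its degrees of freedom."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length samples with at least 3 points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Centered cross-products and sums of squares
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    r2 = r * r
    df = n - 2
    # t = r * sqrt(n - 2) / sqrt(1 - r^2); infinite for a perfect correlation
    if r2 >= 1.0:
        t = math.copysign(math.inf, r)
    else:
        t = r * math.sqrt(df) / math.sqrt(1.0 - r2)
    return r, r2, t, df


# Hypothetical toy values standing in for (sub)genome size and TE content (Mb)
genome_size = [1, 2, 3, 4, 5]
te_content = [2, 4, 5, 4, 5]
r, r2, t, df = pearson_r2_t(genome_size, te_content)
print(f"r = {r:.3f}, R^2 = {r2:.3f}, t = {t:.3f}, df = {df}")
# prints: r = 0.775, R^2 = 0.600, t = 2.121, df = 3
```

With only a handful of points, even a fairly strong correlation (R² = 0.6 here) gives a modest t on few degrees of freedom, which is why the test accounts for n.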
Fig. 3. The syntenic pangenome.
a, Phylogenomic profiling of clusters of syntenic genes across 30 Oryza (sub)genomes (the nine tetraploid and two diploid species presented here and ten additional diploid species listed in Supplementary Table 8) and outgroup species L. perrieri. In the heat map, each row represents a (sub)genome, and each column shows a syntenic cluster (that is, a grouping of syntenic homologous genes across two or more Oryza species and/or between Oryza and L. perrieri). A cluster contains at least two homologous genes from two different species or genera. Clustering of the Oryza (sub)genomes was based on presence and absence patterns of syntenic clusters using Euclidean distance and is shown as a dendrogram at the top of the figure. For each syntenic cluster, gene copy-number variation is represented as follows: gene absence, light gray; one gene copy, blue; two gene copies, yellow; three or more gene copies, red. On the left side, vertical bars represent the genome type of the (sub)genomes: AA, dark blue; BB, red; CC, dark green; DD, purple; EE, black; KK, sky blue; LL, orange; HH, light green; JJ, lilac; FF, light gray; GG, light blue; O (outgroup), peach. b, Histogram showing the distribution of syntenic clusters across the 30 (sub)genomes shown in a, by the number of Oryza (sub)genomes sharing each cluster (x axis). The legend shows the percentage of core (found in all 30 (sub)genomes), softcore (found in 27–29 (sub)genomes), dispensable (found in 2–26 (sub)genomes) and private syntenic clusters. c, Percentages of genes classified in the different syntenic cluster categories (core, blue; softcore, green; dispensable, yellow) in the 30 Oryza (sub)genomes. Percentages of genes classified in private syntenic clusters are not shown.
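The category boundaries in panel b amount to a simple counting rule. The sketch below assigns each syntenic cluster to core, softcore, dispensable or private from the number of Oryza (sub)genomes it contains, then reports the percentage of clusters per category. Treating "private" as a cluster present in a single Oryza (sub)genome, and excluding L. perrieri from the count, are assumptions, since the caption defines neither; the input format, cluster IDs and function names are hypothetical.

```python
from collections import Counter

N_SUBGENOMES = 30  # Oryza (sub)genomes in the Fig. 3 analysis
CATEGORIES = ("core", "softcore", "dispensable", "private")


def cluster_category(n_present: int) -> str:
    """Map the number of Oryza (sub)genomes in a syntenic cluster to a category."""
    if n_present == N_SUBGENOMES:
        return "core"
    if 27 <= n_present <= 29:
        return "softcore"
    if 2 <= n_present <= 26:
        return "dispensable"
    if n_present == 1:
        # Assumption: "private" = present in a single Oryza (sub)genome
        return "private"
    raise ValueError(f"invalid (sub)genome count: {n_present}")


def category_percentages(clusters: dict[str, set[str]]) -> dict[str, float]:
    """clusters maps a hypothetical cluster ID to the set of Oryza (sub)genomes
    containing it; outgroup genes are assumed to be excluded from the sets."""
    counts = Counter(cluster_category(len(members)) for members in clusters.values())
    total = sum(counts.values())
    return {cat: 100.0 * counts[cat] / total for cat in CATEGORIES}


subgenomes = [f"sg{i}" for i in range(N_SUBGENOMES)]
example = {
    "c1": set(subgenomes),        # core
    "c2": set(subgenomes[:28]),   # softcore
    "c3": {"sg0", "sg5"},         # dispensable
    "c4": {"sg7"},                # private
}
print(category_percentages(example))
# prints: {'core': 25.0, 'softcore': 25.0, 'dispensable': 25.0, 'private': 25.0}
```

Counting genes per category (panel c) would follow the same rule, weighting each cluster by the number of genes it contributes to a given (sub)genome.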
Fig. 4. Phylogenetic analysis of the Oryza (sub)genomes.
a, Phylogenetic tree based on chloroplast genome sequences. IQ-TREE was used to reconstruct a maximum likelihood phylogeny from the large single-copy regions of the chloroplast genomes of 26 Oryza species (the chloroplast genomes of the ten species presented here and 16 additional chloroplast genomes of diploid species; Supplementary Table 11). Values next to each branch are SH-aLRT (Shimodaira–Hasegawa-like approximate likelihood ratio test) support (%)/ultrafast bootstrap support (%); values of 100%/100% are not shown. Branch lengths indicate substitutions per site. Trees were rooted using L. japonica as the outgroup. b, Time-dated phylogenetic tree based on nuclear gene sequences. The phylogeny was inferred using the maximum likelihood method with a concatenated alignment of 528 single-copy genes. Divergence times were estimated using molecular calibrations for the crown age of Oryza (14.5 Ma) and the divergence of CC and AA-BB (6 Ma). c, Distribution of Ks values (synonymous substitutions per synonymous site) for the HH, JJ, KK and LL genome types (O. ridleyi JJ versus O. longiglumis JJ, closed purple circle; O. ridleyi HH versus O. longiglumis HH, closed green square; O. schlechteri LL[HH] versus O. coarctata LL, closed orange triangle; O. schlechteri KK versus O. coarctata KK, open blue diamond; O. schlechteri LL[HH] versus O. longiglumis HH, closed green diamond; O. schlechteri LL[HH] versus O. ridleyi HH, open green square). The genome types used in the phylogenetic trees follow definitions based on cytogenetic and hybridization experiments, together with the molecular evidence provided here for renaming O. schlechteri from the HHKK to the KKLL genome type.
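The concatenated alignment behind the nuclear tree in panel b is a supermatrix: per-gene alignments joined end to end, with one row per (sub)genome. A minimal sketch, assuming each gene alignment is a dict mapping a taxon label to its aligned sequence and that a taxon missing from a gene is padded with gap characters (a common convention, not one stated here). The function also returns 1-based gene boundaries, which could define partitions for a partitioned substitution model; the function name and taxon labels are hypothetical.

```python
def concatenate(alignments: list[dict[str, str]], taxa: list[str]):
    """Join per-gene alignments into one supermatrix.

    Returns (supermatrix, partitions): supermatrix maps taxon -> concatenated
    sequence; partitions lists 1-based inclusive (start, end) columns per gene.
    """
    rows: dict[str, list[str]] = {t: [] for t in taxa}
    partitions: list[tuple[int, int]] = []
    start = 1
    for aln in alignments:
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError("each gene alignment must be non-empty with equal-length rows")
        width = widths.pop()
        for t in taxa:
            # Assumption: a taxon absent from this gene is filled with gaps
            rows[t].append(aln.get(t, "-" * width))
        partitions.append((start, start + width - 1))
        start += width
    return {t: "".join(parts) for t, parts in rows.items()}, partitions


# Hypothetical two-gene example
gene1 = {"O_sativa": "ACGT", "O_coarctata_KK": "ACGA"}
gene2 = {"O_sativa": "GG", "O_coarctata_LL": "GC"}
matrix, parts = concatenate([gene1, gene2], ["O_sativa", "O_coarctata_KK", "O_coarctata_LL"])
for taxon, seq in matrix.items():
    print(taxon, seq)
print(parts)
# prints:
# O_sativa ACGTGG
# O_coarctata_KK ACGA--
# O_coarctata_LL ----GC
# [(1, 4), (5, 6)]
```

Restricting the input to single-copy genes, as in panel b, avoids having to choose among paralogs when filling each row.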
Fig. 5. Consensus tree summarizing origins and evolutionary history of diploid and tetraploid Oryza species.
Single-line branches denote diploid species, whereas double-line branches denote tetraploid species. Single dashed lines represent unknown diploid wild relatives. A forward slash (/) indicates that the species names O. meyeriana and O. granulata are considered synonyms (https://powo.science.kew.org). One asterisk (*) denotes that the maternal donor is a BB genome species; two asterisks (**) denote that the maternal donor is a CC genome species. Genome types and known representative species are shown next to the terminal nodes. The relative timing of hybridization events is based on the current study. The tree includes the new designation of O. schlechteri as a KKLL genome type (the same as O. coarctata) proposed in this work. L. perrieri and L. japonica (here collectively referred to as Leersia) are the outgroups.
Fig. 6. Homoeologous gene retention in Oryza and subgenome equivalence in O. coarctata.
a, Distribution of gene retention (percentage, y axis) in the subgenomes of the tetraploid species (x axis). Each genome type is colored as in Figs. 3a and 4b. The red dashed line indicates the average percentage of gene retention calculated genome-wide for each species. P values from two-sided Wilcoxon rank-sum tests and numbers of sliding windows (n) are shown. b, Transcript abundance of homoeologous genes in O. coarctata and their homologs in O. sativa. Gene expression, as log2(TPM + 1), was measured in the leaf and in the root, with replicates considered together. P values from two-sided Wilcoxon rank-sum tests are shown. In a and b, middle lines mark the 50th percentile; lower and upper hinges correspond to the 25th and 75th percentiles. The upper whisker extends from the hinge to the largest value no more than 1.5 times the interquartile range from the hinge; the lower whisker extends from the hinge to the smallest value no more than 1.5 times the interquartile range from the hinge. Data beyond the ends of the whiskers were considered outliers and plotted as individual points. c, Homoeologous gene pair expression bias (B) in the leaf (left) and the root (right) of O. coarctata. Blue and orange bars represent homoeolog expression biased toward the KK (B < −1) and LL (B > 1) subgenomes, respectively. Homoeolog pairs with −1 ≤ B ≤ 1 (gray bars) are defined as nondominantly expressed. N represents the number of homoeologous gene pairs in each of the three categories (NKK, homoeologous gene dominantly expressed from the KK subgenome; NLL, homoeologous gene dominantly expressed from the LL subgenome; Nnonbiased, homoeologous gene not dominantly expressed). BKK, BLL and Bnonbiased represent the average expression bias of the homoeologous pairs in the respective categories. nf, nonfractionated (homoeologous gene pairs); f, fractionated genes; Os-ph, O. sativa genes homologous to nonfractionated O. coarctata genes (paired homologous); Os-sch, O. sativa genes homologous to fractionated O. coarctata genes (single-copy homologous).
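Panel c classifies homoeolog pairs by thresholds on B but does not define B itself. The sketch below assumes B = log2(TPM_LL + 1) − log2(TPM_KK + 1), which matches the caption's sign convention (negative for KK-biased pairs, positive for LL-biased pairs); treat that definition, the input format and the function names as hypothetical. It returns N and mean B for each category, mirroring the N and B values reported per tissue.

```python
import math


def expression_bias(tpm_kk: float, tpm_ll: float) -> float:
    # ASSUMPTION: B = log2(TPM_LL + 1) - log2(TPM_KK + 1); the caption gives only thresholds
    return math.log2(tpm_ll + 1) - math.log2(tpm_kk + 1)


def bias_category(b: float) -> str:
    if b < -1:
        return "KK"          # biased toward the KK subgenome
    if b > 1:
        return "LL"          # biased toward the LL subgenome
    return "nonbiased"       # -1 <= B <= 1


def summarize_bias(pairs):
    """pairs: iterable of (tpm_kk, tpm_ll) for homoeolog pairs in one tissue.
    Returns {category: (N, mean B)}; mean B is NaN for an empty category."""
    groups = {"KK": [], "LL": [], "nonbiased": []}
    for tpm_kk, tpm_ll in pairs:
        b = expression_bias(tpm_kk, tpm_ll)
        groups[bias_category(b)].append(b)
    return {
        cat: (len(bs), sum(bs) / len(bs) if bs else math.nan)
        for cat, bs in groups.items()
    }


# Hypothetical TPM values for four homoeolog pairs in one tissue
leaf = [(3.0, 15.0), (15.0, 3.0), (7.0, 7.0), (1.0, 1.0)]
print(summarize_bias(leaf))
# prints: {'KK': (1, -2.0), 'LL': (1, 2.0), 'nonbiased': (2, 0.0)}
```

Under this assumed definition, |B| > 1 means one homoeolog's (TPM + 1) is more than twice the other's; roughly equal N and mirror-image mean B for the KK and LL categories is the kind of pattern that would read as subgenome equivalence.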
Extended Data Fig. 1. Genome type-level pangenomes for the AA, BB, CC, and DD types.
a, Growth histograms of the AA, BB, CC and DD pangenomes as a function of the cumulative number of (sub)genomes (x axis). Sequences shared by all (sub)genomes represent the core pangenome (green); partially shared sequences represent the dispensable pangenome (ochre). b, Pangenome visualization of the translocation on chromosome 1C in O. alta and O. grandiglumis. The red arrow points to the chromosomal region corresponding to an unbalanced translocation of a portion of Chr3 onto Chr1 (see also c). This chromosomal rearrangement is present in O. alta and O. grandiglumis (red box) and absent from the other CC types used to build the CC pangenome (that is, O. officinalis, O. minuta, O. malampuzhaensis and O. latifolia). c, Riparian plot showing the synteny between chromosomal regions of O. sativa (AA), O. alta (CC), O. grandiglumis (CC), O. latifolia (CC) and L. perrieri. The red box highlights the duplication and translocation of a portion of Chr3 onto Chr1 in O. alta and O. grandiglumis.
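The growth histograms in panel a track how core and dispensable sequence change as (sub)genomes are added. A minimal sketch, assuming each (sub)genome is represented as a set of sequence identifiers (for example, hypothetical pangenome-graph node IDs) and that (sub)genomes are added in a fixed order; published growth curves are often averaged over many random orderings, which this sketch does not do. The function name is hypothetical.

```python
def pangenome_growth(genomes: list[set[str]]) -> list[tuple[int, int, int]]:
    """Return (k, core size, dispensable size) after adding the first k (sub)genomes.

    Core = identifiers present in all k (sub)genomes; dispensable = identifiers
    present in at least one but not all of them.
    """
    growth = []
    union = set()
    core = None
    for k, genome in enumerate(genomes, start=1):
        union |= genome
        # The core can only shrink as (sub)genomes are added
        core = set(genome) if core is None else core & genome
        growth.append((k, len(core), len(union) - len(core)))
    return growth


# Hypothetical sequence IDs for three (sub)genomes
genomes = [{"s1", "s2", "s3"}, {"s2", "s3", "s4"}, {"s3", "s5"}]
for k, core_size, dispensable_size in pangenome_growth(genomes):
    print(k, core_size, dispensable_size)
# prints:
# 1 3 0
# 2 2 2
# 3 1 4
```

The core shrinks monotonically while the dispensable fraction grows, which is the shape the green and ochre bars in panel a summarize.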

