Nat Commun. 2024 Nov 18;15(1):9994.
doi: 10.1038/s41467-024-54427-3.

Genome evolution and diversity of wild and cultivated rice species

Weixiong Long et al. Nat Commun.

Abstract

Wild species of crops serve as a valuable germplasm resource for breeding modern cultivars. Rice (Oryza sativa L.) is a vital global staple food. However, research on the genome evolution and diversity of wild rice species remains limited. Here, we present nearly complete genomes of 13 representative wild rice species. By integrating these with four previously published genomes for pangenome analysis, a total of 101,723 gene families are identified across the genus, including 9834 (9.67%) core gene families. Additionally, 63,881 gene families absent in cultivated rice species but present in wild rice species are discovered. Extensive structural rearrangements, subgenome exchanges, widespread allelic variations, and regulatory sequence variations are observed in wild rice species. Interestingly, disease resistance genes are expanded but less diverse in the genomes of cultivated rice, likely due to the loss of some resistance genes and the fixation and amplification of genes conferring resistance to specific diseases during domestication and artificial selection. This study not only reveals natural variations valuable for gene-level studies and breeding selection but also enhances our understanding of rice evolution and domestication.
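
As a rough illustration of the pangenome partition referred to above, the sketch below labels gene families as core, dispensable, or private from a presence/absence table and rechecks the reported core fraction. The thresholds (core = present in every genome, private = present in exactly one) are a common convention assumed here, not a definition taken from the paper; the function name `classify_families` and the toy genomes and families are hypothetical.

```python
from collections import Counter


def classify_families(presence, n_genomes):
    """Label each gene family from the set of genomes it occurs in.

    Assumed convention (not stated in the abstract):
      core        -> present in all n_genomes
      private     -> present in exactly one genome
      dispensable -> anything in between
    """
    labels = {}
    for family, genomes in presence.items():
        k = len(genomes)
        if k == 0:
            raise ValueError(f"{family} is not present in any genome")
        if k == n_genomes:
            labels[family] = "core"
        elif k == 1:
            labels[family] = "private"
        else:
            labels[family] = "dispensable"
    return labels


# Toy presence/absence data; names are placeholders, not real accessions.
genomes = ["G1", "G2", "G3"]
presence = {
    "fam_a": {"G1", "G2", "G3"},  # found everywhere
    "fam_b": {"G1", "G3"},        # found in some genomes
    "fam_c": {"G2"},              # found in one genome only
}

labels = classify_families(presence, len(genomes))
counts = Counter(labels.values())

# Recompute the core share from the two counts given in the abstract.
core_pct = round(100 * 9834 / 101723, 2)
print(labels, dict(counts), core_pct)
```

The final line only reproduces the arithmetic behind the 9.67% figure from the two counts quoted in the abstract; it says nothing about how the authors built their gene families.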

Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

Fig. 1. Geographical distribution and phylogeny of wild and cultivated rice.
a Geographic distribution of the 13 wild rice accessions and their diverse agronomic characteristics, such as plant height. The font colors of the wild rice species correspond to their distribution on the world map. The 13 accessions cover 13 species in the rice genus. b Panicle architecture of the 13 wild rice species. c Phylogenetic tree based on conserved single-copy genes, illustrating the evolutionary history of the Oryza genus. All allotetraploid wild rice genomes were split into subgenomes; the ASTRAL concatenation-based species tree was estimated from 3555 single-copy genes generated by OrthoFinder. Numbers at nodes indicate the median divergence time estimates (Mya) for each clade. d TE content in each subgenome/genome of the wild and cultivated rice. RNA transposons are shown as follows: blue for LTR, yellow for Helitron, pink for LINEs, orange for SINEs, and orange, blue for unclassified. DNA transposons are shown in wine. Source data are provided as a Source Data file.
Fig. 2. Pangenome of the rice genus.
a Composition of the Oryza pangenome. The histogram displays the distribution of gene families in 23 subgenomes. The pie chart shows the proportion of gene families by pangenome component: yellow, core gene families; blue, dispensable gene families; green, private gene families. b Number of genes in core, dispensable, and private gene families in each (sub)genome. c The upper diagram of the orthologous groups among the A, B, C, D, E, F, and G genome types. d The ratio of expressed genes in the core, dispensable, and private genomes. e, f Gene expression and gene length in the core, dispensable, and private genomes. g Ka/Ks of core and dispensable genes. h The ratio of genes with LTR-RT insertions among core, dispensable, and private genes. In c–h, P-values were calculated using two-tailed Student’s t-tests. The middle bars show the median, and the bottom and top of each box indicate the 1/4 and 3/4 percentiles, respectively. The whiskers extend to 1.5 times the interquartile range. Sample sizes (n): n = 23 in d and h; n = the number of genes in each set in e, f, and g. i GO enrichment of the core genome. Significance was tested by a two-tailed Fisher’s exact test. Source data are provided as a Source Data file.
Fig. 3. Evolution of LTR Retrotransposons in genomes and centromeres.
a Genome size of the Oryza genomes, including the A, B, C, D, E, F, and G genome types. Data are shown as mean values ± SDs; n = number of (sub)genomes. b LTR-RT subfamily expansion and contraction during rice evolution, and a heatmap of TE density in wild rice species. Circle size indicates the length of the TE sequence. Color represents the LTR subfamily: light pink, Angela; orange, CRM; light blue, Ogre; light green, Retand; green, SIRE; light orange, Tekay. TE density from low to high corresponds to colors from blue to red. c The size of LTR subfamilies and the percentage of genes overlapping LTR-RT subfamilies in each rice species. d The correlation between genome size and LTR subfamily sequence length. A two-sided Pearson’s correlation test was performed (reported P-value = 0, i.e. < 0.001). e Cluster tree generated from the multiple sequence-structure alignment of centromeric repeat unit sequences. f Landscape of LTR density, repeat k-mer ratio, and gene number density on chromosome 1; windows of 100 kb and 10 kb were used for the upper and lower panels, respectively. g The number of genes in the centromere region and the number of genes overlapping LTR regions, as a proportion of the total gene number in the A, B, C, D, E, and F genomes. h Synteny map of the centromere region of chromosome 1 between O. rhizomatis and O. sativa ssp. japonica. The red line indicates the collinearity of genes at both ends of the centromere. Source data are provided as a Source Data file.
Fig. 4. The landscape of structural variation among wild and cultivated rice.
a Number and sequence length of different types of structural variation in each (sub)genome compared with the NIP reference genome. b The distribution of SV length in wild and cultivated rice. Red indicates wild species and blue indicates cultivated species. c The distribution of private genes of each diploid rice genome compared to NIP. From the outermost track to the innermost track: (a) the private gene positions of R498 from the A genome type; (b) O. punctata from the B genome type; (c) O. minuta from the C genome type; (d) O. alta from the D genome type; (e) O. australiensis from the E genome type; (f) O. brachyantha from the FF genome type; (g) O. meyeriana from the G genome type; (h) the gene density of NIP across the genome; (i) the PAV density. The sliding window for (h) and (i) was 300 kb. d The distribution of PAVs in each diploid rice genome. Diploid genomes from the outermost circle to the innermost circle: (a) R498 (O. sativa ssp. indica); (b) CG14 (O. glaberrima); (c) O. rufipogon; (d) O. glumaepatula; (e) O. malampuzhaensis | Bt; (f) O. punctata | Ct; (g) O. latifolia; (h) O. australiensis; (i) O. brachyantha; (j) O. meyeriana. Source data are provided as a Source Data file.
Fig. 5. Alleles and their regulatory sequence variations in wild and cultivated rice accessions.
a Genome-wide number of allelic variations and of variations in the nearby 10 kb upstream/downstream sequences in the wild and cultivated rice groups, with equal numbers of diploid rice genomes. b–d Comparison of total SVs (b), average collinear genes, gene haplotypes, and CDS haplotypes (c), and orthologous protein clusters (d) between the wild and cultivated rice groups (x-axis). The y-axes represent the number of variants per gene, the average number of collinear genes, gene haplotypes, CDS haplotypes, and the number of protein clusters. The box plots show the medians (centerlines), interquartile ranges (boxes), and 1.5 times the interquartile ranges (whiskers); n = 20 for both the cultivated and wild rice groups. Statistical significance (P-value) was determined using a two-sided Wilcoxon rank-sum test. e The unbalanced orthologous gene haplotypes in wild and cultivated rice. f Alignment of an example protein, Rc, which shows significantly more variation in wild than in cultivated rice. Source data are provided as a Source Data file.
Fig. 6. Characteristics of gene CNVs associated with important agronomic traits and of R genes in the Oryza genus.
a Features of the CNV genes. Circle size indicates the gene copy number, which potentially results from tandem duplication. Color from gray to yellow shows the gene expression level from low to high. b–e Comparison of the number of total R genes (b), singleton NLRs (c), paired NLRs (d), and clustered NLRs (e) between wild and cultivated rice. The box plots show the medians (centerlines), interquartile ranges (boxes), and 1.5 times the interquartile ranges (whiskers). The sample size is n = 20 for the wild rice group and n = 3 for the cultivated rice group. P-values indicate the variation between the wild and cultivated rice groups (Wilcoxon rank-sum test). Red represents wild rice species and blue indicates cultivated rice. f The number of NLR genes in each (sub)genome. The different colors in NLR copy numbers represent the NLR subfamilies. The background colors indicate (sub)genomes from different genome types. g Venn diagram of orthologous NLR genes in wild and cultivated rice. h The percentage of core and dispensable non-redundant NLRs in the wild and cultivated rice species. i Expression of core and dispensable non-redundant NLRs in wild and cultivated rice. Pink indicates a gene expression level below 0.1, red indicates a level above 0.1 and below 1, and blue indicates a level above 1. j A representative region on chromosome 11 with fewer R genes in wild rice but more R genes in cultivated rice. The same color indicates syntenic NLR genes. Source data are provided as a Source Data file.
