Plant Biotechnol J. 2023 Jan;21(1):202-218.

doi: 10.1111/pbi.13938. Epub 2022 Oct 21.

Time-ordering japonica/geng genomes analysis indicates the importance of large structural variants in rice breeding

Yu Wang^#¹ ² ³, Fengcheng Li^#¹, Fan Zhang^#⁴, Lian Wu^#¹, Na Xu¹, Qi Sun¹, Hao Chen¹, Zhiwen Yu¹, Jiahao Lu¹, Kai Jiang¹, Xiaoche Wang¹, Siyu Wen² ³, Yao Zhou² ³, Hui Zhao² ³, Qian Jiang² ³, Jiahong Wang⁵, Ruizong Jia² ³, Jian Sun¹, Liang Tang¹, Hai Xu¹, Wei Hu² ³, Zhengjin Xu¹, Wenfu Chen¹, Anping Guo² ³, Quan Xu¹

Affiliations

¹ Rice Research Institute of Shenyang Agricultural University, Shenyang, China.
² Sanya Research Institute of Chinese Academy of Tropical Agricultural Sciences, Sanya, China.
³ Hainan Key Laboratory for Biosafety Monitoring and Molecular Breeding in Off-Season Reproduction Regions, Institute of Tropical Bioscience and Biotechnology, Chinese Academy of Tropical Agricultural Sciences, Haikou, China.
⁴ Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China.
⁵ Biomarker Technologies Corporation, Beijing, China.

^# Contributed equally.

PMID: 36196761
PMCID: PMC9829401
DOI: 10.1111/pbi.13938

Yu Wang et al. Plant Biotechnol J. 2023 Jan.

Abstract

Temperate japonica/geng (GJ) rice yield has improved significantly through intensive breeding, dramatically enhancing global food security. However, little is known about the genomic structural variations (SVs) underlying this improvement. In the present study, we compared 58 long-read assemblies comprising cultivated and wild rice species, revealing 156 319 SVs. Phylogenomic analysis based on the SV dataset detected putatively selected regions of GJ sub-populations. A significant portion of the detected SVs overlapped with genic regions and were found to influence the expression of the affected genes in GJ assemblies. Integrating the SVs and causal genetic variants underlying agronomic traits into the analysis enabled the precise identification of breeding signatures resulting from complex breeding histories aimed at stress tolerance, yield potential and quality improvement. Further, the results provided genomic and genetic evidence that the SV in the promoter of LTG1 accounts for chilling sensitivity, and that increased copy numbers of GNP1 were associated with positive effects on grain number. In summary, the current study provides genomic resources for retracing how SVs shaped agronomic traits during previous breeding, which will assist future genetic, genomic and breeding research on rice.

Keywords: Oryza sativa; japonica/geng; breeding process; de novo assembly; gene editing; structural variations.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

**Figure 1**
Agronomic phenotypes and pedigree relationship of the genome for 12 GJ varieties. (a) The pedigree relationship among the 12 temperate GJ assemblies. The colours orange and blue stand for Chinese GJ and Japanese GJ respectively. (b) The plant and panicle architecture of 12 temperate GJ varieties assembled in this study.

**Figure 2**
Population structure of 58 long‐read assemblies. (a) The average SVs density of 58 assemblies. (b) Phylogenetic tree of 58 accessions including 12 assemblies in this study and 46 existing assemblies. The assemblies with red colour represent the 12 assemblies in the current study. Scale bar = 0.1. (c) Principal component analysis (PCA) plot for 58 *de novo* assemblies. (d) STRUCTURE analysis of 58 accessions with different numbers of clusters K = 4–6. The assemblies with red colour represent the 12 assemblies in the current study.

**Figure 3**
Structural variant (SV) characterization of GJ rice genomes. (a) The number of SVs in each assembly, broken down into five types. DEL, deletion; INS, insertion; INV, inversion; TRA, translocation. Other types include NOTAL (not aligned region), TDM (tandem repeat) and HDR (highly diverged regions). (b) The percentage of detected SVs overlapping different genomic regions in the 17 *Geng* assemblies. The mean percentage values of elements (2 kb upstream, coding region, intron, transposable elements and intergenic regions) are 21.4%, 3.5%, 16.6%, 48.4% and 10.1% respectively. (c) The SV distribution among 17 assemblies relative to Nip. Circular plot of the detected SVs among the 17 GJ genomes with a sliding window size of 500 kb. (d) The landscape of some large‐size SVs among 17 GJ varieties and Nip. Red arrows point to the loci of insertions and deletions. Dark‐coloured bands display examples of large inversions and translocations. (e) Presence of SVs among different breeding stages. SVs that existed in only one group were defined as specific SVs; SVs that existed in the first batch (shown in pink) and were also inherited by at least one other group were defined as common SVs. (f) Summary of transmitted SVs during past breeding. The values in differently coloured circles represent the number of SVs for the corresponding SV‐deriving sets, including black (group‐specific SVs), purple (before 1980, inherited by subsequently released groups), brown (1980–1990, inherited by subsequently released groups), green (1990–2000, inherited by subsequently released groups) and blue (after 2000, inherited by internal varieties). (g) The pie plot shows the change in the proportions of SV types between specific SVs and common SVs. The insertion (INS) and inversion (INV) rates are increased in common SVs. (h) Profiles of SV loci among different genetic elements. The proportion of SVs in the intergenic region is increased in common SVs.
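
The distinction drawn in panels (e) and (f) between group‐specific and common (inherited) SVs can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `classify_svs`, the SV identifiers and the presence data are hypothetical, and it assumes that an SV counts as common when it is detected in more than one breeding‐stage group, attributed to the earliest group in which it appears. Inheritance among varieties within the "after 2000" group is not modelled.

```python
# Breeding-stage groups in chronological order, named after panel (f).
GROUPS = ["before 1980", "1980-1990", "1990-2000", "after 2000"]


def classify_svs(presence):
    """Label each SV as group-specific or common.

    presence: dict mapping an SV id to the set of groups in which it was detected.
    Returns a dict mapping SV id -> (label, earliest group containing the SV).
    """
    labels = {}
    for sv_id, groups in presence.items():
        # Keep only known groups, in chronological order.
        ordered = [g for g in GROUPS if g in groups]
        if not ordered:
            continue  # SV not detected in any group
        label = "specific" if len(ordered) == 1 else "common"
        labels[sv_id] = (label, ordered[0])
    return labels


# Hypothetical presence data, for illustration only.
presence = {
    "sv1": {"before 1980"},
    "sv2": {"before 1980", "1990-2000"},
    "sv3": {"1980-1990", "after 2000"},
    "sv4": {"after 2000"},
}
labels = classify_svs(presence)
```

Counting the labels per earliest group would give totals analogous to the coloured circles in panel (f), under the assumptions stated above.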

**Figure 4**
SVs impact gene expression profiles. (a) The proportions of SV genes and non‐SV genes were associated significantly (P < 0.01) with altered expression of related genes. The differences in per cent values between SV genes and non‐SV genes were assessed using Student's t‐tests for each of five continuous expression ranges. *Indicates significance at the P < 0.01 level. (b) The SVs upstream of *LTG1* cause expression variation among GJ varieties. The blue line indicates the 288 bp insertion in the promoter of *LTG1*. (c) The temperature sensitivity (the difference in days to heading between plants under high and low temperatures) of non‐SV‐*LTG1* and SV‐*LTG1* varieties. Data are mean ± SEM (n = 7 for non‐SV‐*LTG1*, and n = 6 for SV‐*LTG1*), and *indicates significance at the P < 0.05 level. (d) Diagram and sequence of *LTG1* CRISPR knockout lines (*ltg1‐cr1* and *ltg1‐cr2*). The red line indicates the position of the sgRNA target site. (e) The temperature sensitivity of WT and CRISPR knockout lines (*ltg1‐cr1* and *ltg1‐cr2*). Data are mean ± SEM (n = 10), and different letters indicate significant differences (P < 0.05, one‐way ANOVA, Tukey's HSD test).
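
The temperature sensitivity measure in panels (c) and (e), reported as mean ± SEM, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' analysis: the helper names, the sign convention (days to heading at low temperature minus days at high temperature) and all values are hypothetical.

```python
import math
import statistics


def temperature_sensitivity(days_low, days_high):
    """Difference in days to heading between the two temperature regimes.

    Assumption: low-temperature value minus high-temperature value.
    """
    return days_low - days_high


def mean_sem(values):
    """Return (mean, standard error of the mean) for a list of values."""
    mean = statistics.mean(values)
    # SEM = sample standard deviation divided by the square root of n.
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


# Hypothetical (days at low temperature, days at high temperature) pairs.
plants = [(100, 90), (104, 92), (108, 94)]
sensitivities = [temperature_sensitivity(low, high) for low, high in plants]
mean, sem = mean_sem(sensitivities)
print(f"{mean:.1f} ± {sem:.2f} days")
```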

**Figure 5**
Characteristics of gene CNVs related to important agronomic traits. (a) The functional genes with CNV mutations. Circle size represents the number of gene copies potentially generated by a tandem duplication mechanism. Colours from light to dark indicate the global expression level [log2 (FPKM)] of genes ranging from low to high. (b) Local syntenic relation of *GNP1* implying breeding selection of different CNVs among 18 GJ varieties. The blue rectangle represents the forward strand gene in the chromosome, and the green rectangle represents the reverse strand gene. Orange‐linked bands highlight homologous gene pairs having different copy numbers in this region. (c) Local syntenic relation of *Pigm* implying breeding selection of different copy number variation among 18 GJ varieties. The colours are the same as in (b). The red dashed rectangle represents the *Pigm* cluster (*R1*–*R13*) in Nip. Dark grey bands link homologous *R* genes among assemblies. The orange band tracks the evolutionary pattern of *R2* (*LOC_Os06g17900*) along with released GJ varieties.

**Figure 6**
Gene copy number variants (CNVs) are associated with variations in production. (a) Schematic illustrating a single copy of *GNP1* in Nip and ZH11 and three copies of *GNP1* in Toyo. (b) DNA qPCR validation of the three *GNP1* copies. *Indicates significance at the P < 0.05 level. (c) The expression of *GNP1* in Toyo with three copies is significantly higher than in Nip with a single copy of *GNP1*. *Indicates significance at the P < 0.05 level. (d) The expression level of *GNP1* in ZH11 (CK) and two independent over‐expression transgenic lines. *Indicates significance at the P < 0.05 level. (e) Plant architecture of ZH11 (CK) and two independent over‐expression transgenic lines. Bar = 20 cm. (f) Plant height of ZH11 (CK) and two independent over‐expression transgenic lines. Data are mean ± SEM (n = 10), and *indicates significance at the P < 0.05 level. (g) Panicle size of ZH11 (CK) and two independent over‐expression transgenic lines. Bar = 1 cm. (h) Grains derived from one panicle of ZH11 (CK) and of two independent over‐expression transgenic lines. Bar = 1 cm. (i) The grain number per panicle of ZH11 (CK) and two independent over‐expression transgenic lines. Data are mean ± SEM (n = 10), and *indicates significance at the P < 0.05 level. (j) The SVs around *GNP1* in the 18 assemblies. (k) The distribution of multiple copies of *GNP1* among 74 GJ varieties. (l) The grain number per panicle of varieties harbouring multiple *GNP1* copies and varieties harbouring a single copy of *GNP1*. *Indicates significance at the P < 0.05 level.

**Figure 7**
The selection of the SVs in GJ. (a) A total of 24 878 SVs overlapped with selective sweeps, which involved 4089 genes. (b) Selective sweeps uncovered by joint cross‐population composite likelihood ratio (XP‐CLR) and diversity reduction index (DRI) approaches for the GJ population. Genes or QTLs related to yield, grain quality, hybrid sterility and biotic and abiotic stresses in the selective sweeps are indicated (Table S9).

**Figure 8**
The introgression of the SVs and inferior allele editing breeding. (a) Venn diagrams showing the number of the traced SVs from wild, cA, cB, *japonica*/*geng* (GJ) and *indica*/*xian* (XI). (b) A heat map showing the introgression of SVs around *Chalk5*. (c) The enlarged image of XP‐CLR score around *Chalk5* and *GS5*. (d) Linkage disequilibrium plot for SVs. (e) Diagram and sequence of *Chalk5* CRISPR knockout lines (*chalk5‐cr1* and *chalk5‐cr2*). The red line indicates the position of the sgRNA target site. (f) The plant architecture of WT, *chalk5‐cr1* and *chalk5‐cr2*. Bar = 10 cm. (g) The grain shape of WT, *chalk5‐cr1* and *chalk5‐cr2*. Bar = 1 cm. (h) The panicle of WT, *chalk5‐cr1* and *chalk5‐cr2*. Bar = 1 cm. (i) The chalkiness trait of WT, *chalk5‐cr1* and *chalk5‐cr2*. Bar = 1 cm.
