Nat Commun. 2024 Aug 24;15(1):7291.
doi: 10.1038/s41467-024-51854-0.

The pan-tandem repeat map highlights multiallelic variants underlying gene expression and agronomic traits in rice

Huiying He et al. Nat Commun. 2024.

Abstract

Tandem repeats (TRs) are genomic regions that vary in the number of tandemly repeated units and are often multiallelic. Their characteristics and contributions to gene expression and quantitative traits in rice are largely unknown. Here, we survey rice TR variations based on 231 genome assemblies and the rice pan-genome graph. We identify 227,391 multiallelic TR loci, including 54,416 TR variations that are absent from the Nipponbare reference genome. Only one-third of TR variations show strong linkage with nearby bi-allelic variants (SNPs, Indels and PAVs). Using 193 panicle and 202 leaf transcriptomes, we reveal that 485 and 511 TRs, respectively, act as QTLs for nearby gene expression independently of bi-allelic variations. Using plant height and grain width as examples, we identify and validate TR contributions to rice agronomic trait variation. These findings enhance our understanding of the functions of multiallelic variants and will facilitate rice molecular breeding.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Construction and validation of the pan-tandem repeat loci dataset.
a Schematic of the pan-TR polymorphism dataset. In a previous study, we assembled the genomes of 230 rice accessions with broad genetic diversity (including 202 O. sativa accessions and 28 O. rufipogon accessions) to construct a pan-genome graph. In the present study, we conducted de novo whole-genome tandem repeat annotation for each accession and the Nipponbare genome. After integrating the TR annotations into the pan-genome graph to obtain TR variation loci, we obtained the pan-TR polymorphism dataset, which included TR loci absent from the reference genome. Known TR variations that are causal for rice phenotypes were validated in the pan-TR dataset. b–e Alleles for TRs around OsSPL13 (b, c) and COLD11 (d, e) and their distribution among rice subpopulations.
Fig. 2. Characteristic patterns in the pan-tandem repeat (TR) dataset.
a Distribution of each TR type. The inner pie chart indicates the ratio of short TRs (STRs) (red) and variable number TRs (VNTRs) (blue) in the pan-TR dataset. The outer pie chart indicates the ratio of TRs that were present in the Nipponbare reference genome (dark green) and TRs absent from the reference genome (light green). b Statistics summarizing TR copy number differences between the major allele and the reference allele. Red and blue dots indicate STRs and VNTRs, respectively. c Distribution of the repeat motif length at each TR locus. d Distribution of allele numbers at each TR locus. e Distribution of the frequency of the major alleles at each TR locus. The dashed line indicates a major allele frequency of 0.5. f Distribution of the distance from each genetic variant to the nearest transcription start site (TSS). Each color indicates a type of genomic variant. Overlapping curves indicate similar distributions between variant types. g Distribution of genomic variations along each chromosome. p-values indicate differences in distribution between TRs and other bi-allelic variants (Wilcoxon rank sum test). h Distribution of linkage disequilibrium (LD) values between TRs and bi-allelic variants within 100 kb. LD was calculated as the absolute value of the pairwise Pearson correlation coefficient (|R|). For each TR, the maximum |R| value with adjacent variants on either side is recorded. The dashed lines indicate |R| = 0.30 and |R| = 0.70. Source data are provided as a Source Data file.
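The LD summary described in panel h can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes TR alleles are coded as repeat counts and bi-allelic variants as 0/1 genotypes, and the function names, toy positions and genotypes are hypothetical. Treating a monomorphic variant as |R| = 0 is also an assumption made here.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # Monomorphic site: correlation is undefined; treated as 0 (assumption).
        return 0.0
    return sxy / sqrt(sxx * syy)


def max_abs_r(tr_pos, tr_copies, variants, window=100_000):
    """Largest |R| between one TR and bi-allelic variants within `window` bp."""
    best = 0.0
    for pos, genotypes in variants:
        if abs(pos - tr_pos) <= window:
            best = max(best, abs(pearson_r(tr_copies, genotypes)))
    return best


# Hypothetical toy data: six accessions, one TR at position 10,000.
tr_copies = [3, 3, 4, 5, 5, 6]
variants = [
    (10_500, [0, 0, 0, 1, 1, 1]),   # inside the 100 kb window
    (60_000, [1, 1, 0, 0, 0, 0]),   # inside the window, negatively correlated
    (500_000, [0, 0, 1, 2, 2, 3]),  # outside the window, so ignored
]
ld = max_abs_r(10_000, tr_copies, variants)
print(round(ld, 3))
```

The third variant tracks the TR perfectly but lies beyond 100 kb, so it does not contribute to the recorded maximum.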
Fig. 3. Characterization of expression quantitative trait loci (eQTLs) for panicle and leaf tissues.
a Statistical comparison of genes for which expression levels were significantly associated with genetic variants (eGenes) in the panicle and young leaf tissues. b Manhattan plot of a panicle-specific eGene, DHT1 (LOC_Os04g54440). DHT1 was expressed in both leaf and panicle tissues. Genetic variants were significantly associated with variation in its panicle expression but not its leaf expression. c Pearson correlation analysis between TR repeat numbers and normalized panicle expression of the gene DHT1 (LOC_Os04g54440). The error bands indicate 95% confidence intervals. The p-values were calculated by two-sided t test. d, e Statistical comparison of eGenes identified using different genomic variants as markers in the panicle (d) and the young leaf tissues (e). f Example of an eGene identified only with TR markers in the panicle. g Pearson correlation analysis between the TR repeat number and LOC_Os01g02910 expression in the panicle. The error bands indicate 95% confidence intervals. The p-values were calculated by two-sided t test. Source data are provided as a Source Data file.
Fig. 4. Contributions of eTRs to gene expression variation in young leaf tissue.
a Comparison of gene expression models with or without the eTR variant. R² indicates the expression variation explained by the model. The x-axis and y-axis indicate the R² of models including only the eBi-allelic variant and of models including both the eBi-allelic variant and the eTR, respectively. Red dots indicate genes for which models including eTRs were significantly better than models including only eBi-allelic variants (Benjamini–Hochberg test, q value < 0.05); blue dots indicate those without significant differences. b Original (unconditioned) eTR effect sizes (β) compared to conditioned eTR β. Red points indicate eTRs with consistent effect directions between conditioned and unconditioned analyses; the remaining points indicate those with discordant effect directions. Non-significant β values are represented as 0. c Manhattan plot for OsPRR1. Cx represents the eTR repeat number. The pie chart shows the eTR and eSNP distribution. d Pearson correlation analysis between eTR repeat number and OsPRR1 leaf expression. Red and green regression lines indicate analyses including all accessions and only accessions with the major SNP type, respectively. e Plant height among accessions with different eTR repeat numbers. f Pearson correlation analysis between the eTR repeat number and plant height. Red and green regression lines indicate analyses including all accessions and only accessions with the major SNP type, respectively. g Schematic diagram indicating the mutated site of osprr1. h Morphologies of the osprr1 mutant and the wild type (Xiushui134). i Plant height of the osprr1 mutant (n = 36) and wild-type (n = 38) plants after the heading stage. j Schematic diagram of the recombinant vectors containing the OsPRR1 promoter for the firefly luciferase complementation assay. REN Renilla luciferase, LUC firefly luciferase, pOsPRR1C3(NH242) the vector containing the OsPRR1 promoter region with 3 TR copies, pOsPRR1C4(NH027) the vector containing 4 TR copies. k Relative LUC/REN activity in tobacco protoplasts transformed with the pOsPRR1C3 and pOsPRR1C4 vectors (n = 3). In d and f, the error bands indicate 95% confidence intervals (two-tailed t test). In i and k, data are presented as mean ± SD, and p-values were generated using two-tailed t tests. Source data are provided as a Source Data file.
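The q value < 0.05 criterion in panel a refers to Benjamini–Hochberg adjustment. The sketch below shows only that adjustment step, under the assumption that one p-value per gene is already available from the model comparison. The caption does not specify the underlying test, and the p-values here are made up for illustration.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic q-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


# Hypothetical per-gene p-values from comparing the two expression models.
pvals = [0.001, 0.02, 0.03, 0.008, 0.5]
qvals = bh_qvalues(pvals)
# Genes that would be drawn as red dots under the q < 0.05 rule.
significant = [q < 0.05 for q in qvals]
print(significant)
```

In this toy set the first four genes pass the threshold and the last does not.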
Fig. 5. Colocalization of TRs affecting both grain width and gene expression.
a Manhattan plot showing associations between grain width and TR variations, PAVs, SNPs and Indels. Dashed lines of each color indicate the genome-wide threshold value for the corresponding variant type. In the 1 Mbp region around the leading peak, 122 genes were expressed in the panicle tissue (median FPKM > 0 and maximum FPKM ≥ 1). Expression levels of the five indicated genes were significantly associated with grain width. b Manhattan plot showing associations between TRs and expression levels of the five genes in the panicle associated with grain width. The dashed line indicates the genome-wide threshold value. c, d Posterior probability of causality for grain width (c) and for expression levels of the three genes with significant eTRs (d). “#” indicates the value of the posterior probability. e Posterior probability of a TR being the causal variant for both grain width and LOC_Os06g03850 expression. The dashed line across c–e indicates the position of the same TR variant. f The candidate causal TR was in the promoter region of LOC_Os06g03850, and 7 alleles existed in the present dataset. g Associations between TR repeat numbers and grain width in the indica and japonica subpopulations of Oryza sativa. h Associations between TR repeat numbers and LOC_Os06g03850 expression in the panicle tissue among members of the indica and japonica subpopulations (green and red lines, respectively). In g and h, the error bands indicate 95% confidence intervals, and p-values were calculated by two-sided t test. Source data are provided as a Source Data file.
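The expression filter quoted in panel a (median FPKM > 0 and maximum FPKM ≥ 1) can be written as a simple predicate. This is a sketch under the assumption that each gene has one FPKM value per accession. The gene names and values are hypothetical placeholders, not data from the study.

```python
from statistics import median


def is_expressed(fpkm_values):
    """True if median FPKM > 0 and maximum FPKM >= 1 across accessions."""
    return median(fpkm_values) > 0 and max(fpkm_values) >= 1


# Hypothetical FPKM values for three genes across five accessions.
fpkm_by_gene = {
    "gene_A": [0.0, 0.4, 1.2, 2.0, 0.9],  # median 0.9, max 2.0: kept
    "gene_B": [0.0, 0.0, 0.0, 3.0, 5.0],  # median 0: dropped
    "gene_C": [0.1, 0.2, 0.3, 0.4, 0.5],  # max 0.5: dropped
}
expressed = [gene for gene, values in fpkm_by_gene.items() if is_expressed(values)]
print(expressed)
```

Both conditions must hold, so a gene expressed strongly in only a minority of accessions is excluded, as is one expressed weakly everywhere.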
Fig. 6. Validation of TR effects on seed width.
a Schematic diagram indicating the target and mutated sites of TRGW6 edited by CRISPR/Cas9 technology. b The grain width phenotype of trgw6 and its wild type (ZH11). Bar = 10 mm. The average grain width of all seeds in a single plant represents the grain width value of the plant (n = 10). c The TR sequence edit diagram of TRGW6 by the CRISPR–Cas12a promoter editing (CAPE) system in the NH142 background. The TR position is shown in Fig. 5f. The background accession NH142 contains 9 TR copies in the TRGW6 promoter region, and 1 copy was removed in the editing lines (NH142-1copy). d Expression level of TRGW6 in plants of NH142 and NH142-1copy. Rice ACTIN was used as the internal reference. e, f The grain width phenotype of NH142 and NH142-1copy. The average grain width of all seeds in a single plant represents the grain width value of the plant. Three plants of each of the two lines were measured (n = 3). g The TR sequence edit diagram of TRGW6 by CAPE in the NH072 background. The background accession NH072 contains 9 TR copies in the TRGW6 promoter region, and 1 TR copy was added in the editing lines (NH072+1copy). h Expression level of TRGW6 in NH072 and NH072+1copy (n = 3). Rice ACTIN was used as the internal reference. i, j The grain width phenotype of NH072 (n = 5) and NH072+1copy (n = 3). The average grain width of all seeds in a single plant represents the grain width value of the plant. In b, d, e, h and i, data are presented as mean ± SD. The p-values were calculated by two-sided t test. Source data are provided as a Source Data file.
