SR4R: An Integrative SNP Resource for Genomic Breeding and Population Research in Rice

Jun Yan¹, Dong Zou², Chen Li³, Zhang Zhang², Shuhui Song⁴, Xiangfeng Wang⁵

Affiliations

¹ Department of Crop Genomics and Bioinformatics, College of Agronomy and Biotechnology, China Agricultural University, Beijing 100094, China.
² China National Center for Bioinformation, Beijing 100101, China; National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100101, China.
³ Rice Research Institute, Guangdong Academy of Agricultural Sciences, Guangzhou 510640, China.
⁴ China National Center for Bioinformation, Beijing 100101, China; National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100101, China. Electronic address: songshh@big.ac.cn.
⁵ Department of Crop Genomics and Bioinformatics, College of Agronomy and Biotechnology, China Agricultural University, Beijing 100094, China. Electronic address: xwang@cau.edu.cn.

PMID: 32619768
PMCID: PMC7646087
DOI: 10.1016/j.gpb.2020.03.002

Jun Yan et al. Genomics Proteomics Bioinformatics. 2020 Apr;18(2):173-185. doi: 10.1016/j.gpb.2020.03.002. Epub 2020 Jun 30.

Abstract

The Information Commons for Rice (IC4R) database is a collection of 18 million single nucleotide polymorphisms (SNPs) identified by resequencing 5152 rice accessions. Although IC4R offers an ultra-high-density rice variation map, these raw SNPs are not readily usable by the public. To support the different uses of SNPs in population genetics, evolutionary analysis, association studies, and genomic breeding in rice, the raw genotypic data of these 18 million SNPs were processed by unified bioinformatics pipelines. The outcomes were used to develop a daughter database of IC4R, SnpReady for Rice (SR4R). SR4R presents four reference SNP panels: 2,097,405 hapmapSNPs obtained after data filtration and genotype imputation, 156,502 tagSNPs selected by linkage disequilibrium-based redundancy removal, 1180 fixedSNPs selected from genes exhibiting selective sweep signatures, and 38 barcodeSNPs selected by DNA fingerprinting simulation. SR4R thus offers a highly efficient rice variation map that combines reduced SNP redundancy with extensive data describing the genetic diversity of rice populations. In addition, SR4R provides rice researchers with a web interface for browsing all four SNP panels, using online toolkits, and retrieving the original data and scripts for a variety of population genetics analyses on local computers. SR4R is freely available to academic users at http://sr4r.ic4r.org/.

Keywords: Database; HapMap; Panel; Rice; SNP.
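
The abstract describes the tagSNPs as the result of linkage disequilibrium-based redundancy removal. As an illustration only, the sketch below shows one common way such a reduction can be done: greedy pruning on pairwise r². It is not taken from the SR4R pipeline; the function names, the 0/1/2 dosage coding, the r² threshold of 0.8, and the index-based window are all assumptions chosen for the example.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype dosage vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # monomorphic SNP: correlation undefined, treated as unlinked
    return cov * cov / (var_x * var_y)


def greedy_ld_prune(genotypes, r2_threshold=0.8, window=50):
    """Return indices of SNPs kept after greedy LD pruning.

    genotypes: list of per-SNP dosage lists (0/1/2), ordered by position.
    A SNP is dropped when its r^2 with any already-kept SNP lying within
    `window` positions (by index) reaches r2_threshold.
    Both defaults are illustrative assumptions, not SR4R settings.
    """
    kept = []
    for i, snp in enumerate(genotypes):
        redundant = any(
            i - j <= window and r_squared(snp, genotypes[j]) >= r2_threshold
            for j in kept
        )
        if not redundant:
            kept.append(i)
    return kept


# Toy data: four SNPs genotyped in six accessions.
snps = [
    [0, 0, 2, 2, 0, 2],
    [0, 0, 2, 2, 0, 2],  # identical to SNP 0
    [2, 2, 0, 0, 2, 0],  # mirror image of SNP 0, so r^2 is still 1
    [0, 2, 0, 2, 2, 0],  # only weakly correlated with SNP 0
]
tag_indices = greedy_ld_prune(snps)
```

In this toy input, SNPs 1 and 2 carry the same information as SNP 0 and are pruned, while SNP 3 is retained as a second tag.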

Figures

**Figure 1**
**An overview of the four SNP panels of the SR4R database** The flow chart describes procedures on how the four SNP panels were generated.

**Figure 2**
**Basic statistics of the rice hapmapSNPs after four steps of genotype processing** Genotype data were processed in four steps. A series of statistical analyses were performed at each step to show the characteristics of the SNPs. A. Sample missing rate. B. Sample heterozygosity rate. C. Minor allele frequency. D. Genotype missing rate. E. Genotype heterozygosity rate. F. Distribution of the hapmapSNPs in different genomic regions. The hapmapSNPs were annotated using ANNOVAR. SNP, single nucleotide polymorphism.
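
The quantities named in panels C to E (minor allele frequency, genotype missing rate, genotype heterozygosity rate) have standard definitions. The following is a minimal sketch of how they can be computed for a single SNP; the function name and the genotype coding (0, 1, 2 for the number of alternative alleles, `None` for a missing call) are assumptions made for this example and do not come from the SR4R scripts.

```python
def snp_stats(calls):
    """Summarise one SNP across samples.

    calls: one genotype per sample, coded 0 (homozygous reference),
    1 (heterozygous), 2 (homozygous alternative) or None (missing).
    """
    observed = [g for g in calls if g is not None]
    missing_rate = 1 - len(observed) / len(calls)
    if not observed:
        # No usable calls: the other statistics are undefined.
        return {"missing_rate": 1.0, "het_rate": None, "maf": None}
    het_rate = sum(1 for g in observed if g == 1) / len(observed)
    alt_freq = sum(observed) / (2 * len(observed))  # two alleles per diploid call
    maf = min(alt_freq, 1 - alt_freq)
    return {"missing_rate": missing_rate, "het_rate": het_rate, "maf": maf}


# Eight accessions, one missing call.
stats = snp_stats([0, 0, 1, 2, None, 0, 0, 2])
```

Applying the same function to one sample's calls across all SNPs gives the sample-level missing and heterozygosity rates of panels A and B (the `maf` field is not meaningful in that orientation).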

**Figure 3**
**Population structure analysis of the 2556 rice accessions using tagSNPs** To test whether the 156,502 tagSNPs reproduce population structures consistent with previous reports, a series of population structure analyses were performed to generate the phylogenetic tree (A), the PCA map (B), the admixture structure of 2556 rice accessions (C), and the phylogenetic tree of the six subgroups of *Ind* rice (D). *Ind*, *indica* rice; *Aro*, *aromatic* rice; *TrJ*, tropical *japonica* rice; *TeJ*, temperate *japonica* rice; *Oru*, *O. rufipogon*.

**Figure 4**
**Genetic diversity analysis of rice accessions using tagSNPs** The 156,502 tagSNPs were subjected to a series of population genetic analyses to show their effectiveness, including statistics of homozygous SNPs (A), statistics of sample heterozygosity (B), distribution of pairwise IBS values (C), statistics of ROH regions (D), LD decay analysis in the five major rice subpopulations (E), and genetic diversity (θπ) and population differentiation (*Fst*) between cultivated and wild subpopulations (F). *Fst* values above the lines between each cultivated rice and wild rice are presented in different colors; θπ values of the cultivated rice subpopulations are shown in black above or below the respective ovals. G. Population differentiation (*Fst*) between each pair of cultivated rice subpopulations. LD, linkage disequilibrium; ROH, runs of homozygosity; IBS, identity by state.

**Figure 5**
**GS-based phenotype prediction using tagSNPs** Nine phenotypic traits were predicted with rrBLUP models to evaluate the effectiveness of tagSNPs. Five sets of SNPs were compared. Set-1: original 29,434 SNPs on the 44 K chip; Set-2: 1090 SNPs overlapping between the 156,502 tagSNPs and the original 29,434 SNPs; Set-3: 1090 SNPs randomly selected from the original 29,434 SNPs; Set-4: 1090 SNPs evenly distributed in the genome (one SNP per 350 kb) selected from the original 29,434 SNPs; Set-5: 1090 SNPs located within a randomly selected genomic region from the original 29,434 SNPs. GS, genomic selection.
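
rrBLUP models are a form of ridge regression on marker genotypes, in which every marker receives a shrunken effect estimate. The sketch below illustrates that idea only: it uses a fixed shrinkage parameter `lam`, whereas rrBLUP software normally estimates it from variance components, and the marker coding (-1/0/1), the toy data, and all function names are assumptions made for the example rather than the analysis behind this figure.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ridge_marker_effects(z, y, lam):
    """Solve (Z'Z + lam * I) u = Z'(y - mean(y)) for the marker effects u."""
    n, p = len(z), len(z[0])
    mean_y = sum(y) / n
    yc = [v - mean_y for v in y]
    ztz = [
        [sum(z[k][i] * z[k][j] for k in range(n)) + (lam if i == j else 0.0)
         for j in range(p)]
        for i in range(p)
    ]
    zty = [sum(z[k][i] * yc[k] for k in range(n)) for i in range(p)]
    return mean_y, solve_linear(ztz, zty)


def predict(z_new, mean_y, effects):
    """Predicted phenotype = overall mean + sum of marker effects."""
    return [mean_y + sum(g * u for g, u in zip(row, effects)) for row in z_new]


# Toy data: five lines, three markers coded -1/0/1 (hypothetical values).
z = [[1, -1, 0], [1, 1, -1], [-1, 0, 1], [0, 1, 1], [-1, -1, -1]]
y = [5.1, 6.3, 3.9, 5.0, 3.2]
lam = 1.0  # fixed here for illustration; rrBLUP estimates this from the data
mean_y, effects = ridge_marker_effects(z, y, lam)
predictions = predict(z, mean_y, effects)
```

Comparing SNP sets as in this figure would amount to repeating such a fit with different marker columns and scoring predictions on lines held out from training.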

**Figure 6**
**Screening and validation of fixedSNPs** A. Distribution of θπ ratios (wild vs. cultivar) and corresponding *Fst* values, calculated in 100-kb windows. Data points located to the right of the vertical dashed line and above the horizontal dashed line are potential strong selective sweep signals (red points, corresponding to the 5% right tails of the empirical distributions of θπ ratio and *Fst* values, respectively). B. Distribution of *Tajima’s D* values for the potential selective sweep signals and whole genomes. C. Common and specific selective signals among cultivar subgroups (numbers of genes and GSEA terms are shown outside and inside the brackets, respectively). D. Phylogenetic tree of 2556 rice cultivars in the fixedSNP data set. E. Phylogenetic tree of 880 rice cultivars in the Affymetrix 700 K chip data set. F. Phylogenetic tree of 351 rice cultivars in the Illumina 44 K chip data set.
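
Panel A flags windows that fall in the top 5% of both the θπ ratio and the *Fst* empirical distributions. The selection rule can be sketched as follows, assuming the per-window statistics have already been computed; the function names, the way ties and rounding are handled, and the toy values are assumptions for illustration, not the authors' code.

```python
def upper_tail_cutoff(values, tail=0.05):
    """Smallest value that still belongs to the top `tail` fraction."""
    ordered = sorted(values)
    k = max(1, round(tail * len(ordered)))  # number of windows in the tail
    return ordered[-k]


def candidate_sweep_windows(pi_ratio, fst, tail=0.05):
    """Indices of windows in the upper tail of BOTH statistics.

    pi_ratio: per-window theta-pi ratio (wild / cultivar).
    fst: per-window Fst between the same two groups.
    """
    pi_cut = upper_tail_cutoff(pi_ratio, tail)
    fst_cut = upper_tail_cutoff(fst, tail)
    return [
        i for i, (r, f) in enumerate(zip(pi_ratio, fst))
        if r >= pi_cut and f >= fst_cut
    ]


# Toy genome of 20 windows (hypothetical values).
pi_ratio = [1.0] * 20
fst = [0.1] * 20
pi_ratio[7], fst[7] = 8.5, 0.62   # extreme for both statistics
pi_ratio[3] = 6.0                 # reduced diversity only
fst[12] = 0.55                    # high differentiation only
candidates = candidate_sweep_windows(pi_ratio, fst)
```

Requiring both statistics to be extreme is what separates window 7 from windows 3 and 12, each of which is an outlier on one axis only.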

**Figure 7**
**Representative functional modules in SR4R database** A. Genes exhibiting significant selection signatures in the corresponding subpopulations are listed in the “Selected Genes” module in the browser. B. Allele frequencies in different subpopulations of the first hapmapSNP (SNPID: OSA01S00001362, associated gene: Os01g0100100, position: chr01-1362, allele: Alt-A, Ref-G). C. One example of the script and pipeline for population diversity analysis. D. The online analysis module of subpopulation classification using machine learning algorithms. E. The online analysis module of rice variety identification using the 38 barcodeSNPs.
