Genomics Proteomics Bioinformatics. 2020 Apr;18(2):173-185.
doi: 10.1016/j.gpb.2020.03.002. Epub 2020 Jun 30.

SR4R: An Integrative SNP Resource for Genomic Breeding and Population Research in Rice

Jun Yan et al. Genomics Proteomics Bioinformatics. 2020 Apr.

Abstract

The Information Commons for Rice (IC4R) database is a collection of 18 million single nucleotide polymorphisms (SNPs) identified by resequencing 5152 rice accessions. Although IC4R offers an ultra-high-density rice variation map, these raw SNPs are not readily usable by the public. To support the diverse research uses of SNPs in population genetics, evolutionary analysis, association studies, and genomic breeding in rice, the raw genotypic data of these 18 million SNPs were processed with unified bioinformatics pipelines. The results were used to develop a daughter database of IC4R, SnpReady for Rice (SR4R). SR4R presents four reference SNP panels: 2,097,405 hapmapSNPs obtained after data filtration and genotype imputation, 156,502 tagSNPs selected by linkage disequilibrium-based redundancy removal, 1180 fixedSNPs selected from genes exhibiting selective sweep signatures, and 38 barcodeSNPs selected by DNA fingerprinting simulation. SR4R thus offers a highly efficient rice variation map that combines reduced SNP redundancy with extensive data describing the genetic diversity of rice populations. In addition, SR4R provides rice researchers with a web interface for browsing all four SNP panels, using online toolkits, and retrieving the original data and scripts for a variety of population genetics analyses on local computers. SR4R is freely available to academic users at http://sr4r.ic4r.org/.
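
The abstract does not say which tool or thresholds were used for the linkage disequilibrium-based redundancy removal. As a rough illustration of the general idea only, the sketch below greedily keeps a SNP unless its squared genotype correlation (r²) with an already kept nearby SNP exceeds a cutoff. The function name `greedy_ld_prune`, the cutoff `r2_max=0.8`, and the index window of 50 SNPs are all assumptions made for this example, not values from SR4R, and the input is assumed to contain only polymorphic SNPs with no missing calls.

```python
import numpy as np

def greedy_ld_prune(G, r2_max=0.8, window=50):
    """Return column indices of SNPs kept after greedy r^2-based pruning.

    G: samples x SNPs array of alternate-allele counts (0, 1, 2),
       ordered by genomic position (hypothetical input layout).
    r2_max and window are illustrative defaults, not SR4R settings.
    """
    G = np.asarray(G, dtype=float)
    kept = []
    for j in range(G.shape[1]):
        redundant = False
        # Compare against previously kept SNPs, nearest first.
        for k in reversed(kept):
            if j - k > window:
                break  # remaining kept SNPs are even farther away
            r = np.corrcoef(G[:, j], G[:, k])[0, 1]
            if r * r > r2_max:
                redundant = True
                break
        if not redundant:
            kept.append(j)
    return kept

# Toy example: SNP 1 duplicates SNP 0, SNP 2 is uncorrelated with SNP 0.
toy = np.array([
    [0, 0, 0],
    [1, 1, 0],
    [2, 2, 1],
    [0, 0, 2],
    [1, 1, 2],
    [2, 2, 1],
])
tag_indices = greedy_ld_prune(toy)
```

Production pipelines typically use dedicated tools for this step; the loop above is only meant to make the notion of "redundancy removal" concrete.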

Keywords: Database; HapMap; Panel; Rice; SNP.

Figures

Figure 1
An overview of the four SNP panels of the SR4R database. The flow chart describes how the four SNP panels were generated.
Figure 2
Basic statistics of the rice hapmapSNPs after four steps of genotype processing. Genotype data were processed in four steps, and a series of statistical analyses were performed at each step to characterize the SNPs. A. Sample missing rate. B. Sample heterozygosity rate. C. Minor allele frequency. D. Genotype missing rate. E. Genotype heterozygosity rate. F. Distribution of the hapmapSNPs in different genomic regions. The hapmapSNPs were annotated using ANNOVAR. SNP, single nucleotide polymorphism.
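
The per-sample and per-SNP statistics named in panels A to E can be computed directly from a genotype matrix. The following is a minimal sketch, assuming genotypes are coded as alternate-allele counts (0, 1, 2) with `NaN` for missing calls; the function name `genotype_stats` and the dictionary keys are hypothetical, and this is not the pipeline used to build SR4R.

```python
import numpy as np

def genotype_stats(G):
    """Summary statistics for a samples x SNPs genotype matrix.

    G holds alternate-allele counts (0, 1, 2); np.nan marks a missing call.
    Rates are computed over called genotypes only.
    """
    G = np.asarray(G, dtype=float)
    missing = np.isnan(G)
    het = (G == 1)  # NaN compares False, so missing calls are not counted

    called_per_sample = np.maximum((~missing).sum(axis=1), 1)
    called_per_snp = np.maximum((~missing).sum(axis=0), 1)

    alt_freq = np.nansum(G, axis=0) / (2 * called_per_snp)
    return {
        "sample_missing": missing.mean(axis=1),
        "sample_het": het.sum(axis=1) / called_per_sample,
        "snp_missing": missing.mean(axis=0),
        "snp_het": het.sum(axis=0) / called_per_snp,
        "maf": np.minimum(alt_freq, 1 - alt_freq),
    }

# Toy matrix: 4 samples x 3 SNPs with two missing calls.
toy = [
    [0, 1, 2],
    [0, np.nan, 2],
    [2, 1, 0],
    [0, 0, np.nan],
]
stats = genotype_stats(toy)
```

Filtering then amounts to keeping the rows and columns whose statistics fall within chosen thresholds; the thresholds themselves are not given on this page.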
Figure 3
Population structure analysis of the 2556 rice accessions using tagSNPs. To test whether the 156,502 tagSNPs reproduce population structures consistent with previous reports, a series of population structure analyses were performed to generate the phylogenetic tree (A), the PCA map (B), the admixture structure of the 2556 rice accessions (C), and the phylogenetic tree of the six subgroups of Ind rice (D). Ind, indica rice; Aro, aromatic rice; TrJ, tropical japonica rice; TeJ, temperate japonica rice; Oru, O. rufipogon.
Figure 4
Genetic diversity analysis of rice accessions using tagSNPs. The 156,502 tagSNPs were subjected to a series of population genetic analyses to assess their effectiveness: statistics of homozygous SNPs (A), statistics of sample heterozygosity (B), distribution of pairwise IBS values (C), statistics of ROH regions (D), LD decay analysis in the five major rice subpopulations (E), and genetic diversity (θπ) and population differentiation (Fst) between cultivated and wild subpopulations (F). Fst values above the lines between each cultivated rice subpopulation and wild rice are shown in different colors; θπ values of the cultivated rice subpopulations are shown in black above or below the respective ovals. G. Population differentiation (Fst) between each pair of cultivated rice subpopulations. LD, linkage disequilibrium; ROH, runs of homozygosity; IBS, identity by state.
Figure 5
GS-based phenotype prediction using tagSNPs. Nine phenotype traits were predicted based on rrBLUP models to evaluate the effectiveness of tagSNPs. Five sets of SNPs were compared. Set-1: original 29,434 SNPs on the 44 K chip; Set-2: 1090 SNPs overlapping between the 156,502 tagSNPs and the original 29,434 SNPs; Set-3: 1090 SNPs randomly selected from the original 29,434 SNPs; Set-4: 1090 SNPs evenly distributed in the genome (one SNP per 350 kb) selected from the original 29,434 SNPs; Set-5: 1090 SNPs localized within a randomly selected genomic region from the original 29,434 SNPs. GS, genomic selection.
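
rrBLUP-style genomic prediction is closely related to ridge regression on marker genotypes, with all marker effects shrunk by a common penalty. The sketch below illustrates that relationship with a fixed penalty `lam`; in practice the amount of shrinkage is usually estimated from the data (for example by REML) rather than fixed by hand. The function names and the simulated data are assumptions for this example and do not reproduce the analysis behind the figure.

```python
import numpy as np

def ridge_marker_effects(X, y, lam):
    """Estimate shrunken marker effects by ridge regression.

    X: samples x markers genotype matrix, y: phenotype vector,
    lam: fixed ridge penalty (an assumption of this sketch).
    Uses the dual form, which solves an n x n system and is
    convenient when markers greatly outnumber samples.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mu = y.mean()
    col_means = X.mean(axis=0)
    Xc = X - col_means
    n = Xc.shape[0]
    alpha = np.linalg.solve(Xc @ Xc.T + lam * np.eye(n), y - mu)
    beta = Xc.T @ alpha
    return mu, col_means, beta

def predict_phenotype(X_new, mu, col_means, beta):
    """Predict phenotypes for new genotypes from fitted marker effects."""
    return mu + (np.asarray(X_new, dtype=float) - col_means) @ beta

# Simulated toy data: 20 samples, 8 markers coded 0/1/2.
rng = np.random.default_rng(0)
X_toy = rng.integers(0, 3, size=(20, 8)).astype(float)
y_toy = X_toy @ rng.normal(size=8) + rng.normal(scale=0.1, size=20)

mu, col_means, beta = ridge_marker_effects(X_toy, y_toy, lam=1.0)
y_hat = predict_phenotype(X_toy, mu, col_means, beta)
```

Comparing marker sets as in the figure would then mean refitting with different column subsets of the genotype matrix and scoring predictions on held-out samples.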
Figure 6
Screening and validation of fixedSNPs. A. Distribution of θπ ratios (wild vs. cultivar) and corresponding Fst values, calculated in 100-kb windows. Data points located to the right of the vertical dashed line and above the horizontal dashed line are potential strong selective sweep signals (red points, corresponding to the 5% right tails of the empirical distributions of θπ ratio and Fst values, respectively). B. Distribution of Tajima’s D values for the potential selective sweep signals and whole genomes. C. Common and specific selective signals among cultivar subgroups (numbers of genes and GSEA terms are shown outside and inside the brackets, respectively). D. Phylogenetic tree of 2556 rice cultivars in the fixedSNP data set. E. Phylogenetic tree of 880 rice cultivars in the Affymetrix 700 K chip data set. F. Phylogenetic tree of 351 rice cultivars in the Illumina 44 K chip data set.
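
The window-selection rule in panel A (windows in the top 5% of both the θπ ratio and Fst distributions) can be expressed in a few lines. This is a minimal sketch, assuming per-window θπ and Fst values have already been computed and that the cultivar θπ is non-zero in every window; the function name `candidate_sweep_windows` and the toy values are hypothetical.

```python
import numpy as np

def candidate_sweep_windows(pi_wild, pi_cultivar, fst, tail=0.05):
    """Return indices of windows in the upper tail of both statistics.

    pi_wild, pi_cultivar: per-window nucleotide diversity values.
    fst: per-window Fst between wild and cultivated groups.
    tail: upper-tail fraction applied to each empirical distribution.
    """
    ratio = np.asarray(pi_wild, dtype=float) / np.asarray(pi_cultivar, dtype=float)
    fst = np.asarray(fst, dtype=float)
    ratio_cut = np.quantile(ratio, 1 - tail)  # vertical dashed line
    fst_cut = np.quantile(fst, 1 - tail)      # horizontal dashed line
    return np.flatnonzero((ratio > ratio_cut) & (fst > fst_cut))

# Toy data: 100 windows where both statistics increase together.
pi_wild_toy = np.arange(1, 101, dtype=float)
pi_cult_toy = np.ones(100)
fst_toy = np.arange(1, 101) / 100.0
hits = candidate_sweep_windows(pi_wild_toy, pi_cult_toy, fst_toy)
```

Because the two cutoffs are applied jointly, the number of candidate windows is generally smaller than 5% of the genome unless the two statistics are strongly correlated, as they are in this toy input.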
Figure 7
Representative functional modules in SR4R database. A. Genes exhibiting significant selection signatures in the corresponding subpopulations are listed in the “Selected Genes” module in the browser. B. Allele frequencies in different subpopulations of the first hapmapSNP (SNPID: OSA01S00001362, associated gene: Os01g0100100, position: chr01-1362, allele: Alt-A, Ref-G). C. One example of the script and pipeline for population diversity analysis. D. The online analysis module of subpopulation classification using machine learning algorithms. E. The online analysis module of rice variety identification using the 38 barcodeSNPs.
