Genome Biol. 2014;15(11):506.
doi: 10.1186/PREACCEPT-2784872521277375.

Whole genome de novo assemblies of three divergent strains of rice, Oryza sativa, document novel gene space of aus and indica

Michael C Schatz et al. Genome Biol. 2014.

Abstract

Background: The use of high throughput genome-sequencing technologies has uncovered a large extent of structural variation in eukaryotic genomes that makes important contributions to genomic diversity and phenotypic variation. When the genomes of different strains of a given organism are compared, whole genome resequencing data are typically aligned to an established reference sequence. However, when the reference differs in significant structural ways from the individuals under study, the analysis is often incomplete or inaccurate.

Results: Here, we use rice as a model to demonstrate how improvements in sequencing and assembly technology allow rapid and inexpensive de novo assembly of next generation sequence data into high-quality assemblies that can be directly compared using whole genome alignment to provide an unbiased assessment. Using this approach, we are able to accurately assess the "pan-genome" of three divergent rice varieties and document several megabases of each genome absent in the other two.

Conclusions: Many of the genome-specific loci are annotated to contain genes, reflecting the potential for new biological properties that would be missed by standard reference-mapping approaches. We further provide a detailed analysis of several loci associated with agriculturally important traits, including the S5 hybrid sterility locus, the Sub1 submergence tolerance locus, the LRK gene cluster associated with improved yield, and the Pup1 cluster associated with phosphorus deficiency, illustrating the utility of our approach for biological discovery. All of the data and software are openly available to support further breeding and functional studies of rice and other species.

Figures

Figure 1
Population structure in O. sativa. A principal component analysis (PCA) based on 40,000 SNPs shows the deep subpopulation structure of a rice diversity panel (400 O. sativa accessions). The top two principal components (PC1 and PC2) explain 44.1% of the genetic variation. Accessions are color-coded based on subpopulation: red, indica; dark blue, temperate japonica; light blue, tropical japonica; yellow, aus; purple, aromatic; black, admixed. Figure reproduced with permission from [7].
Figure 2
Venn diagrams of the shared sequence content between Nipponbare (temperate japonica), IR64 (indica) and DJ123 (aus). (A) Overall sequence content. In each sector, the top number is the total number of base pairs, the middle number is the number of exonic bases, and the bottom is the gene count. If a gene is partially shared, it is assigned to the sector with the most exonic bases. (B) Genic content. In each sector, the top number is the median CDS length, the middle number is the average number of exons per gene, and the bottom is the percentage InterPro/homology.
Figure 3
PCR validation of genome-specific regions. Regions identified as unique to each genome assembly were amplified from genomic DNA of all three genomes and visualized on 1% agarose gels. (A) Nipponbare-specific sequences. (B) IR64-specific sequences. (C) DJ123-specific sequences.
Figure 4
K-mer coverage in the three assemblies across the Sub1A gene. In each panel, the k-mer coverage of the sequence reads from the respective genome is plotted along the sequence of the Sub1A A-2 allele. Only IR64 has consistent coverage across the gene, while the other two genomes have sparse coverage of a few repetitive k-mer sequences. For clarity, the k-mer coverage range 1× to 50,000× (log scale) is displayed in all the plots.
Figure 5
K-mer coverage across the Kasalath/Pstol1 gene in the three genomes, with 30 kbp of upstream and downstream flanking sequence. The k-mer coverage is plotted with respect to the reference Kasalath sequence (AB458444.1). The position of the Pstol1 gene is indicated with green vertical bars. Also see Figure S9 in Additional file 1 for a detailed view of the Pstol1 coverage, and Figure S10 in Additional file 1 for a plot of the entire Kasalath sequence. Unresolved gaps in the reference sequence are indicated with black vertical bars. Only DJ123 has consistent coverage across this region, especially upstream of the gene, while the other two genomes show complete gaps in coverage.
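
The k-mer coverage profiles described for Figures 4 and 5 can be illustrated with a minimal sketch. It assumes a simple exact-match scheme: count every k-mer in a set of reads, then look up the count of each k-mer along a reference sequence. The function names, the toy sequences, and k = 4 are illustrative assumptions and are not taken from the paper; reverse complements and sequencing errors are ignored here.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in a collection of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def kmer_coverage(reference, counts, k):
    """Return, for each position in the reference, the read count of the k-mer starting there."""
    return [
        counts.get(reference[i:i + k], 0)
        for i in range(len(reference) - k + 1)
    ]


# Toy data (hypothetical): a short reference and three reads.
reference = "ACGTACGGTC"
reads = ["ACGTAC", "CGTACG", "GGTC"]
k = 4

counts = count_kmers(reads, k)
coverage = kmer_coverage(reference, counts, k)
print(coverage)  # [1, 2, 2, 1, 0, 0, 1]
```

Positions with zero coverage correspond to reference k-mers absent from the reads, which is how a missing gene or region would appear as a gap in such a plot; very high counts typically indicate repetitive k-mers.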

References

    1. Garris AJ, Tai TH, Coburn J, Kresovich S, McCouch S. Genetic structure and diversity in Oryza sativa L. Genetics. 2005;169:1631–1638. doi: 10.1534/genetics.104.035642.
    1. Huang X, Kurata N, Wei X, Wang ZX, Wang A, Zhao Q, Zhao Y, Liu K, Lu H, Li W, Guo Y, Lu Y, Zhou C, Fan D, Weng Q, Zhu C, Huang T, Zhang L, Wang Y, Feng L, Furuumi H, Kubo T, Miyabayashi T, Yuan X, Xu Q, Dong G, Zhan Q, Li C, Fujiyama A, Toyoda A, et al. A map of rice genome variation reveals the origin of cultivated rice. Nature. 2012;490:497–501. doi: 10.1038/nature11532.
    1. Zhao KY, Wright M, Kimball J, Eizenga G, McClung A, Kovach M, Tyagi W, Ali ML, Tung CW, Reynolds A, Bustamante CD, McCouch SR. Genomic diversity and introgression in O. sativa reveal the impact of domestication and breeding on the rice genome. PLoS One. 2010;5:e10780. doi: 10.1371/journal.pone.0010780.
    1. Boyko AR, Quignon P, Li L, Schoenebeck JJ, Degenhardt JD, Lohmueller KE, Zhao K, Brisbin A, Parker HG, vonHoldt BM, Cargill M, Auton A, Reynolds A, Elkahloun AG, Castelhano M, Mosher DS, Sutter NB, Johnson GS, Novembre J, Hubisz MJ, Siepel A, Wayne RK, Bustamante CD, Ostrander EA. A simple genetic architecture underlies morphological variation in dogs. PLoS Biol. 2010;8:e1000451. doi: 10.1371/journal.pbio.1000451.
    1. Weir BS, Cardon LR, Anderson AD, Nielsen DM, Hill WG. Measures of human population structure show heterogeneity among genomic regions. Genome Res. 2005;15:1468–1476. doi: 10.1101/gr.4398405.
