2015 Jan 31;16(1):26.
doi: 10.1186/s13059-015-0582-8.

A whole-genome shotgun approach for assembling and anchoring the hexaploid bread wheat genome

Jarrod A Chapman et al. Genome Biol.

Abstract

Polyploid species have long been thought to be recalcitrant to whole-genome assembly. By combining high-throughput sequencing, recent developments in parallel computing, and genetic mapping, we derive, de novo, a sequence assembly representing 9.1 Gbp of the highly repetitive 16 Gbp genome of hexaploid wheat, Triticum aestivum, and assign 7.1 Gbp of this assembly to chromosomal locations. The genome representation and accuracy of our assembly are comparable to, or even exceed, those of a chromosome-by-chromosome shotgun assembly. Our assembly and mapping strategy uses only short-read sequencing technology and is applicable to any species for which a mapping population can be constructed.

Figures

Figure 1
51-mer depth distribution for homozygous parental lines. (A) 51-mer frequency distribution for W7984 (red), compared with Opata (black). W7984 was sequenced more deeply to enable de novo WGS assembly. Uptick at low depth (below 51-mer frequency of approximately 5) corresponds to sequencing error. Peak frequency (approximately 18 for W7984, approximately 11 for Opata) represents the typical number of 51-mers covering nucleotides in the non-repetitive regions of the genome. (B) Cumulative frequency distribution for W7984 and Opata as a function of estimated genomic copy count (51-mer frequency divided by peak 51-mer frequency from panel (A)). Note logarithmic scale on the horizontal axis. The two curves lie on top of each other, as expected for two accessions from the same species. Approximately 45% of the hexaploid wheat genome is found in regions that are single copy as measured by 51-mers (estimated genomic copy count ≤2), and the remainder is typically at high 51-mer copy number (approximately 40% of the genome is found in 10 or more copies). The distribution rises smoothly through estimated genome copy counts of two and three, indicating the three subgenomes of hexaploid wheat are largely differentiated at the scale of a 51-mer.
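
The quantities in this caption can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the forward-strand-only counting, and the depth-weighted genome fraction are assumptions made for illustration, and a real 51-mer analysis of a 16 Gbp genome would need a dedicated k-mer counter rather than an in-memory dictionary.

```python
from collections import Counter

def kmer_counts(reads, k):
    """Count every k-mer in the reads (forward strand only, an assumption)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def depth_histogram(counts):
    """Map depth -> number of distinct k-mers observed at that depth."""
    return Counter(counts.values())

def peak_depth(hist, min_depth):
    """Most frequent depth at or above min_depth, skipping the low-depth error uptick."""
    candidates = {d: n for d, n in hist.items() if d >= min_depth}
    return max(candidates, key=candidates.get)

def estimated_copy_count(depth, peak):
    """Estimated genomic copy count: k-mer depth divided by the peak depth."""
    return depth / peak

def cumulative_genome_fraction(hist, peak, max_copy, min_depth):
    """Fraction of the genome in k-mers with estimated copy count <= max_copy.

    Assumption: each distinct k-mer at depth d contributes in proportion to d,
    and depths below min_depth are treated as sequencing error and ignored.
    """
    total = sum(d * n for d, n in hist.items() if d >= min_depth)
    kept = sum(d * n for d, n in hist.items()
               if d >= min_depth and estimated_copy_count(d, peak) <= max_copy)
    return kept / total

# Toy histogram (made-up numbers, not data from the paper).
toy_hist = {1: 100, 2: 30, 17: 40, 18: 55, 19: 42, 180: 3}
toy_peak = peak_depth(toy_hist, min_depth=5)
toy_copies = estimated_copy_count(180, toy_peak)
```

With the toy histogram, the peak is found at depth 18 once depths below 5 are excluded, so a 51-mer seen 180 times would be assigned an estimated copy count of 10.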
Figure 2
Cumulative distributions of assembled sequence as a function of scaffold and contig length. The total amount of assembled sequence in scaffolds or contigs longer than a minimum length is shown. As the available paired-end insert size is increased, the W7984 WGS assembly becomes progressively longer, with the inclusion of short-inserts (<500 bp) only (red); the addition of medium-inserts (700 bp to 1 kbp; dark blue); and finally the inclusion of approximately 4 kbp insert mate pairs (green). For comparison, the International Wheat Genome Sequencing Consortium chromosome-sorted assembly of ‘Chinese Spring’ (CSS) is also shown (black dashed line). Cumulative contig distributions for W7984 (light blue) and CSS (gray dashed line) are also depicted. As predicted by assembly theory, these quantities are exponentially distributed with decay lengths proportional to the N50 length scale of the assembly. This demonstrates that the excess length of the CSS assembly is restricted to an abundance of very short sequences (less than 1 kbp in length) that are outside of the body of the main exponential decay curves.
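
As a rough illustration of the two statistics this caption relies on, the sketch below computes an N50 and the total sequence held in pieces longer than a given minimum. The function names and the toy lengths are assumptions for illustration, not values from the paper.

```python
def n50(lengths):
    """Length L such that pieces of length >= L hold at least half the total sequence."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input

def sequence_in_pieces_longer_than(lengths, min_length):
    """Total assembled sequence in scaffolds or contigs longer than min_length."""
    return sum(length for length in lengths if length > min_length)

# Toy scaffold lengths in bp (made-up numbers).
toy_lengths = [100, 200, 300, 400, 500]
toy_n50 = n50(toy_lengths)
toy_cumulative = sequence_in_pieces_longer_than(toy_lengths, 250)
```

Evaluating `sequence_in_pieces_longer_than` over a range of minimum lengths gives one point per threshold on a curve of the kind plotted in the figure.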
Figure 3
Distribution of percent identities of alignments of ‘Chinese Spring’ full-length cDNAs versus genome assemblies. (A) Frequency distribution of best percent identity of flcDNA alignments to IWGSC ‘Chinese Spring’ (blue bars) and W7984 WGS (red bars) assemblies. Results for both assemblies are superimposed; red and blue overlap is shown as purple. Included are all alignments longer than 50% of query flcDNA length. Note that while most ‘Chinese Spring’ cDNAs align at >99.75% identity to the IWGSC ‘Chinese Spring’ genome assembly, there is a long tail of lower identity best matches that could arise from errors in the genome assembly or in the flcDNA sequences. Matches to the W7984 assembly show most matches >99.50%, as expected given the intra-specific polymorphism between ‘Chinese Spring’ and W7984, but also show the long tail of lower identity. For W7984, these may arise from the absence in the genotype of the locus corresponding to the ‘Chinese Spring’ cDNA. (B) Frequency distribution of percent identity of flcDNA alignments longer than 50% of query flcDNA length, showing only those cDNAs with five or fewer such alignments. The secondary peak centered at approximately 97 to 97.5% corresponds to homeologous matches. As expected given the polymorphism between the two hexaploid wheat lines, the ‘Chinese Spring’ cDNAs align at slightly higher identity to their own genotype than to W7984.
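
The selection described for panel (A), keeping alignments longer than 50% of the query and taking the best percent identity per cDNA, might be sketched as follows. The tuple layout `(query_id, aligned_length, percent_identity)` and the function name are hypothetical; the caption does not specify the aligner or its output format.

```python
def best_identity_per_query(alignments, query_lengths, min_cover=0.5):
    """Best percent identity per query among alignments covering more than
    min_cover of the query length.

    alignments: iterable of (query_id, aligned_length, percent_identity),
    a hypothetical layout chosen for this sketch.
    query_lengths: dict mapping query_id -> query length in bp.
    """
    best = {}
    for query, aligned_length, identity in alignments:
        if aligned_length > min_cover * query_lengths[query]:
            if identity > best.get(query, -1.0):
                best[query] = identity
    return best

# Toy alignments (made-up numbers).
toy_alignments = [
    ("cdna1", 900, 99.8),   # kept, best match for cdna1
    ("cdna1", 950, 97.2),   # kept, lower identity
    ("cdna1", 300, 100.0),  # dropped: covers only 30% of the query
    ("cdna2", 400, 99.9),   # dropped: covers only 40% of the query
]
toy_best = best_identity_per_query(toy_alignments, {"cdna1": 1000, "cdna2": 1000})
```

In the toy input, the 100% identity hit for `cdna1` is ignored because it is too short, and `cdna2` has no qualifying alignment at all, so it does not appear in the result.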
Figure 4
Validation of the POPSEQ genetic map. (A) POPSEQ positions [24] of barley high-confidence genes [45] were compared with the genetic positions of their putative orthologs in our wheat POPSEQ map. Assignment of orthologous groups agreed in 87% of the cases. Genetic positions within the orthologous group showed high collinearity (Spearman’s ρ = 0.936). Known translocation events relative to barley involving wheat chromosomes 4A, 5A and 7B [46] could be traced with high precision. (B) Collinearity with a previous genetic map of the Synthetic × Opata population constructed through genotyping by sequencing [28]. A total of 11,000 out of 20,000 genotyping-by-sequencing tags carrying SNPs could be uniquely mapped to our assembly. Chromosome assignments agreed for 99.5% of the genotyping-by-sequencing tags aligned to anchored sequence scaffolds. Genetic positions within linkage groups were highly correlated (Spearman’s ρ = 0.995). (C) Chromosome shotgun contigs were anchored to the same genetic framework as the meraculous scaffolds of W7984. Genetic positions of contigs and scaffolds matched by sequence alignment differed by less than 5 cM in 99.1% of the cases. Chromosomes are separated by blue lines, subgenomes by red lines.
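
Spearman's ρ, used above to quantify collinearity between genetic maps, is the Pearson correlation of the ranks of the two position lists. A minimal pure-Python sketch follows; the function names and toy positions are illustrative assumptions, and tied values receive their average rank.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Toy genetic positions in cM for the same markers on two maps (made-up numbers).
map_a = [0.0, 5.2, 11.8, 20.1, 33.4]
map_b = [0.0, 4.9, 12.5, 31.0, 22.7]  # last two markers swapped in order
toy_rho = spearman_rho(map_a, map_b)
```

Because only ranks matter, the statistic measures agreement in marker order rather than in absolute map distance, which suits comparisons between maps built from different populations or marker sets.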
Figure 5
Nucleotide diversity in the wheat genome. (A) The average number of SNPs per kilobase between the three wheat types Chinese Spring (C), Opata (O) and W7984 (W) is shown across all three subgenomes (ABD) or in the individual subgenomes (A, B and D). The numbers on the outside of the triangles give the diversity across all sequences in the respective subgenomes; those on the inside give the diversity in coding sequences only. (B) Diversity between homeologous genes. Full-length cDNAs [35] were aligned to our assembly of W7984 and assigned to one of the subgenomes using the genetic anchoring of the assembly. This plot shows the distribution of nucleotide identity between cDNAs assigned to the A, B and D subgenomes and their best BLAST hit in the other two subgenomes (that is, to their putative homeologous loci).

References

    1. Weber JL, Myers EW. Human whole-genome shotgun sequencing. Genome Res. 1997;7:401–9.
    2. Green P. Against a whole-genome shotgun. Genome Res. 1997;7:410–7.
    3. Smith JJ, Putta S, Zhu W, Pao GM, Verma IM, Hunter T, et al. Genic regions of a large salamander genome contain long introns and novel genes. BMC Genomics. 2009;10:19. doi: 10.1186/1471-2164-10-19.
    4. Brenchley R, Spannagl M, Pfeifer M, Barker GL, D’Amore R, Allen AM, et al. Analysis of the bread wheat genome using whole-genome shotgun sequencing. Nature. 2012;491:705–10. doi: 10.1038/nature11650.
    5. International Wheat Genome Sequencing Consortium. A chromosome-based draft sequence of the hexaploid bread wheat (Triticum aestivum) genome. Science. 2014;345:1251788. doi: 10.1126/science.1251788.
