Genetics. 2020 Oct;216(2):599-608.
doi: 10.1534/genetics.120.303501. Epub 2020 Aug 12.

Chromosome-Scale Assembly of the Bread Wheat Genome Reveals Thousands of Additional Gene Copies

Michael Alonge et al. Genetics. 2020 Oct.

Abstract

Bread wheat (Triticum aestivum) is a major food crop and an important plant system for agricultural genetics research. However, due to the complexity and size of its allohexaploid genome, genomic resources are limited compared to other major crops. The IWGSC recently published a reference genome and associated annotation (IWGSC CS v1.0, Chinese Spring) that has been widely adopted and utilized by the wheat community. Although this reference assembly represents all three wheat subgenomes at chromosome-scale, it was derived from short reads, and thus is missing a substantial portion of the expected 16 Gbp of genomic sequence. We earlier published an independent wheat assembly (Triticum_aestivum_3.1, Chinese Spring) that came much closer in length to the expected genome size, although it was only a contig-level assembly lacking gene annotations. Here, we describe a reference-guided effort to scaffold those contigs into chromosome-length pseudomolecules, add in any missing sequence that was unique to the IWGSC CS v1.0 assembly, and annotate the resulting pseudomolecules with genes. Our updated assembly, Triticum_aestivum_4.0, contains 15.07 Gbp of nongap sequence anchored to chromosomes, which is 1.2 Gbp more than the previous reference assembly. It includes 108,639 genes unambiguously localized to chromosomes, including over 2000 genes that were previously unplaced. We also discovered >5700 additional gene copies, facilitating the accurate annotation of functional gene duplications, including at the Ppd-B1 photoperiod response locus.

Keywords: gene annotation; gene duplication; genome assembly; scaffolding; wheat.

Figures

Figure 1
The Triticum_aestivum_4.0 assembly scaffolding pipeline. A diagram depicting the Triticum_aestivum_4.0 (T4) assembly scaffolding pipeline, which takes the Triticum_aestivum_3.1 (T3) and IWGSC CS v1.0 (IW) assemblies as input. Gray cylinders represent input or output genome assemblies, while orange boxes show the steps of the scaffolding process.
Figure 2
A comparison of Triticum_aestivum_4.0 and IWGSC CS v1.0 assembly completeness. An ideogram showing the distribution of gap sequence in the Triticum_aestivum_4.0 (T4) and IWGSC CS v1.0 (IW) assemblies. The heatmap color intensity corresponds to the percentage of gap sequence in nonoverlapping 1 Mbp windows along each chromosome. Chromosomes are sorted by T4 length (left to right, top to bottom), highlighting that each T4 chromosome across all three subgenomes has more sequence and fewer gaps than its IW counterpart.
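The per-window gap percentage described in this caption can be illustrated with a minimal sketch. It assumes that gaps are encoded as runs of N (or n) in the assembly sequence, which is the usual FASTA convention but is not stated in this record. The function name gap_percent_by_window and the toy sequence are hypothetical and are not taken from the authors' pipeline.

```python
def gap_percent_by_window(seq, window=1_000_000):
    """Percentage of gap bases in each nonoverlapping window of `seq`.

    Assumption: gap bases are written as 'N' or 'n'. The final window may be
    shorter than `window`; its percentage is relative to its own length.
    """
    percents = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        gaps = chunk.count("N") + chunk.count("n")
        percents.append(100.0 * gaps / len(chunk))
    return percents


# Toy 24-base "chromosome" scanned with 8-base windows instead of 1 Mbp.
toy = "ACGTACGT" + "NNNN" + "ACGTACGT" + "NN" + "AC"
print(gap_percent_by_window(toy, window=8))  # [0.0, 50.0, 25.0]
```

Each returned value would correspond to one heatmap cell; running the same function on both assemblies gives the two tracks that the ideogram compares.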
Figure 3
Shared assembly k-mer count distribution. Histogram of 101-mer copy number in the Triticum_aestivum_4.0 (T4) and IWGSC CS v1.0 (IW) assemblies. Only 101-mers shared by both assemblies are considered. While IW has more single-copy 101-mers, T4 represents more 101-mers at higher copy numbers.
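The comparison in this caption can be sketched on toy data: count every k-mer in each assembly, keep only the k-mers present in both, and tabulate how many of them occur at each copy number. This is an illustrative sketch only. The function names and sequences are hypothetical, k is 3 rather than 101, reverse complements are not merged, and a genome-scale analysis would need a dedicated k-mer counter rather than in-memory Python dictionaries.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in `seq` (forward strand only)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def shared_copy_number_histogram(seq_a, seq_b, k):
    """For k-mers found in both sequences, map copy number -> number of
    distinct k-mers with that copy number, separately for each sequence."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    shared = counts_a.keys() & counts_b.keys()
    hist_a = Counter(counts_a[kmer] for kmer in shared)
    hist_b = Counter(counts_b[kmer] for kmer in shared)
    return dict(hist_a), dict(hist_b)


# Toy example: `collapsed` holds one copy of "ACGT", `resolved` holds two.
collapsed = "ACGTTGCA"
resolved = "ACGTACGTTGCA"
hist_collapsed, hist_resolved = shared_copy_number_histogram(collapsed, resolved, k=3)
print(hist_collapsed)  # all six shared 3-mers are single-copy
print(hist_resolved)   # two of the shared 3-mers now appear twice
```

In the toy case the collapsed sequence has more single-copy shared k-mers while the resolved sequence places some of the same k-mers at copy number two, which is the qualitative pattern the caption reports for IW and T4.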
Figure 4
Triticum_aestivum_4.0 resolves previously collapsed genic repeats. (A) Histogram depicting the distribution of the number of additional gene copies found in Triticum_aestivum_4.0. (B) Circos plot showing the locations of all additional gene copies (http://omgenomics.com/circa/). Lines are drawn from the location of the gene in IWGSC CS v1.0 (IW) on the right half of the diagram to the location of each copy in Triticum_aestivum_4.0 (T4) on the left half. (C) Dotplot depicting maximal exact matches (MEMs) between T4 Ppd-B1 (x-axis) and a publicly available Chinese Spring Ppd-B1 sequence (GenBank accession JF946485.1) (y-axis). Dashed lines indicate the colinear positions of four PRR genes (red labels). (D) Diagram of the MADS-box transcription factor gene, TraesCS6A02G022700, present in three additional tandem copies in T4 relative to IW. Ideograms are not drawn to scale. (E) Plot of the short-read coverage in IW starting 5 kb upstream of TraesCS6A02G022700 and extending to the first gap downstream of the gene. The pink dashed lines show the location of the gene.
