Plant J. 2021 Jul;107(1):303-314. doi: 10.1111/tpj.15289. Epub 2021 May 16.

Optical maps refine the bread wheat Triticum aestivum cv. Chinese Spring genome assembly

Tingting Zhu et al. Plant J. 2021 Jul.

Abstract

Until recently, a reference-quality genome sequence for bread wheat was thought to be beyond the limits of genome sequencing and assembly technology, mainly because of the large genome size and >80% repetitive sequence content. The release of the chromosome-scale 14.5-Gb IWGSC RefSeq v1.0 genome sequence of bread wheat cv. Chinese Spring (CS) was therefore a milestone. Here, we used a direct label and stain (DLS) optical map of the CS genome, together with a previous nick, label, repair and stain (NLRS) optical map and sequence contigs assembled from Pacific Biosciences long reads, to refine the v1.0 assembly. Inconsistencies between the sequence and the maps were reconciled, and gaps were closed. Gap filling and the anchoring of 279 unplaced scaffolds increased the total length of the pseudomolecules by 168 Mb (excluding Ns). Positions and orientations were corrected for 233 and 354 scaffolds, respectively, representing 10% of the genome sequence. The accuracy of the remaining 90% of the assembly was validated. As a result of the increased contiguity, the numbers of transposable elements (TEs) and of intact TEs increased in IWGSC RefSeq v2.1 compared with v1.0. In total, 98% of the gene models identified in v1.0 were mapped onto the new assembly using a dedicated approach implemented in the MAGAAT pipeline. The number of high-confidence genes on pseudomolecules increased from 105 319 to 105 534. The reconciled assembly enhances the utility of the sequence for genetic mapping, comparative genomics, gene annotation and isolation, and more general studies of wheat biology.
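Assembly lengths "excluding Ns" are usually obtained by discounting the N bases that stand in for unsized or estimated gaps. The sketch below illustrates that bookkeeping on a FASTA file. It is not the IWGSC or MAGAAT code: the function names `read_fasta` and `assembly_stats` are hypothetical, and the sketch assumes that gaps are encoded as runs of `N`/`n` and that each run counts as one gap.

```python
import re
from typing import Dict, Iterable, Iterator, Tuple

# Assumption: a gap is any run of one or more N (or n) bases.
GAP = re.compile(r"[Nn]+")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Record ID is the header text up to the first space.
            name, chunks = line[1:].strip().partition(" ")[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def assembly_stats(seq: str) -> Dict[str, float]:
    """Total length, length excluding Ns, gap count and N percentage."""
    n_bases = seq.upper().count("N")
    total = len(seq)
    return {
        "length": total,
        "length_excluding_N": total - n_bases,
        "gaps": len(GAP.findall(seq)),
        "N_percent": 100.0 * n_bases / total if total else 0.0,
    }


if __name__ == "__main__":
    fasta = ">chr1A_part toy\nACGTNNNNACGT\nNNAC\n"
    for rec_id, seq in read_fasta(fasta.splitlines()):
        print(rec_id, assembly_stats(seq))
```

On the toy record, the 16-bp sequence contains 6 N bases in two runs, so its length excluding Ns is 10 bp. Comparing such totals between two assembly versions gives a change in sequence content that is not inflated by gap padding.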

Keywords: Hi-C; direct label and stain; gene collinearity; pseudomolecule; transposable element.

Conflict of interest statement

The authors declare that they have no competing interest.

Figures

Figure 1
Alignment of IWGSC RefSeq v1.0 with the direct label and stain (DLS) optical map. The alignment of the distal region (from 680 to 690 Mb of the pseudomolecule) of Chr1B of RefSeq v1.0 (green box) to DLS map contigs (blue boxes). Ambiguous sequences, including missing sequences (pale green), mis‐orientated scaffolds and mis‐ordered scaffolds (orange), were observed.
Figure 2
Overview of the strategy for reconstructing the IWGSC RefSeq v2.1 assembly. N% refers to the percentage of N bases placed into gaps in the assembly.
Figure 3
IWGSC RefSeq corrected with the direct label and stain (DLS) optical map. Alignments of the region of the Chr1B pseudomolecule corresponding to that shown in Figure 1 (green box) to the DLS map contigs (blue boxes) show that most of the ambiguous regions have been resolved in this region of the IWGSC RefSeq v2.1 assembly.
Figure 4
Numbers of intact LTR retrotransposons (RTs) in IWGSC RefSeq v1.0 and IWGSC RefSeq v2.1.
Figure 5
Alignments of a 10‐Mb interval of Chr1A pseudomolecules in IWGSC RefSeq v1.0 (a), Triticum 4.0 (b) and IWGSC RefSeq v2.1 (c) with the DLS optical contigs. Indicated are extra sequences (red arrows), missing sequences (green arrows), inverted sequences (blue arrows) and sequences from other chromosomes (orange arrow).
