Nature. 2018 May;557(7705):424-428.
doi: 10.1038/s41586-018-0108-0. Epub 2018 May 9.

Genome sequence of the progenitor of wheat A subgenome Triticum urartu

Hong-Qing Ling et al. Nature. 2018 May.

Abstract

Triticum urartu (diploid, AA) is the progenitor of the A subgenome of tetraploid (Triticum turgidum, AABB) and hexaploid (Triticum aestivum, AABBDD) wheat [1,2]. Genomic studies of T. urartu have been useful for investigating the structure, function and evolution of polyploid wheat genomes. Here we report the generation of a high-quality genome sequence of T. urartu by combining bacterial artificial chromosome (BAC)-by-BAC sequencing, single molecule real-time whole-genome shotgun sequencing [3], linked reads and optical mapping [4,5]. We assembled seven chromosome-scale pseudomolecules and identified protein-coding genes, and we suggest a model for the evolution of T. urartu chromosomes. Comparative analyses with genomes of other grasses showed gene loss and amplification in the numbers of transposable elements in the T. urartu genome. Population genomics analysis of 147 T. urartu accessions from across the Fertile Crescent showed clustering of three groups, with differences in altitude and biostress, such as powdery mildew disease. The T. urartu genome assembly provides a valuable resource for studying genetic variation in wheat and related grasses, and promises to facilitate the discovery of genes that could be useful for wheat improvement.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Recent LTR retrotransposon bursts in the Tu genome and distribution of genomic components on Tu chromosome 1.
a, Insertion burst of LTR retrotransposons of Gypsy and Copia. TE, transposable element. b–l, Multi-dimensional display of genomic components of Tu chromosome 1. b, DNA pseudomolecule. c, Gene frequency (number of genes per 10 Mb). d, Repeat density (per cent nucleotides per 5 Mb). e, Density of LTR retrotransposons (per cent nucleotides per 10 Mb). f, Frequency of lncRNA (log[number per 10 Mb]). g, Frequency of segmentally duplicated genes (log[number per 1 Mb]). h, Frequency of tandemly duplicated genes (log[number per 1 Mb]). i, Frequency of simple sequence repeats (log[number per 10 Mb]). j, Linkage map distance (cM per 5 Mb). k, Accumulated gene expression level (log2[FPKM (fragments per kilobase of transcript per million mapped reads) per 5 Mb]). l, GC content (per cent per 1 Mb).
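The FPKM unit in panel k can be illustrated with a minimal sketch. It uses the standard FPKM definition and bins genes into 5-Mb windows by start coordinate; the function names, the binning rule and the choice to skip windows with zero expression are assumptions made for illustration, not the authors' pipeline.

```python
import math
from collections import defaultdict

WINDOW_BP = 5_000_000  # 5-Mb window, as quoted in the caption


def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


def log2_fpkm_per_window(genes, window_bp=WINDOW_BP):
    """Sum FPKM per window and return {window_index: log2(sum)}.

    genes: iterable of (start_bp, fpkm_value) pairs (hypothetical input layout).
    Windows whose summed FPKM is zero are skipped, since log2(0) is undefined.
    """
    totals = defaultdict(float)
    for start_bp, value in genes:
        totals[start_bp // window_bp] += value
    return {w: math.log2(t) for w, t in sorted(totals.items()) if t > 0}
```

For instance, a 2-kb transcript with 500 fragments in a library of 25 million mapped fragments has an FPKM of 10.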
Fig. 2. Genome synteny to bread wheat (Ta) and an evolutionary model of the Tu chromosomes.
a, b, Synteny of Tu chromosomes with subgenomes A and B of Ta. Each line represents a syntenic block of five or more gene pairs with similarity of 80% or more. Three large structural variations detected are: (1) a reciprocal translocation at the distal end of the long arms between Tu4 and Tu5 that occurred before the polyploidization of the A and B genomes, but after divergence from both B and D genomes; (2) a one-way translocation from Ta7B to Ta4A; and (3) a pericentric inversion on Ta4A involving most of the long and short arms. c, Evolutionary model of Tu chromosomes from an ancestral grass genome based on the AGK structure initially defined in Murat et al. and the syntenic relationships of Tu with B. distachyon (Bd), rice (Os), and sorghum (Sb). One-directional arrows indicate segment translocations, and bidirectional arrows indicate inversions. Tu1–Tu7, seven chromosomes of Tu; A1–A12, twelve chromosomes of the grass ancestor; Bd1–Bd5, five chromosomes of Bd; Os1–Os12, twelve rice chromosomes; Sb1–Sb10, ten sorghum chromosomes. The seven coloured squares on the right represent seven basic ancient grass chromosomes. The line graphs below Tu chromosomes display the frequency distribution of AGK genes. The red and blue arrows indicate inter- and intra-chromosome fusion locations, respectively, of the ancestral chromosomes in the Tu genome.
Fig. 3. Geographic distribution and population structure of Tu.
a, Distribution of the 147 Tu accessions at different altitudes of the Fertile Crescent. Also shown are the main land markers, including the Euphrates and Tigris rivers, Urmia lake and 11 cities (Adana (Ad), Aleppo (Al), Baghdad (Ba), Beirut (Be), Damascus (D), Erzurum (E), Homs (H), Mosul (M), Tabriz (Ta), Tripoli (Tr) and Yerevan (Y)). The map was drawn using the online mapping tool ArcGIS (version 10.1, www.esri.com). b, Phylogenetic clustering of the 147 accessions into three groups, with B. distachyon as the outgroup. c, Population structure analysis of Tu accessions, clustering with three groups that are similar to those from the phylogenetic analysis shown in b.
Extended Data Fig. 1. T. urartu genome assembly.
a, Schematic workflow for genome sequencing, assembly and chromosomal assignment with a high-density SNP map. b, Statistics of sequencing data. *Calculated from the estimated genome size of 4.94 Gb. LIS, library insert size; ARL, average read length; RD, raw data; UD, usable data; ED, effective depth. c, Summary of the Tu genome assembly. *Contig: contiguous sequence without Ns assembled with Illumina reads, corrected by PacBio reads. #Scaffold: Sequence with Ns, in which two or more contigs were connected by mate-pair reads, BioNano genome maps and 10× Genomics linked reads. d, High-resolution genetic map of T. urartu using SNP markers. The SNP markers were identified from an F2 population (475 individuals) derived from a cross between accessions G1812 and G3146 of Tu. e, Summary of physical length and genetic map of seven pseudomolecules. Chr, chromosome; GLDAccu, genetic linkage distance accumulation; GLDAvg, genetic linkage distance on average; PDPCM, physical distance per centimorgan (cM).
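Two of the derived quantities named in this caption reduce to simple ratios. The sketch below assumes that effective depth is usable data divided by the estimated genome size, and that PDPCM is physical length divided by accumulated genetic distance; both function names and all input values are hypothetical, and only the 4.94 Gb genome size comes from the caption.

```python
GENOME_SIZE_GB = 4.94  # estimated genome size quoted in the caption


def effective_depth(usable_data_gb, genome_size_gb=GENOME_SIZE_GB):
    """Sequencing depth (fold coverage), assumed to be usable data / genome size."""
    if genome_size_gb <= 0:
        raise ValueError("genome size must be positive")
    return usable_data_gb / genome_size_gb


def physical_distance_per_cm(physical_length_mb, genetic_length_cm):
    """Physical distance per centimorgan (Mb per cM) for one pseudomolecule."""
    if genetic_length_cm <= 0:
        raise ValueError("genetic length must be positive")
    return physical_length_mb / genetic_length_cm
```

With made-up inputs, 494 Gb of usable data would correspond to roughly 100-fold depth, and a 600-Mb chromosome spanning 150 cM would give 4 Mb per cM.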
Extended Data Fig. 2. Evaluation of T. urartu genome assembly.
a, Summary of comparison of the T. urartu genome assembly with public BAC sequences. *The published chromosome location of BACs; NA, not available. #Tu draft assembly. b, Dot plots showing comparison of T. urartu genome with available BACs of T. urartu from public database. The blue arrows indicate the regions at which the BAC sequences and pseudomolecules did not match owing to the presence of repeat elements.
Extended Data Fig. 3. Chromosomal distribution of T. urartu genome features.
a–f, Features on Tu2–Tu7 are in the order of DNA pseudomolecule; gene frequency (number of genes per 10 Mb); repeat density (per cent nucleotides per 5 Mb); density of LTR retrotransposons (per cent nucleotides per 10 Mb); frequency of lncRNA (log[number of genes per 10 Mb]); frequency of segmentally duplicated genes (log[number of genes per 1 Mb]); frequency of tandemly duplicated genes (log[number of genes per 1 Mb]); frequency of simple sequence repeats (log[number of repeats per 10 Mb]); linkage map distance (cM per 5 Mb); accumulated gene expression level in RNA-seq data (log2[FPKM per 5 Mb]); GC content (per cent per 1 Mb).
Extended Data Fig. 4. Analyses of gene families and B3 transcription factors.
a, Comparison of gene families of T. urartu with O. sativa, Z. mays, S. bicolor and B. distachyon. Venn diagram illustrates shared and unique gene families (gene numbers in parentheses) among the five grass species. b, Gene ontology analysis of Tu-specific genes. Tu-specific InCat#, number of Tu-specific genes in the GO category; Total InCat#, number of total Tu genes in the GO category. c, Comparison of B3 transcription factors of Tu with B. distachyon (Bd), O. sativa (Os), S. bicolor (Sb), Z. mays (Zm), Ae. tauschii (Aet) and T. aestivum (Ta). *B3 transcription factors without identified subfamily.
Extended Data Fig. 5. Comparison of Tu genome with other wheat genomes.
a, Syntenic analysis of Tu genome with the D subgenome of Ta, and the genome of Aet. Each syntenic block contains five or more genes, with sequence similarity of 80% or more. b, Comparison of Tu genome with BACs of T. turgidum (Tt) and Ta. BAC KF282630 from chromosome 4 of the A subgenome of Tt contained two inserted fragments (blue arrows) that are composed of Copia RLC_WIS_B elements, which were not detected on the corresponding Tu4 region. BAC JQ354543 from chromosome 3 of the A subgenome of Ta lacked the Gypsy RLG_Jeli element (blue arrow), which was found on the corresponding region of Tu3. c, Comparison of the Tu genome with Ta7A scaffolds from TGACv1. The dot plots on the top show comparison of the two largest TaA scaffolds to corresponding parts of Tu7 chromosome. The diagonal lines on the dot plots show fine co-linearity. The lower part shows validation of the sequence assembly of Tu7 by BioNano maps. The Tu7 sequences were digested into in silico consensus maps, and the consensus maps corresponding to the two Ta7A scaffolds (green bar) are compared against their corresponding BioNano genome maps (blue bar). Each vertical line on the green/blue bars indicates a restriction enzyme cutting site (Nt.BspQI), and vertical lines between green bars and blue bars indicate alignments among these sites. The blue bars highlighted with red vertical line demonstrate that two BioNano genome maps (for example, 3912 and 6183) overlap with one another (they all have alignments on the overlapping region), although the two maps are not merged together owing to lack of coverage on the overlapping region. The high consistency of alignments between consensus maps and BioNano genome maps confirms the high quality of Tu genome assembly. Therefore, the insertion/deletion events in the dot plots should be sequence variations between the two A genomes from Ta and Tu, rather than assembly errors. d, Comparison of Tu7 with all Ta7A scaffolds from TGACv1 at nucleotide levels.
*Nucleotide: minimum cutoff of DNA sequence alignments between Tu7 and Ta7A. ML, minimum length (bp) to align.
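The in silico digestion described in panel c can be sketched as a motif scan. This sketch assumes the Nt.BspQI recognition sequence is GCTCTTC and records a label wherever the motif occurs on either strand; the function names are hypothetical, and real optical-map software additionally models label resolution and sizing error, which is omitted here.

```python
SITE = "GCTCTTC"  # assumed Nt.BspQI recognition sequence

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def in_silico_map(sequence, site=SITE):
    """Return sorted 0-based start positions of the site on either strand."""
    seq = sequence.upper()
    positions = set()
    for motif in {site, reverse_complement(site)}:
        start = seq.find(motif)
        while start != -1:
            positions.add(start)
            start = seq.find(motif, start + 1)
    return sorted(positions)


def label_intervals(positions):
    """Distances between consecutive labels, the quantity compared between maps."""
    return [b - a for a, b in zip(positions, positions[1:])]
```

Comparing the interval pattern from `label_intervals` with the measured intervals of an optical map is what the vertical alignment lines in the figure represent.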
Extended Data Fig. 6. Synteny of T. urartu genome with other grass genomes.
a, Intergenomic dot plots showing orthologous syntenies between Tu and three grass relatives B. distachyon (Bd), O. sativa (Os) and S. bicolor (Sb). The alignment of the Tu genome to the Bd, Os and Sb genomes demonstrates their highly collinear relationships and full coverage of Tu by orthologous chromosome segments from each of the three grass genomes. b, Data table showing the orthologous syntenies between Tu and the other genomes (Bd, Os and Sb).
Extended Data Fig. 7. Intragenomic collinear regions of Tu corresponding to rice (Os) duplications.
a, Intragenomic and intergenomic dot plots show a clearly visible collinear region 1′ between Tu1 and Tu3 that is orthologous to the Os intragenomic collinear region 1. However, the collinearity of region 2′ between Tu1 and Tu3 is severely corrupted, but the corresponding Os intragenomic collinear region 2 was clearly visible. b–e, Similarly, Tu intragenomic collinear regions 10′, 3′, 4′ and 5′ were also clearly visible, which correspond to Os intragenomic collinear regions 10, 3, 4 and 5, respectively. However, the collinearity of Tu intragenomic regions corresponding to Os intragenomic collinear regions 11, 6, 7, 8 and 9 was disrupted. f, Data table showing the strong intragenomic collinear segments of Os and their corresponding regions of Tu (chromosomal positions are shown in Mb). *The corresponding seven pairs of ancient chromosomes based on the AGK structure that are ancestral to the Tu and Os chromosomes listed in the right columns. The five clearly visible regions in Tu are marked in bold. **Os regions without visible corresponding collinear regions in Tu dot plots. g, Data table showing the number of collinear genes between ancestrally duplicated chromosome segments in rice and their corresponding gene numbers in Tu.
Extended Data Fig. 8. Comparison of chromosome 3 of Tu with chromosome 3B of Ta.
a, Comparison of Tu3 with Ta3B at both nucleotide and protein levels. *Nucleotide: minimum cutoff of DNA sequence alignments between Tu3 and Ta3B. ML, minimum length (bp) to align. Protein: minimum cutoff of protein sequence alignments between Tu3 and Ta3B, and in reverse. b, Overall view of syntenic blocks between Tu3 and Ta3B. c, A syntenic block composed of five consecutive collinear gene pairs, showing large repeat insertions on Ta3B of >100 kb. d, A syntenic block composed of seven consecutive collinear gene pairs. A 70-kb contraction is seen in Ta3B. e, A syntenic block with eight collinear genes interrupted by non-syntenic genes. f, Two segments of Tu3 and Ta3B show five discrete collinear gene pairs and gene expansions in Tu3. g, Genome expansion in one representative syntenic block from Bd2 (0.3 Mb), Tu3 (4.4 Mb), and Ta3B (11.2 Mb). Compared with the Tu3 segment, large numbers of non-homologous genes and repeats can be observed in Ta3B, resulting in a 7-Mb expansion. h, Insertion dates of LTR retrotransposons on Tu3 and Ta3B. A recent retrotransposon burst at around 0.1 Ma is observed on Ta3B but not on Tu3.
Extended Data Fig. 9. Population analysis.
a, Distribution of transcriptomic-based SNPs on the seven DNA pseudomolecules (Tu1–Tu7) of T. urartu. The SNPs were calculated using a 1-Mb window. b, Boxplot comparison of altitude ranges of Tu accessions. The number of accessions in each group is indicated in parentheses. The line inside each box represents the median, the ends of each box define the 25th and 75th percentiles, and the error bars mark the 10th and 90th percentiles. Outliers are displayed as open circles. c, Analysis of genetic diversity and differentiation. The π, θ and FST values were estimated for the three groups of Tu accessions using transcriptomic SNPs. d, Reaction phenotypes of Tu accessions to the wheat powdery mildew fungus Bgt. Most of the accessions in Groups I and III (96.7% and 90.6%, respectively) were susceptible to Bgt (race E09), whereas the majority of Group II accessions (92.2%) showed resistance. e, A total of 141 (top 1%; πGroup I/πGroup II > 7.7) and 143 (top 1%; πGroup III/πGroup II > 4.3) signals were considered to be candidate sweeps (dots above the dashed horizontal threshold line). The Tu accessions in Groups I, II and III were 30, 64 and 53, respectively. Blue arrows indicate the wall-associated receptor protein kinase gene (TuWAK, TuG1812G0400002796), whose haplotype variations showed strong associations with resistance or susceptibility to Bgt. f, Exon (box)–intron (line) structure of TuWAK and its two major haplotypes (Hap1 and Hap2). The positions of the three SNPs that differ between Hap1 and Hap2 are shown. Amino acid changes caused by these SNPs are also displayed. g, Distribution of Hap1 and Hap2 in the three groups of T. urartu accessions. Hap1 was the major haplotype in Group II, which was strongly associated with resistance to Bgt, while Hap2 was the main haplotype in Groups I and III, and associated with susceptibility to Bgt. 
In d and g, the accessions in each group were further sorted based on the response to powdery mildew infection (d) or the possession of TuWAK haplotypes (g), with the assorted accession numbers indicated in appropriate columns.
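The sweep-candidate selection in panel e (the top 1% of πGroup I/πGroup II ratios) can be sketched as a rank-based cutoff. This sketch assumes one π value per genomic window for each group, skips windows where the denominator is zero, and reports the smallest selected ratio as the threshold; the function name and these conventions are illustrative assumptions, not the authors' exact procedure.

```python
import math


def top_ratio_windows(pi_numerator, pi_denominator, fraction=0.01):
    """Select the windows with the highest pi ratio.

    Returns (threshold, indices): the smallest ratio among the selected
    windows and the sorted indices of those windows.
    """
    ratios = [
        (num / den, index)
        for index, (num, den) in enumerate(zip(pi_numerator, pi_denominator))
        if den > 0  # a zero denominator gives an undefined ratio, so skip it
    ]
    if not ratios:
        raise ValueError("no windows with a positive denominator")
    ratios.sort(reverse=True)
    keep = max(1, math.ceil(len(ratios) * fraction))
    selected = ratios[:keep]
    threshold = selected[-1][0]
    return threshold, sorted(index for _, index in selected)
```

Called with per-window π values for Group I and Group II, the returned threshold plays the role of the dashed horizontal line in the figure.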

Comment in

  • Optimise wheat A-genome.
    Lyu J. Nat Plants. 2018 Jun;4(6):320. doi: 10.1038/s41477-018-0183-0. PMID: 29808021.

References

    1. Dvorák J, Terlizzi P, Zhang HB, Resta P. The evolution of polyploid wheats: identification of the A genome donor species. Genome. 1993;36:21–31. doi: 10.1139/g93-004.
    2. Peng JH, Sun DH, Nevo E. Domestication evolution, genetics and genomics in wheat. Mol. Breed. 2011;28:281–301. doi: 10.1007/s11032-011-9608-4.
    3. Ferrarini M, et al. An evaluation of the PacBio RS platform for sequencing and de novo assembly of a chloroplast genome. BMC Genomics. 2013;14:670. doi: 10.1186/1471-2164-14-670.
    4. Zheng GX, et al. Haplotyping germline and cancer genomes with high-throughput linked-read sequencing. Nat. Biotechnol. 2016;34:303–311. doi: 10.1038/nbt.3432.
    5. Lam ET, et al. Genome mapping on nanochannel arrays for structural variation analysis and sequence assembly. Nat. Biotechnol. 2012;30:771–776. doi: 10.1038/nbt.2303.
