Building the sugarcane genome for biotechnology and identifying evolutionary trends

Nathalia de Setta et al. BMC Genomics.

Abstract

Background: Sugarcane is the source of sugar in all tropical and subtropical countries and is becoming increasingly important for bio-based fuels. However, its large (10 Gb), polyploid, complex genome has hindered genome-based breeding efforts. Here we release the largest and most diverse set of sugarcane genome sequences to date, as part of an ongoing initiative to provide a sugarcane genomic information resource, with the ultimate goal of producing a gold standard genome.

Results: Three hundred and seventeen chiefly euchromatic BACs were sequenced. A reference set of 1,400 manually annotated protein-coding genes was generated. A small RNA collection and an RNA-seq library were used to explore expression patterns and the sRNA landscape. In the sucrose and starch metabolism pathway, 16 non-redundant enzyme-encoding genes were identified. One of the sucrose pathway genes, sucrose-6-phosphate phosphohydrolase, is duplicated in sugarcane and sorghum, but not in rice and maize. A diversity analysis of the s6pp duplication region revealed haplotype-structured sequence composition. Examination of hom(e)ologous loci indicates both sequence structural and sRNA landscape variation. A synteny analysis shows that the sugarcane genome has expanded relative to the sorghum genome, largely due to the presence of transposable elements and uncharacterized intergenic and intronic sequences.

Conclusion: This release of sugarcane genomic sequences will advance our understanding of sugarcane genetics and contribute to the development of molecular tools for breeding purposes and gene discovery.

Figures

Figure 1
Schematic representation of the sucrose, cellulose and starch metabolic pathways, showing genes identified with supporting RNA-seq mapping information. The grey boxes represent enzyme products. Arrows represent enzyme reactions: solid arrows mark reactions for which the predicted enzyme-coding genes were identified in sugarcane, and dotted arrows mark reactions for which the gene was not identified. EC numbers are shown for the predicted enzyme-coding genes identified. EC numbers in red indicate predicted enzyme-coding genes that were mapped with more than a thousand RNA-seq reads. The number of mRNA reads mapped is indicated in parentheses below the EC number. If more than one BAC corresponding to a single sorghum locus was sequenced, the minimum and the maximum number of reads mapped to all BACs are shown. EC 3.2.1.26: beta-fructofuranosidase, EC 3.1.3.24: sucrose-6-phosphate phosphohydrolase, EC 2.4.1.14: sucrose phosphate synthase, EC 2.4.1.12: cellulose synthase, EC 2.4.1.13: sucrose synthase, EC 2.7.7.9: UDP glucose pyrophosphorylase, EC 3.2.1.37: xylan 1,4-beta-xylosidase, EC 3.1.1.11: pectinesterase, EC 3.2.1.15: polygalacturonase, EC 5.4.2.2: phosphoglucomutase, EC 2.7.7.27: glucose-1-phosphate adenylyltransferase, EC 2.4.1.21: starch synthase, EC 2.4.1.18: 1,4-alpha-glucan branching enzyme and EC 3.2.1.2: beta-amylase.
Figure 2
Network analysis of the s6pp gene duplication region. The network was constructed using the NETWORK 4.5.1.0 software [84] with default parameters. From a 1,539 bp alignment, 262 variable characters were used to reconstruct the network. The main figure is a close-up of part of the entire network, which is shown in the top left. The size of each circle is relative to the number of sequences in that haplotype. Thick bold circles represent the five main haplotypes. A single dash denotes a single substitution; the distance between clusters is therefore proportional to the number of substitutions. Numbers in parentheses in the legend show the number of cultivars analyzed.
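The caption above reports that 262 variable characters were taken from a 1,539 bp alignment. As an illustration only, the sketch below shows one simple way to count variable columns in a multiple sequence alignment. The function name `variable_sites`, the toy sequences, and the choice to ignore gaps and `N` are assumptions made for this example; they are not taken from the paper or from the NETWORK software.

```python
def variable_sites(alignment):
    """Return 0-based indices of alignment columns with more than one base.

    Assumption for this sketch: gap ('-') and 'N' characters are ignored
    when deciding whether a column is variable.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    ignored = {"-", "N"}
    sites = []
    for i in range(length):
        # Collect the distinct informative characters in this column.
        column = {seq[i].upper() for seq in alignment} - ignored
        if len(column) > 1:
            sites.append(i)
    return sites


# Hypothetical toy alignment, not sequence data from the study.
toy_alignment = ["ACGTAC", "ACGTTC", "AC-TAC", "ATGTAC"]
sites = variable_sites(toy_alignment)
print(len(sites), "variable columns at positions", sites)
```

How gaps and ambiguous bases are treated changes the count, so a real analysis would need to follow the settings actually used by the authors.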
Figure 3
Structure of the SsMIR437a gene identified in BAC SCHRBa_095_E16. The double-arrowed solid black line shows the location within the BAC; the numbers indicate the nucleotide positions within the BAC. The numbers along the blue line show the position within the region. Exons are shown as blue bars, TEs as grey bars, the intron as a dashed blue line and the putative source of the mature miRNA sequence as a solid red line. TEs were identified using RepeatMasker (cut-off score > 250). The mature miRNA sequence is AAAGUUAGAGAAGUUUGACUU.
Figure 4
Heatmap of the distribution of sequenced sugarcane BACs on sorghum chromosomes. The depth of the blue colour indicates the number of BACs localized per 10 Mb. Horizontal red lines show the location of BACs selected using probes based on eight linkage groups [7]. Horizontal black lines show the location of BACs that overlap with at least one gene. Numbers above the black bars indicate the number of BACs that overlap at that point.
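The heatmap described above is built from counts of BACs per 10 Mb window on each sorghum chromosome. A minimal sketch of that binning step follows. The function name `count_bacs_per_bin`, the input format (chromosome name plus a start coordinate in bp), and the example coordinates are all hypothetical and are not data or code from the paper.

```python
from collections import Counter

BIN_SIZE = 10_000_000  # 10 Mb window, as stated in the caption


def count_bacs_per_bin(hits, bin_size=BIN_SIZE):
    """Count BAC placements per (chromosome, window index).

    hits: iterable of (chromosome, start_bp) tuples, where start_bp is the
    0-based position of the BAC on the sorghum chromosome (an assumed
    input format for this sketch).
    """
    counts = Counter()
    for chrom, start in hits:
        counts[(chrom, start // bin_size)] += 1
    return counts


# Hypothetical placements, not results from the study.
example_hits = [
    ("chr1", 1_200_000),
    ("chr1", 9_800_000),
    ("chr1", 10_000_000),
    ("chr2", 35_500_000),
]
bins = count_bacs_per_bin(example_hits)
for (chrom, index), n in sorted(bins.items()):
    print(f"{chrom} {index * 10}-{(index + 1) * 10} Mb: {n} BAC(s)")
```

Assigning each BAC to a window by its start coordinate is a simplification; a BAC that spans a window boundary could instead be counted in both windows.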
Figure 5
Physical and functional relationships of rpa1a sugarcane hom(e)ologous BACs compared to sorghum. The rpa1a genes are represented by white arrows and other genes by black arrows. LTR retrotransposons are represented by blue boxes, DNA transposons by brown boxes and Harbinger transposons by black vertical lines. Only contiguous TE sequences greater than 3,000 bp are shown. A. A physical and phylogenetic analysis of the genomic region of the rpa1a gene from 12 BACs and S. bicolor. The neighbor-joining tree was inferred using the highest ranked substitution model (Tajima-Nei) and 1000 bootstrap replications [72]. The Arabic numerals are bootstrap values; Roman numerals indicate the three phylogenetic groups identified. Colinear genes and TEs are connected by shaded areas. B. Mapping of sRNA and mRNA libraries against one BAC from each phylogenetic group (I, II and III) and SHCRBa_035_B09 and SHCRBa_196_O13. Both sRNA and mRNA mappings are to scale. Dotted ovals indicate sRNA and mRNA peaks discussed in the text. The Y axis shows the mRNA and sRNA mapping density, which corresponds to the proportion of mRNA or sRNA mapping at each base, normalized by the BAC size.
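The mapping density in panel B is described as the proportion of reads mapping at each base, normalized by BAC size. The sketch below shows one possible reading of that description: per-base read coverage divided by the number of mapped reads and then by the BAC length. This interpretation, the function name `mapping_density`, and the interval format are assumptions for illustration; the authors' exact calculation is not given in the caption.

```python
def mapping_density(bac_length, reads):
    """Return a per-base mapping density for one BAC.

    reads: list of (start, end) intervals, 0-based and half-open, giving
    where each read maps on the BAC (an assumed input format).
    Density at a base = coverage / number of reads / BAC length, which is
    one interpretation of the caption, not the authors' confirmed formula.
    """
    if bac_length <= 0:
        raise ValueError("bac_length must be positive")

    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (bac_length + 1)
    for start, end in reads:
        if not 0 <= start < end <= bac_length:
            raise ValueError(f"read interval out of range: {(start, end)}")
        diff[start] += 1
        diff[end] -= 1

    if not reads:
        return [0.0] * bac_length

    density = []
    running = 0
    for i in range(bac_length):
        running += diff[i]  # running sum gives coverage at base i
        density.append(running / len(reads) / bac_length)
    return density


# Hypothetical toy example: a 4 bp "BAC" with two mapped reads.
density = mapping_density(4, [(0, 2), (1, 4)])
print(density)
```

Dividing by BAC length makes profiles from BACs of different sizes comparable on one axis, which appears to be the purpose of the normalization described in the caption.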

References

    1. European Commission: Agriculture and Rural Development: Sugar. http://ec.europa.eu/agriculture/sugar/index_en.htm
    2. Kellogg EA. Evolutionary history of the grasses. Plant Physiol. 2001;125:1198–1205. doi: 10.1104/pp.125.3.1198.
    3. Grivet L, Arruda P. Sugarcane genomics: depicting the complex genome of an important tropical crop. Curr Opin Plant Biol. 2001;5:122–127. doi: 10.1016/S1369-5266(02)00234-0.
    4. Piperidis G, Piperidis N, D’Hont A. Molecular cytogenetic investigation of chromosome composition and transmission in sugarcane. Mol Genet Genomics. 2010;284:65–73. doi: 10.1007/s00438-010-0546-3.
    5. D’Hont A. Unraveling the genome structure of polyploids using FISH and GISH; examples of sugarcane and banana. Cytogenet Genome Res. 2005;109:27–33. doi: 10.1159/000082378.
