Gigascience. 2019 Jun 1;8(6):giz072.
doi: 10.1093/gigascience/giz072.

A chromosome-scale genome assembly of cucumber (Cucumis sativus L.)

Qing Li et al. Gigascience.

Abstract

Background: Accurate and complete reference genome assemblies are fundamental for biological research. Cucumber is an important vegetable crop and model system for sex determination and vascular biology. Low-coverage Sanger sequences and high-coverage short Illumina sequences have been used to assemble draft cucumber genomes, but the incompleteness and low quality of these genomes limit their use in comparative genomics and genetic research. A high-quality and complete cucumber genome assembly is therefore essential.

Findings: We assembled single-molecule real-time (SMRT) long reads to generate an improved cucumber reference genome. This version contains 174 contigs with a total length of 226.2 Mb and an N50 of 8.9 Mb, and provides 29.0 Mb more sequence data than previous versions. Using 10X Genomics and high-throughput chromosome conformation capture (Hi-C) data, 89 contigs (∼211.0 Mb) were directly linked into 7 pseudo-chromosome sequences. The newly assembled regions show much higher guanine-cytosine or adenine-thymine content than found previously, and are therefore likely to have been inaccessible to Illumina sequencing. The new assembly contains 1,374 full-length long terminal repeat retrotransposons and 1,078 novel genes, including 239 tandemly duplicated genes. For example, we found 4 tandemly duplicated tyrosylprotein sulfotransferase genes, in contrast to the single copy found previously and in most other plants.
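The N50 quoted above is a standard contiguity statistic: the contig length at which the contigs of that length or longer account for at least half of the total assembly length. A minimal sketch of that definition follows. The function name `n50` and the toy lengths are illustrative assumptions; this is not the authors' code, and the toy values are not cucumber data.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)   # longest contigs first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:             # crossed the 50% mark
            return length


# Toy example (hypothetical lengths, arbitrary units).
toy_contigs = [8, 6, 5, 3, 2, 1]
print(n50(toy_contigs))
```

With real data, the lengths would come from the contig sequences of the assembly, for example by reading a FASTA file and taking the length of each record.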

Conclusion: This high-quality genome presents novel features of the cucumber genome and will serve as a valuable resource for genetic research in cucumber and plant comparative genomics.

Keywords: Hi-C; PacBio; chromosome-scale assembly; cucumber; genomics.

Figures

Figure 1:
Landscape of the 7 pseudo-chromosome (chr) sequences. All included contigs are shown. The cytogenetic map [22] is integrated with the sequences. Arrows mark the positions of the centromeres (Cen). The distribution of satellite and repetitive sequences along the contigs is illustrated below. Fosmid clones are marked in green and red on the 7 chromosomes, and the imaginary lines connect their physical locations to their approximate locations on the assembled chromosomes.
Figure 2:
Correlation of genome assembly with genetic maps and Hi-C data. A, Integrated genetic and physical maps of the cucumber genome assembly. Super-scaffolds of the genome assembly (middle) were anchored to the 4 linkage groups (left and right): map.1 (green) [3], map.2 (orange) [21], map.3 (light blue) [20], map.4 (pink) [19]. B, Heat map of Hi-C contact information. Pixel colors represent different normalized counts of Hi-C links between 30-kb non-overlapping windows for all 7 chromosomes (chr) on a logarithmic scale.
Figure 3:
Novel repetitive sequences and genes in assembly v3.0. A, Sizes of various types of repetitive sequences in the v2.0 and v3.0 assemblies. DNA, DNA transposons; LINE, Long interspersed nuclear elements; SINE, Short interspersed nuclear elements; LTRc, Copia long terminal repeat retrotransposons; LTRg, Gypsy long terminal repeat retrotransposons; LTRo, Other LTR categories; Unknown, unknown type. B, The number of full-length long terminal repeat retrotransposons (FL-LTRs) in v2.0 and v3.0. C, A newly predicted FL-LTR in v3.0. TSR, Target site repeat; PBS, Primer binding site; PPT, Primer polypurine tract; IN, Integrase; RT, Reverse transcriptase. D, An example showing the newly assembled multiple tyrosylprotein sulfotransferase (TPST) genes in v3.0. b'-e' are all TPST genes, corresponding to CsaV3_1G013960, CsaV3_1G013970, CsaV3_1G013980 and CsaV3_1G013990, respectively.
Figure 4:
Distribution of GC content for the whole genome and novel sequences in v3.0.
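A GC-content distribution of this kind is usually built by computing the fraction of G and C bases in successive windows along a sequence. The sketch below shows one plausible way to do that. The function names, the non-overlapping windowing, and the window size are assumptions made for illustration; the caption does not state how the authors computed the distribution.

```python
def gc_fraction(seq):
    """Return the GC fraction of a sequence, ignoring non-ACGT symbols.

    Returns None when the sequence contains no A, C, G or T at all
    (for example, a window made up entirely of N).
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else None


def windowed_gc(seq, window):
    """GC fraction for each full, non-overlapping window of the given size.

    A trailing partial window is dropped (an illustrative choice).
    """
    return [
        gc_fraction(seq[start:start + window])
        for start in range(0, len(seq) - window + 1, window)
    ]


# Toy example on a hypothetical 8-base sequence with 4-base windows.
print(windowed_gc("GGCCATAT", 4))
```

Plotting a histogram of the per-window values for the whole genome and for the novel sequences separately would give two distributions that can be compared, which is the kind of comparison the figure describes.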

References

    1. Woycicki R, Witkowicz J, Gawronski P, et al. The genome sequence of the North-European cucumber (Cucumis sativus L.) unravels evolutionary adaptation mechanisms in plants. PLoS One. 2011;6:e22728.
    2. Li Z, Zhang Z, Yan P, et al. RNA-Seq improves annotation of protein-coding genes in the cucumber genome. BMC Genomics. 2011;12:540.
    3. Yang L, Koo D, Li Y, et al. Chromosome rearrangements during domestication of cucumber as revealed by high-density genetic mapping and draft genome assembly. Plant J. 2012;71:895–906.
    4. Huang S, Li R, Zhang Z, et al. The genome of the cucumber, Cucumis sativus L. Nat Genet. 2009;41:1275–81.
    5. Huang S, Li R, Zhang Z, et al. Genomic data for the domestic cucumber (Cucumis sativus var. sativus L.). GigaScience Database. 2011. doi: 10.5524/100025.
