Sci Rep. 2016 Jan 20;6:19427.
doi: 10.1038/srep19427.

The genome sequence of the outbreeding globe artichoke constructed de novo incorporating a phase-aware low-pass sequencing strategy of F1 progeny

Davide Scaglione et al. Sci Rep. 2016.

Erratum in

Abstract

Globe artichoke (Cynara cardunculus var. scolymus) is an out-crossing, perennial, multi-use crop species that is grown worldwide and belongs to the Compositae, one of the most successful Angiosperm families. We describe the first genome sequence of globe artichoke. The assembly, comprising 13,588 scaffolds covering 725 of the 1,084 Mb genome, was generated using ~133-fold Illumina sequencing data and encodes 26,889 predicted genes. Re-sequencing (30×) of globe artichoke and cultivated cardoon (C. cardunculus var. altilis) parental genotypes and low-coverage (0.5 to 1×) genotyping-by-sequencing of 163 F1 individuals resulted in 73% of the assembled genome being anchored in 2,178 genetic bins ordered along 17 chromosomal pseudomolecules. This was achieved using a novel pipeline, SOILoCo (Scaffold Ordering by Imputation with Low Coverage), to detect heterozygous regions and assign parental haplotypes with low sequencing read depth and of unknown phase. SOILoCo provides a powerful tool for de novo genome analysis of outcrossing species. Our data will enable genome-scale analyses of evolutionary processes among crops, weeds, and wild species within and beyond the Compositae, and will facilitate the identification of economically important genes from related species.
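The headline figures in the abstract can be related to one another with a little arithmetic. The short Python sketch below derives a few quantities that the abstract does not state directly (assembled fraction, anchored sequence in Mb, average bins per pseudomolecule, gene density). The variable names are illustrative, and because the 73% figure is rounded, the derived values are approximate.

```python
# Figures quoted in the abstract
genome_mb = 1084          # estimated genome size (Mb)
assembled_mb = 725        # sequence covered by the 13,588 scaffolds (Mb)
anchored_fraction = 0.73  # share of the assembly anchored in genetic bins (rounded)
genetic_bins = 2178
pseudomolecules = 17
predicted_genes = 26889

# Derived quantities (not stated in the abstract; approximate)
assembled_fraction = assembled_mb / genome_mb        # share of the genome assembled
anchored_mb = anchored_fraction * assembled_mb       # anchored sequence (Mb)
bins_per_pseudomolecule = genetic_bins / pseudomolecules
mean_bin_size_mb = anchored_mb / genetic_bins        # average anchored sequence per bin
genes_per_assembled_mb = predicted_genes / assembled_mb

print(f"assembled fraction of genome: {assembled_fraction:.1%}")
print(f"anchored sequence: {anchored_mb:.0f} Mb")
print(f"bins per pseudomolecule: {bins_per_pseudomolecule:.0f}")
print(f"mean bin size: {mean_bin_size_mb:.2f} Mb")
print(f"genes per assembled Mb: {genes_per_assembled_mb:.1f}")
```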

Figures

Figure 1. Workflow schema for the SOILoCo pipeline used to anchor the C. cardunculus scaffolds in chromosomal pseudomolecules.
Alignments of parental reads to the draft scaffolds were used to (i) identify potential heterozygous test-cross sites and (ii) compute haplotype phases in both parents (P1 and P2). A multi-sample VCF file of all the progeny was then processed to identify informative heterozygous sites based on parental SNPs and the phase of haplotype blocks (TC-separator.pl, blue box). This assigned the sites according to which phase (i.e. homologous chromosome) they are expected to segregate in. Subsequently, an HMM-based algorithm was used to impute the most likely genotypes of each haplotype block segregating in the progeny (gt-hmm.pl, red box). A LOD score was also calculated to permit filtering of ambiguous imputations. Genotype imputations from the two alternative segregating phases were then summarized; when there was a discordant call between phases, a majority rule was applied and the highest LOD score for each segregating haplotype was used to impute the most likely genotype (TC-string-merger.pl, green box). After grouping markers, linkage maps were generated for each parent using reiterative ordering with the MSTmap software (http://alumni.cs.ucr.edu/~yonghui/mstmap.html) and error correction using a Perl implementation of the SMOOTH algorithm. Maps were finally merged to generate a consensus map and to maximize the resolution of the order and orientation of scaffolds in chromosomal pseudomolecules.
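To illustrate the kind of HMM-based imputation described in this caption, here is a minimal Python sketch. It is not the gt-hmm.pl implementation. It assumes a two-state model for one progeny at ordered test-cross sites (state 0: the reference-carrying parental haplotype was inherited; state 1: the alternative-carrying haplotype), a fixed sequencing error rate, and a fixed switch probability between adjacent sites. The function name, the parameters, and the definition of the LOD as the log10 ratio of the two posterior state probabilities are all assumptions made for this example.

```python
import math

def impute_haplotypes(counts, err=0.01, recomb=0.01):
    """Impute the inherited haplotype at each ordered test-cross site.

    counts: list of (ref_reads, alt_reads) for one progeny.
    Returns a list of (state, lod) tuples, one per site.
    All model parameters are illustrative assumptions.
    """
    n = len(counts)
    if n == 0:
        return []
    # Emission likelihoods: state 0 yields alt reads only through errors,
    # state 1 (heterozygous progeny) yields ref and alt reads equally often.
    emit = [((1 - err) ** r * err ** a, 0.5 ** (r + a)) for r, a in counts]
    trans = ((1 - recomb, recomb), (recomb, 1 - recomb))

    # Forward pass with per-site rescaling to avoid underflow
    fwd = []
    prev = (0.5, 0.5)
    for i, e in enumerate(emit):
        if i == 0:
            cur = [prev[s] * e[s] for s in (0, 1)]
        else:
            cur = [e[s] * sum(prev[t] * trans[t][s] for t in (0, 1)) for s in (0, 1)]
        z = sum(cur)
        prev = (cur[0] / z, cur[1] / z)
        fwd.append(prev)

    # Backward pass, also rescaled
    bwd = [None] * n
    nxt = (1.0, 1.0)
    bwd[n - 1] = nxt
    for i in range(n - 2, -1, -1):
        cur = [sum(trans[s][t] * emit[i + 1][t] * nxt[t] for t in (0, 1)) for s in (0, 1)]
        z = sum(cur)
        nxt = (cur[0] / z, cur[1] / z)
        bwd[i] = nxt

    # Posterior state probabilities, most likely state, and LOD per site
    calls = []
    for f, b in zip(fwd, bwd):
        p0, p1 = f[0] * b[0], f[1] * b[1]
        z = p0 + p1
        p0, p1 = p0 / z, p1 / z
        state = 0 if p0 >= p1 else 1
        lod = math.log10(max(p0, p1) / max(min(p0, p1), 1e-300))
        calls.append((state, lod))
    return calls

# Toy progeny: four sites supporting state 0, one uncovered site, four supporting state 1
example_counts = [(3, 0)] * 4 + [(0, 0)] + [(1, 2)] * 4
example_calls = impute_haplotypes(example_counts)
```

In this toy example the uncovered site sits at the inferred switch point and receives a low LOD, which is the kind of ambiguous imputation a LOD filter would remove.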
Figure 2. Characteristics of the globe artichoke genome.
(a) Heat map of repeat density in the reference genome generated from globe artichoke 2C (blue, low; orange, high density); (b) Heat map of gene density in the reference genome; (c) density of heterozygous SNPs in the globe artichoke parental genotype, C3; (d) density of heterozygous SNPs in the cultivated cardoon parental genotype, Alt41; (e) extended homozygous regions in cultivated cardoon: the color of the boxes indicates whether they occur in repeat-rich regions (green) or gene-rich regions (red).
Figure 3. Anchoring the globe artichoke reference genome to the genetic maps and analysis robustness for each map.
(a) Alignments of parental maps with the inferred consensus map in the middle for the first three C. cardunculus linkage groups. (b) Distribution of sizes (x axis) and genetic distances (y axis) for every possible pair of scaffolds separated by up to 5 cM when comparing the two parental maps. Pairs are shown as concordant (left) if their relative genetic position is maintained in both maps; discordant (right) if it is not. Scaffold size is reported as the smaller size of the two scaffolds in a pair (size and color of dots emphasize the scaffold size).
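The concordant/discordant classification in panel (b) can be sketched in Python as follows. This is an illustrative reading of the caption, not the authors' code. It assumes each parental map is given as a dictionary from scaffold name to position in cM, that both maps share the same orientation, that a pair is considered only when it is within 5 cM in both maps, and that ties in position count as concordant. The function and variable names are hypothetical.

```python
from itertools import combinations

def classify_pairs(map_a, map_b, max_cm=5.0):
    """Split scaffold pairs into concordant and discordant lists.

    map_a, map_b: dict scaffold -> genetic position (cM) in each parental map.
    A pair is discordant when its relative order differs between the maps.
    """
    shared = sorted(set(map_a) & set(map_b))
    concordant, discordant = [], []
    for s1, s2 in combinations(shared, 2):
        da = map_a[s2] - map_a[s1]
        db = map_b[s2] - map_b[s1]
        if max(abs(da), abs(db)) > max_cm:
            continue  # too far apart in at least one map (assumed filter)
        if da * db < 0:
            discordant.append((s1, s2))   # order flipped between maps
        else:
            concordant.append((s1, s2))   # same order, or tied in one map
    return concordant, discordant

# Toy maps with one local order flip (s2/s3) and one distant scaffold (s4)
toy_a = {"s1": 0.0, "s2": 2.0, "s3": 4.0, "s4": 30.0}
toy_b = {"s1": 0.0, "s2": 3.0, "s3": 1.0, "s4": 28.0}
toy_concordant, toy_discordant = classify_pairs(toy_a, toy_b)
```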
Figure 4. Venn diagram of orthologous gene clusters among Arabidopsis thaliana (a), Cynara cardunculus (c), Fragaria vesca (f) and Solanum lycopersicum (s), showing a total of 9,246 common gene clusters.
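As a small illustration of how the regions of such a Venn diagram are counted, the Python sketch below tallies clusters by the exact set of species they contain. The input layout (cluster identifier mapped to the species codes of its member genes) and the toy data are assumptions for this example; the species codes a, c, f and s follow the caption.

```python
from collections import Counter

def venn_regions(clusters):
    """Count clusters per Venn region.

    clusters: dict cluster_id -> iterable of species codes of member genes.
    Returns a Counter keyed by the frozenset of species present in a cluster.
    """
    return Counter(frozenset(species) for species in clusters.values())

# Toy input: species codes repeat when a cluster holds several genes of one species
toy_clusters = {
    "cl1": ["a", "c", "f", "s"],
    "cl2": ["a", "c", "c", "f", "s"],
    "cl3": ["c", "s"],
    "cl4": ["c"],
}
toy_regions = venn_regions(toy_clusters)
common_to_all = toy_regions[frozenset("acfs")]  # clusters shared by all four species
```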
Figure 5. Repeat identification and dating in the genome of C. cardunculus and miRNA analysis.
(a) Overall distribution of the repetitive fraction. (b) Distributions of insertion ages in different families of LTR elements. (c) Number of loci per miRNA considered as four frequency classes: (i) <10 loci; (ii) 10 to 99 loci; (iii) 100 to 999 loci; (iv) >1000 loci. (d) Distribution of Mixed, RR and NRR classes of miRNA within conserved miRNAs. (e) Distribution of Mixed, RR and NRR classes of miRNA within globe artichoke-specific miRNAs. RR = miRNAs represented exclusively by loci covered by repeat elements; NRR = miRNAs without any locus covered by repeat elements; Mixed = miRNAs which include both loci covered and not covered by repeats. Conserved miRNAs are present in at least two of the 11 tested species.
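The class definitions in this caption translate directly into code. The Python sketch below is illustrative only: the function names are hypothetical, each miRNA is assumed to be represented by one boolean per locus (True if the locus is covered by a repeat element), and a miRNA with exactly 1,000 loci, which the caption's ranges leave unassigned, is placed in class (iv) by assumption.

```python
def repeat_class(loci_in_repeat):
    """Return 'RR', 'NRR' or 'Mixed' for one miRNA.

    loci_in_repeat: iterable of booleans, one per locus.
    """
    flags = list(loci_in_repeat)
    if not flags:
        raise ValueError("a miRNA needs at least one locus")
    if all(flags):
        return "RR"      # every locus covered by repeat elements
    if not any(flags):
        return "NRR"     # no locus covered by repeat elements
    return "Mixed"       # both covered and uncovered loci

def frequency_class(n_loci):
    """Return the frequency class (i to iv) for a number of loci."""
    if n_loci < 10:
        return "i"
    if n_loci < 100:
        return "ii"
    if n_loci < 1000:
        return "iii"
    return "iv"          # assumption: exactly 1000 loci is placed here
```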
Figure 6. Distribution of synonymous substitutions across paralogs and orthologs as defined through conservation of syntenic blocks.
Lettuce and globe artichoke speciation is described with a Ks peak of ~0.55, while ancestral common genome duplication predating the Euasterids II radiation is described by a Ks ~ 0.78 for globe artichoke and ~0.87 for lettuce prior to heterogeneity rate correction, confirming that no WGD occurred after speciation of Chicoriae and Cardueae.
