PLoS One. 2013;8(2):e57686.
doi: 10.1371/journal.pone.0057686. Epub 2013 Feb 28.

De novo assembly, gene annotation and marker development using Illumina paired-end transcriptome sequences in celery (Apium graveolens L.)

Nan Fu et al. PLoS One. 2013.

Abstract

Background: Celery is an increasingly popular vegetable species, but limited transcriptome and genomic data hinder research on it. In addition, a lack of celery molecular markers limits progress in molecular genetic breeding. High-throughput transcriptome sequencing is an efficient method to generate a large transcriptome sequence dataset for gene discovery, molecular marker development and marker-assisted selection breeding.

Principal findings: Celery transcriptomes from four tissues were sequenced using Illumina paired-end sequencing technology. De novo assembly generated a collection of 42,280 unigenes (average length 502.6 bp), representing the first transcriptome of the species. Of these unigenes, 78.43% and 48.93% had significant similarity to proteins in the National Center for Biotechnology Information (NCBI) non-redundant protein database (Nr) and the Swiss-Prot database, respectively, and 10,473 (24.77%) were assigned to Clusters of Orthologous Groups (COG). A total of 21,126 (49.97%) unigenes harboring InterPro domains were annotated, of which 15,409 (36.45%) were assigned to Gene Ontology (GO) categories. Additionally, 7,478 unigenes were mapped onto 228 pathways using the Kyoto Encyclopedia of Genes and Genomes Pathway database (KEGG). Large numbers of simple sequence repeats (SSRs) were identified, and the rates of successful amplification and polymorphism were investigated among 31 celery accessions.
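As a quick consistency check on the annotation figures above, the reported percentages can be recomputed from the unigene counts. The sketch below is illustrative only: the variable and function names are assumptions introduced here, and only the counts and the 42,280 total come from the abstract.

```python
# Counts taken from the abstract; the names below are illustrative, not from the study.
TOTAL_UNIGENES = 42_280

annotated_counts = {
    "COG": 10_473,
    "InterPro": 21_126,
    "GO": 15_409,
}


def percent_of_unigenes(count, total=TOTAL_UNIGENES):
    """Return count as a percentage of all unigenes, rounded to two decimals."""
    return round(100 * count / total, 2)


for database, count in annotated_counts.items():
    print(f"{database}: {count} unigenes = {percent_of_unigenes(count)}%")
```

The recomputed values agree with the percentages reported in the abstract (24.77%, 49.97% and 36.45%).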

Conclusions: This study demonstrates the feasibility of generating large-scale sequence information through Illumina paired-end sequencing and efficient assembly. Our results provide a valuable resource for celery research. The developed molecular markers lay the foundation for further genetic linkage analysis and gene localization, and they will be essential for accelerating the breeding process.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Length distribution of the celery unigenes de novo assembled from 42280 ESTs.
Figure 2. Comparison of unigene length between hit and no hit unigenes.
Longer contigs were more likely to have BLASTx homologs in protein database.
Figure 3. Characteristics of similarity search of unigenes against Nr and Swiss-Prot databases.
(A) E-value distribution of BLAST hits for each unigene with a cutoff E-value of 1E-5 in the Nr database. (B) E-value distribution of BLAST hits for each unigene with a cutoff E-value of 1E-5 in the Swiss-Prot database. (C) Similarity distribution of the top BLAST hits for each unigene in Nr database. (D) Similarity distribution of the top BLAST hits for each unigene in Swiss-Prot database.
Figure 4. Gene Ontology classifications of assembled unigenes.
The unigenes are summarized into three main categories: biological process, cellular location, and molecular function. In total, 15,409 unigenes with BLASTx matches were assigned to gene ontologies.
Figure 5. Clusters of orthologous groups (COG) classification.
In total, 10,473 sequences were grouped into 24 COG classifications.
Figure 6. Pathway assignment based on KEGG.
(A) Classification based on metabolism categories; (B) Categories classified by KEGG.
Figure 7. Frequency distribution of SSRs based on motif types.
The GA/TC di-nucleotide repeat motif was the most abundant motif detected.
Figure 8. Similarity relationships of 31 different accessions of A. graveolens based on 28 EST-SSR loci.
LC: local celery; C: celery.
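
To make the SSR motif counts summarized in Figure 7 more concrete, the following is a minimal sketch of how perfect SSRs could be located in a unigene sequence and how complementary motifs such as GA and TC can be grouped into one class. It does not reproduce the authors' pipeline: the abstract does not name the SSR detection tool or its thresholds, so the minimum repeat counts in `MIN_REPEATS` and the function names are assumptions chosen for illustration.

```python
import re

# Hypothetical thresholds: motif length -> minimum number of tandem repeats.
# These values are assumed for illustration and are not taken from the study.
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_motif(motif):
    """Group a motif with its rotations and reverse-complement rotations."""
    reverse_complement = motif.translate(COMPLEMENT)[::-1]
    forms = []
    for m in (motif, reverse_complement):
        forms.extend(m[i:] + m[:i] for i in range(len(m)))
    return min(forms)


def find_ssrs(seq):
    """Return (start, motif, repeat_count) for each perfect SSR found in seq."""
    seq = seq.upper()
    hits = []
    for size, min_rep in MIN_REPEATS.items():
        # A motif of the given size followed by at least (min_rep - 1) copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are themselves repeats of a shorter unit (e.g. "AA", "GAGA").
            if any(motif == motif[:k] * (size // k)
                   for k in range(1, size) if size % k == 0):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // size))
    return hits


example = "TTC" + "GA" * 8 + "CCT" + "ATG" * 5 + "C"
for start, motif, repeats in find_ssrs(example):
    print(start, motif, repeats, canonical_motif(motif))
```

Grouping by `canonical_motif` is what allows a repeat read as GA on one strand and TC on the other to be counted as a single motif type.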
