BMC Genomics. 2012 Oct 30;13:571.
doi: 10.1186/1471-2164-13-571.

De novo assembly of the pepper transcriptome (Capsicum annuum): a benchmark for in silico discovery of SNPs, SSRs and candidate genes

Hamid Ashrafi et al. BMC Genomics.

Abstract

Background: Molecular breeding of pepper (Capsicum spp.) can be accelerated by developing DNA markers associated with transcriptomes in breeding germplasm. Before the advent of next-generation sequencing (NGS) technologies, most sequence data were generated by Sanger sequencing. By leveraging Sanger EST data, we generated a wealth of genetic information for pepper, including thousands of SNPs and Single Position Polymorphic (SPP) markers. To complement and enhance these resources, we applied NGS to three pepper genotypes, Maor, Early Jalapeño and Criollo de Morelos-334 (CM334), to identify SNPs and SSRs in the assembly of these three genotypes.

Results: Two pepper transcriptome assemblies were developed for different purposes. The first reference sequence, assembled with CAP3, comprises 31,196 contigs from >125,000 Sanger-EST sequences that were mainly derived from a Korean F1-hybrid line, Bukang. Overlapping probes were designed for 30,815 unigenes to construct a pepper Affymetrix GeneChip® microarray for whole-genome analyses. In addition, custom Python scripts were used to identify 4,236 SNPs in contigs of the assembly. A total of 2,489 simple sequence repeats (SSRs) were identified from the assembly, and primers were designed for them. Annotation of contigs with Blast2GO yielded information for 60% of the unigenes in the assembly. The second transcriptome assembly was constructed from more than 200 million Illumina Genome Analyzer II reads (80-120 nt) using a combination of the Velvet, CLC workbench and CAP3 software packages. BWA, SAMtools and in-house Perl scripts were used to identify SNPs among the three pepper genotypes. SNPs were retained only if they lay at least 50 bp from any intron-exon junction and from any flanking SNP. More than 22,000 high-quality putative SNPs were identified. Using the MISA software, 10,398 SSR markers were also identified within the Illumina transcriptome assembly, and primers were designed for the identified markers. The assembly was annotated with Blast2GO, and 14,740 (12%) of annotated contigs were associated with functional proteins.
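The 50 bp distance filter described above can be illustrated with a minimal Python sketch. This is not the authors' in-house Perl code. It assumes that SNP positions and intron-exon junction positions are already available as integer coordinates per contig, and that when two SNPs lie closer than the threshold both are discarded; the abstract does not state either detail. The function name `filter_snps` and the input layout are hypothetical.

```python
from bisect import bisect_left
from typing import Dict, List

MIN_DISTANCE = 50  # bp threshold stated in the abstract


def filter_snps(
    snps: Dict[str, List[int]],
    junctions: Dict[str, List[int]],
    min_distance: int = MIN_DISTANCE,
) -> Dict[str, List[int]]:
    """Keep SNPs at least `min_distance` bp from every junction and every other SNP.

    `snps` and `junctions` map a contig name to positions on that contig
    (hypothetical input layout, chosen for illustration only).
    """
    kept: Dict[str, List[int]] = {}
    for contig, positions in snps.items():
        ordered = sorted(set(positions))
        juncs = sorted(junctions.get(contig, []))
        passing: List[int] = []
        for i, pos in enumerate(ordered):
            # Reject if the previous or next SNP on the contig is too close.
            if i > 0 and pos - ordered[i - 1] < min_distance:
                continue
            if i + 1 < len(ordered) and ordered[i + 1] - pos < min_distance:
                continue
            # Locate the nearest junctions on either side by binary search.
            j = bisect_left(juncs, pos)
            if j < len(juncs) and juncs[j] - pos < min_distance:
                continue
            if j > 0 and pos - juncs[j - 1] < min_distance:
                continue
            passing.append(pos)
        kept[contig] = passing
    return kept


if __name__ == "__main__":
    example_snps = {"contig1": [100, 120, 300, 500, 1000]}
    example_junctions = {"contig1": [520]}
    print(filter_snps(example_snps, example_junctions))
```

In this toy input the SNPs at 100 and 120 are too close to each other and the SNP at 500 is too close to the junction at 520, so only the SNPs at 300 and 1000 pass. Sorting once and comparing each position only with its immediate neighbours keeps the check fast for contigs with many SNPs.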

Conclusions: Before the pepper genome sequence became available, assembling transcriptomes of this economically important crop was required to generate thousands of high-quality molecular markers for use in breeding programs. To better understand the assembled sequences and to identify candidate genes underlying QTLs, we annotated the contigs of the Sanger-EST and Illumina transcriptome assemblies. This and other information has been curated in a database dedicated to the pepper project.

Figures

Figure 1
Distribution of contig lengths in a) the pepper Sanger-EST assembly and b) the pepper IGA transcriptome assembly.
Figure 2
Distribution of the three Blast2GO steps (BLASTX, mapping and annotation) for a) the Sanger-EST assembly and b) the IGA transcriptome assembly.
Figure 3
a) Species distribution of all BLASTX hits in the Sanger-EST assembly. b) Top-hit species distribution based on BLASTX alignments in the Sanger-EST assembly. c) Species distribution of all BLASTX hits in the transcriptome assembly. d) Top-hit species distribution based on BLASTX alignments in the IGA transcriptome assembly. Cultivated Solanum species are more frequent than wild species (S. habrochaites or S. bulbocastanum). Within Capsicum, there are more hits to C. annuum than to C. chinense or to more distantly related Capsicum species such as C. chacoense.
Figure 4
An example KEGG map for the pyrimidine metabolism pathway. Each box represents the enzyme code involved in that section of the pathway. The colored boxes mark enzymes identified in a) the Sanger-EST assembly and b) the transcriptome assembly. The KEGG files can be downloaded from the Pepper GeneChip website (https://pepper.ucdavis.edu).

