PLoS One. 2012;7(3):e34225.
doi: 10.1371/journal.pone.0034225. Epub 2012 Mar 29.

Exploring the switchgrass transcriptome using second-generation sequencing technology

Yixing Wang et al. PLoS One. 2012.

Abstract

Background: Switchgrass (Panicum virgatum L.) is a C4 perennial grass that is widely popular as an important bioenergy crop. To accelerate the pace of developing high yielding switchgrass cultivars adapted to diverse environmental niches, the generation of genomic resources for this plant is necessary. The large genome size and polyploid nature of switchgrass make whole genome sequencing a daunting task even with current technologies. Exploring the transcriptional landscape using next generation sequencing technologies provides a viable alternative to whole genome sequencing in switchgrass.

Principal findings: Switchgrass cDNA libraries from germinating seedlings, emerging tillers, flowers, and dormant seeds were sequenced using Roche 454 GS-FLX Titanium technology, generating 980,000 reads with an average read length of 367 bp. De novo assembly generated 243,600 contigs with an average length of 535 bp. Using the foxtail millet genome as a reference greatly improved the assembly and annotation of switchgrass ESTs. Comparative analysis of the 454-derived switchgrass EST reads with other sequenced monocots including Brachypodium, sorghum, rice and maize indicated a 70-80% overlap. RPKM analysis demonstrated unique transcriptional signatures of the four tissues analyzed in this study. More than 24,000 ESTs were identified in the dormant seed library. In silico analysis indicated that there are more than 2,000 EST-SSRs in this collection. Expression of several orphan ESTs was confirmed by RT-PCR.

Significance: We estimate that about 90% of the switchgrass gene space has been covered in this analysis. This study nearly doubles the amount of EST information for switchgrass currently in the public domain. The celerity and economical nature of second-generation sequencing technologies provide an in-depth view of the gene space of complex genomes like switchgrass. Sequence analysis of closely related members of the NAD(+)-malic enzyme type C4 grasses such as the model system Setaria viridis can serve as a viable proxy for the switchgrass genome.

Conflict of interest statement

Competing Interests: Doug Bryant is affiliated with Intuitive Genomics. This does not alter the authors' adherence to all the PLoS ONE policies on sharing data and materials.

Figures

Figure 1. Frequency distribution of 454 sequencing read lengths.
Histogram of Roche 454 GS-FLX Titanium read lengths after filtering and trimming adapters.
Figure 2. Unigene assembly features of switchgrass transcriptome.
(A) Histogram of contig lengths following the 2-step de novo assembly process. The x-axis has been truncated at 3 kb. The longest contig is 12,437 base pairs. (B) Histogram of the average read-depth coverage for assembled contigs. Coverage values greater than 30× have been binned together.
Figure 3. Switchgrass 454-based sequencing reads mapped to the foxtail millet genome.
The number of contigs and unassembled ESTs that produced significant alignments to the foxtail millet genome is plotted for each 0.5 megabase interval. Each radial axis line represents one log interval. Numbers on the circumference represent the foxtail millet chromosomes (1–9), and each chromosome is given a different color. Assembled contigs aligned to the Setaria italica reference genome assembly are shown in blue for forward strand alignments and in yellow for reverse strand alignments. Singleton reads aligned to the forward strand are shown on the same figure in red, while singleton reads aligned to the reverse strand are shown in green. The diagram was prepared using Circos (http://mkweb.bcgsc.ca/circos).
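The binning described in this caption can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the input layout (tuples of chromosome, 0-based start position, and strand) and the function names `bin_alignments` and `log_scaled` are assumptions made for illustration only.

```python
from collections import Counter
import math

BIN_SIZE = 500_000  # 0.5 megabase intervals, as stated in the caption


def bin_alignments(alignments, bin_size=BIN_SIZE):
    """Count alignments per (chromosome, strand, bin index).

    `alignments` is an iterable of (chromosome, start, strand) tuples with
    0-based start positions; this layout is a hypothetical input format.
    """
    counts = Counter()
    for chrom, start, strand in alignments:
        counts[(chrom, strand, start // bin_size)] += 1
    return counts


def log_scaled(counts):
    """Log10-transform the counts, matching a log-scaled radial axis."""
    return {key: math.log10(n) for key, n in counts.items()}


# Toy alignments, invented purely to exercise the functions.
example = [(1, 120_000, "+"), (1, 480_000, "+"), (1, 510_000, "-"), (9, 1_250_000, "+")]
binned = bin_alignments(example)
scaled = log_scaled(binned)
```

Keeping the strand in the key lets forward and reverse alignments be plotted as separate tracks, as the caption describes for contigs and singleton reads.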
Figure 4. Plant GO-slim terms associated with switchgrass.
Venn diagram of the distribution of plant GO-slim terms associated with switchgrass contigs represented in molecular function, biological process and cellular component categories.
Figure 5. Distribution of simple sequence repeats in switchgrass ESTs.
Di-, tri-, tetra-, penta- and hexa-nucleotide repeats were analyzed and their frequency plotted as a function of the repeat number.
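As a rough illustration of how di- to hexa-nucleotide repeats can be located in EST sequences, the sketch below uses regular expressions. The tool and thresholds used in the study are not given here, so the minimum repeat counts in `MIN_REPEATS` and the function names are hypothetical choices, not the authors' settings.

```python
import re

# Minimum number of repeats per motif length: illustrative thresholds only.
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def is_primitive(motif):
    """Return True if the motif is not itself a repeat of a shorter unit."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, min_n in min_repeats.items():
        # A k-mer followed by at least (min_n - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):  # skip e.g. 'AA' or 'ATAT'
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


# Toy sequence containing (AT)7 and (CAG)5, invented for this example.
toy = "GGC" + "AT" * 7 + "CCT" + "CAG" * 5 + "TT"
ssrs = find_ssrs(toy)
```

Tallying the `repeat_count` values per motif length would give a frequency distribution of the kind plotted in Figure 5.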
Figure 6. Heat map of switchgrass gene expression in four different tissues.
Individual columns represent the four tissues used in this study, while each row represents a unique contig. The dendrogram represents the similarity of expression profiles between the tissue samples based on average linkage clustering. The color key represents the log2-transformed RPKM values. Red indicates a low level of expression, green and blue are intermediate levels, while purple indicates maximal levels of gene expression. The highest value for RPKM after log2 transformation was 17 and the lowest value was 0.1.
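For readers unfamiliar with the measure, RPKM (reads per kilobase of transcript per million mapped reads) and its log2 transform can be computed as in the sketch below. This uses the standard RPKM definition; the function names and the input numbers are illustrative, and how the study handled contigs with zero reads is not stated here, so the sketch only accepts positive values.

```python
import math


def rpkm(read_count, contig_length_bp, total_mapped_reads):
    """Reads per kilobase of contig per million mapped reads."""
    return read_count * 1e9 / (contig_length_bp * total_mapped_reads)


def log2_rpkm(value):
    """Log2-transform a positive RPKM value, as used for a heat map color key."""
    if value <= 0:
        raise ValueError("log2 is undefined for non-positive RPKM values")
    return math.log2(value)


# Hypothetical contig: 10 reads on a 1,000 bp contig in a library of
# 1,000,000 mapped reads.
value = rpkm(10, 1_000, 1_000_000)
transformed = log2_rpkm(value)
```

Normalizing by both contig length and library size is what allows expression values to be compared across contigs and across the four tissue libraries.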
Figure 7. Switchgrass gene expression analysis by RT-PCR.
The four panels represent the amplifications from the cDNA derived from the four different tissues: flowers, tillers, seeds and seedlings. The lanes labeled 1–22 are amplifications of EST contigs with functional annotations. Lanes 23–42 contain amplifications of EST contigs that did not show any homologies to sequences in the databases. Lane 43 is the amplification of the teosinte branched 1 gene that shows tiller-specific expression. M indicates the 100 bp DNA size ladder.
Figure 8. Overview of the C4 NAD+-Malic enzyme photosynthesis.
The numbers shown in red font in the figure correspond to the ESTs that were identified in the 454 sequencing analysis. Abbreviations: Ala, alanine; AlaAT, alanine aminotransferase; Alpha-KG, alpha-ketoglutarate; Asp, aspartate; AspAT, aspartate aminotransferase; CA, carbonic anhydrase; Glu, glutamate; OA, oxaloacetate; PEP, phosphoenolpyruvate; PEPC, phosphoenolpyruvate carboxylase; PPDK, pyruvate phosphate dikinase; Pyr, pyruvate; NAD+-MDH, NAD+-dependent malate dehydrogenase; NAD+-ME, NAD+-dependent malic enzyme.
