BMC Genomics. 2019 Jan 30;20(1):95.
doi: 10.1186/s12864-019-5452-4.

Usability of reference-free transcriptome assemblies for detection of differential expression: a case study on Aethionema arabicum dimorphic seeds

Per K I Wilhelmsson et al. BMC Genomics. 2019.

Abstract

Background: RNA-sequencing analysis is increasingly utilized to study gene expression in non-model organisms without sequenced genomes. Aethionema arabicum (Brassicaceae) exhibits seed dimorphism as a bet-hedging strategy - producing both a less dormant mucilaginous (M+) seed morph and a more dormant non-mucilaginous (NM) seed morph. Here, we compared de novo and reference-genome based transcriptome assemblies to investigate Ae. arabicum seed dimorphism and to evaluate the reference-free versus -dependent approach for identifying differentially expressed genes (DEGs).

Results: A de novo transcriptome assembly was generated using sequences from M+ and NM Ae. arabicum dry seed morphs. The transcripts of the de novo assembly contained 63.1% complete Benchmarking Universal Single-Copy Orthologs (BUSCO) compared to 90.9% for the transcripts of the reference genome. DEG detection used the strict consensus of three methods (DESeq2, edgeR and NOISeq). Only 37% of 1533 differentially expressed de novo assembled transcripts paired with 1876 genome-derived DEGs. Gene Ontology (GO) terms distinguished the seed morphs: the terms translation and nucleosome assembly were overrepresented in DEGs higher in abundance in M+ dry seeds, whereas terms related to mRNA processing and transcription were overrepresented in DEGs higher in abundance in NM dry seeds. DEGs amongst these GO terms included ribosomal proteins and histones (higher in M+), RNA polymerase II subunits and related transcription and elongation factors (higher in NM). Expression of the inferred DEGs and other genes associated with seed maturation (e.g. those encoding late embryogenesis abundant proteins and transcription factors regulating seed development and maturation such as ABI3, FUS3, LEC1 and WRI1 homologs) were put in context with Arabidopsis thaliana seed maturation and indicated that M+ seeds may desiccate and mature faster than NM. The 1901 transcriptomic DEG set GO-terms had almost 90% overlap with the 2191 genome-derived DEG GO-terms.

Conclusions: Whilst there was only modest overlap of DEGs identified in reference-free versus -dependent approaches, the resulting GO analysis was concordant in both approaches. The identified differences in dry seed transcriptomes suggest mechanisms underpinning previously identified contrasts between morphology and germination behaviour of M+ and NM seeds.

Keywords: Aethionema arabicum; Dimorphic seeds; RNA-seq; Reference and reference-free; Transcriptome.

Conflict of interest statement

Ethics approval and consent to participate

The sources of the Ae. arabicum seeds were accessions 0000309 (obtained from Kew’s Millennium Seed Bank) and ES1020 (obtained from Eric Schranz, Wageningen) [3]. This study complies with institutional, national, and international guidelines.

Consent for publication

Not applicable

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1
Fruit and seed dimorphism in Ae. arabicum. Mature infructescence (a) of Ae. arabicum, showing distinct dehiscent (DEH) and indehiscent (IND) fruit morphs (marked by red arrows). Large DEH fruits (b) contain up to six mucilaginous (M+) seeds, while small IND fruits (c) contain a single, non-mucilaginous (NM) seed. The two seed morphs differ in mean seed mass and moisture content. Values shown are means ± SEM of replicate measurements: n = 8 replicates of 100 seeds each (mass) and n = 4 replicates of 30 seeds each (moisture content). Scale bars = 4 mm (a), 1 mm (b and c). FW, fresh weight
Fig. 2
RNA-Seq analysis pipeline. Raw RNA-seq reads were quality-checked (FastQC) and processed to remove adapters and low-quality bases (Trimmomatic, PrinSeq). Cleaned reads were either mapped to the genome (GSNAP), or used for de novo transcriptome assembly (Trinity) and mapped to the resulting transcriptome (GSNAP). Transcriptome-mapped and genome-mapped reads were compared at each stage of analysis: after mapping, after differentially expressed gene (DEG) identification (edgeR, DESeq2, NOISeq), and after gene ontology (GO) analysis (Blast2GO)
Fig. 3
From raw to filtered reads. Trimming of raw reads with Trimmomatic removed adapters and low-quality reads. Poly-A / poly-T stretches were then removed from the trimmed reads with PrinSeq. The resulting reads were filtered to remove chloroplastic, mitochondrial and ribosomal RNA reads. The total number of reads left after each step is indicated for samples M+ (1), M+ (2), NM (1) and NM (2)
Fig. 4
BUSCO completeness analysis. cDNA from the Ae. arabicum de novo assembly, Ae. arabicum genome v2.5 and A. thaliana TAIR10 were compared to 1440 Embryophyta reference orthologs for completeness assessment
Fig. 5
Transcript length distribution and mapping efficiency. Length distribution of the de novo assembled transcripts and of the Ae. arabicum mRNAs derived from the genome assembly (a). Number of processed reads and number of reads mapping to the Ae. arabicum whole genome V2.5, the gene models from V2.5, and the de novo assembly transcripts (b)
Fig. 6
Consensus of DEG calling and PCA of overlap of common DEGs. Venn diagram of the DEGs called between NM and M+ seeds by the three DEG detection programs (edgeR, NOISeq and DESeq2) using the transcriptome (a) and genome (b) approach. Principal Component Analysis of RPKM (Reads Per Kilobase per Million reads) values of the 561 DEGs common to the transcriptome (‘T’) and genome (‘G’) approaches (c). Samples M+ (circles) and NM (triangles), in black, show the results for the dehiscent and indehiscent seeds in the transcriptome approach. Samples M+ (circles) and NM (triangles), in white, show the corresponding results in the genome approach. The percentage of variance explained by each principal component is indicated on the axes
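The two calculations named in this caption can be illustrated with a minimal Python sketch. It assumes that the "strict consensus" of the abstract means the intersection of the DEG sets called by all three programs, and it uses the standard RPKM definition. The gene IDs, read counts and function names are hypothetical and are not taken from the study. The final line only checks that the 561 common DEGs given in this caption, divided by the 1533 transcriptome DEGs given in the abstract, round to the reported 37%; that these two figures belong together is itself an assumption.

```python
from typing import Dict, Set


def strict_consensus(calls: Dict[str, Set[str]]) -> Set[str]:
    """Return only the IDs called as DEGs by every method (set intersection)."""
    sets = list(calls.values())
    return set.intersection(*sets) if sets else set()


def rpkm(read_count: int, length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase of transcript per Million mapped reads."""
    return read_count * 1e9 / (length_bp * total_mapped_reads)


# Hypothetical DEG calls from the three programs (IDs are made up).
calls = {
    "edgeR": {"g1", "g2", "g3", "g4"},
    "DESeq2": {"g2", "g3", "g4", "g5"},
    "NOISeq": {"g3", "g4", "g6"},
}
consensus = strict_consensus(calls)

# Hypothetical gene: 500 reads, 2000 bp long, 10 million mapped reads in total.
example_rpkm = rpkm(500, 2000, 10_000_000)

# Fraction of transcriptome DEGs that are common to both approaches.
paired_fraction = 561 / 1533
```

A strict intersection trades sensitivity for specificity: a gene missed by any one program is dropped, which may partly explain why the two approaches share only a modest number of DEGs.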
Fig. 7
GO term word clouds of genome and transcriptome DEGs. Word clouds showing significantly over-represented (green) and under-represented (red) Biological Process terms for the genome DEGs (a) and the transcriptome DEGs (b). Word height is proportional to -log10(q-value). Significantly over-represented GO-terms are coloured green (q ≤ 0.0001 dark green, q > 0.0001 light green) and under-represented GO-terms are coloured red (q ≤ 0.0001 dark red, q > 0.0001 light red)
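The scaling and colour rule described in this caption can be written out as a short Python sketch. The function names and the `scale` parameter are hypothetical; only the -log10(q) proportionality and the 0.0001 threshold come from the caption.

```python
import math


def word_height(q_value: float, scale: float = 1.0) -> float:
    """Height proportional to -log10(q); q must be greater than 0."""
    if q_value <= 0:
        raise ValueError("q-value must be positive")
    return scale * -math.log10(q_value)


def word_colour(q_value: float, over_represented: bool) -> str:
    """Dark shade for q <= 0.0001, light shade otherwise; green = over, red = under."""
    shade = "dark" if q_value <= 1e-4 else "light"
    hue = "green" if over_represented else "red"
    return f"{shade} {hue}"


height_at_threshold = word_height(1e-4)
colour_at_threshold = word_colour(1e-4, over_represented=True)
colour_weak_under = word_colour(0.01, over_represented=False)
```

Under this rule a term at the threshold (q = 0.0001) is drawn four units high in the dark shade, and each tenfold drop in q adds one unit of height.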
Fig. 8
Key processes and differentially expressed genes (DEGs) differ between Ae. arabicum M+ and NM seeds. a Timing of key processes during development and maturation of A. thaliana seeds. Dormancy and desiccation tolerance coincide with changes in water, abscisic acid (ABA) and triacylglycerol (TAG) contents, seed weight, nuclear size and chromatin condensation, endosperm proportion and germinability; data from [32, 41, 55]. b Selected Ae. arabicum DEG putative ortholog expression during A. thaliana seed development and maturation. Cumulative transcript abundances for A. thaliana putative orthologs of Ae. arabicum 21 histone and 119 ribosomal protein genes (Additional file 3: Figure S4); individual abundances for RNA polymerase II large subunit (AtNRPB1), oleosin AtOLE2 (seed storage), heat shock factor AtHSFA9 (longevity), and AtNYE1 (chlorophyll degradation); data from Arabidopsis eFP browser [74] and [–31]. c Expression of late embryogenesis abundant (LEA) proteins, seed maturation master regulators (AtLEC1, AtLEC2, AtABI3, AtFUS3) and WRINKLED1 (AtWRI1), a transcription factor associated with enhanced fatty acid and TAG biosynthesis during A. thaliana seed maturation; data from Arabidopsis eFP browser and [–31, 58]. d Expression of selected Ae. arabicum DEGs for ribosomal proteins, histones, NRPB1 (RNAseq) and histone acetyltransferase HAC1 (qRT-PCR) in M+ and NM seeds. Cumulative RPKM values are presented for 21 histone and 119 ribosomal protein genes of Ae. arabicum (Additional file 3: Figure S4). A * indicates a significant difference between M+ and NM seeds based on a t-test (p < 0.05); n.s. means ‘not significant’. e Expression of RNA polymerase II complex and associated factors [50, 51] that mediate transcription, including initiation, elongation and processing of transcripts, in Ae. arabicum dry seed morphs. Red text indicates a factor identified as an NM-high DEG, with the expression ratio (NM / M+) given. Note that the transcript abundances of core NRPB1/2 and of most factors are several-fold higher in NM seeds. f Expression of seed maturation master regulators (RNAseq, ABI3 also by qRT-PCR), oleosins, NYE1 and HSFA9 in dry M+ and NM Ae. arabicum seeds. g Selected Ae. arabicum LEA expression in dry M+ and NM seeds (RNAseq and qRT-PCR). The presented dehydrin is the putative ortholog of At4G39130. Error bars indicate mean ± SEM for qRT-PCR experiments. For the plotted RPKM values of single genes from the RNAseq data, we used the result of the DEG detection pipeline (edgeR + NOISeq + DESeq2) as the indicator of significance

References

    1. Brautigam A, Gowik U. What can next generation sequencing do for you? Next generation sequencing as a valuable tool in plant research. Plant Biol (Stuttg). 2010;12(6):831–841. doi: 10.1111/j.1438-8677.2010.00373.x.
    2. Mohammadin S, Peterse K, van de Kerke SJ, Chatrou LW, Donmez AA, Mummenhoff K, Pires JC, Edger PP, Al-Shehbaz IA, Schranz ME. Anatolian origins and diversification of Aethionema, the sister lineage of the core Brassicaceae. Am J Bot. 2017;104(7):1042–1054. doi: 10.3732/ajb.1700091.
    3. Lenser T, Graeber K, Cevik OS, Adiguzel N, Donmez AA, Grosche C, Kettermann M, Mayland-Quellhorst S, Merai Z, Mohammadin S, et al. Developmental control and plasticity of fruit and seed dimorphism in Aethionema arabicum. Plant Physiol. 2016;172(3):1691–1707. doi: 10.1104/pp.16.00838.
    4. Arshad W, Sperber K, Steinbrecher T, Nichols B, Jansen VAA, Leubner-Metzger G, Mummenhoff K. Dispersal biophysics and adaptive significance of dimorphic diaspores in the annual Aethionema arabicum (Brassicaceae). New Phytol. 2019;221(3):1434–1446. doi: 10.1111/nph.15490. Epub 2018 Oct 25.
    5. Haudry A, Platts AE, Vello E, Hoen DR, Leclercq M, Williamson RJ, Forczek E, Joly-Lopez Z, Steffen JG, Hazzouri KM, et al. An atlas of over 90,000 conserved noncoding sequences provides insight into crucifer regulatory regions. Nat Genet. 2013;45(8):891–U228. doi: 10.1038/ng.2684.