A differential k-mer analysis pipeline for comparing RNA-Seq transcriptome and meta-transcriptome datasets without a reference
- PMID: 30483906
- DOI: 10.1007/s10142-018-0647-3
Abstract
Next-generation DNA sequencing technologies, such as RNA-Seq, currently dominate genome-wide gene expression studies. A standard approach to analysing these data requires mapping sequence reads to a reference and counting the number of reads that map to each gene. However, for many transcriptome studies a suitable reference genome is unavailable, especially for meta-transcriptome studies, which assay gene expression from mixed populations of organisms. Where a reference is unavailable, one can be generated by de novo assembly of the sequence reads. However, the high cost of generating high-coverage data for de novo assembly hinders this approach and, more importantly, accurate assembly of such data is challenging, especially for meta-transcriptome data; the resulting assemblies frequently suffer from collapsed regions or chimeric sequences. As an alternative to the standard reference mapping approach, we have developed a k-mer-based analysis pipeline (DiffKAP) to identify differentially expressed reads between RNA-Seq datasets without requiring a reference. We compared the DiffKAP approach with the traditional TopHat/Cuffdiff method using RNA-Seq data from soybean, which has a suitable reference genome. We subsequently examined differential gene expression for a coral meta-transcriptome, for which no reference is available, and validated the results using qRT-PCR. We conclude that DiffKAP is an accurate method for studying differential gene expression in complex meta-transcriptomes without requiring a reference genome.
Keywords: Coral; Host-microbe symbiosis; K-mer analysis; Meta-transcriptome; RNA-Seq; Soybean.
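The reference-free idea summarised in the abstract, comparing k-mer abundance between two RNA-Seq datasets, can be illustrated with a minimal sketch. This is not DiffKAP's implementation: the function names, the value of k, the pseudocount, and the log2 fold-change threshold are all assumptions chosen for illustration. The sketch also stops at differential k-mers and does not map them back to reads.

```python
from collections import Counter
import math


def count_kmers(reads, k):
    """Count every substring of length k across a list of read sequences."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def differential_kmers(reads_a, reads_b, k=3, min_abs_log2fc=1.0, pseudocount=1.0):
    """Return {kmer: log2 fold change (A over B)} for k-mers passing the threshold.

    Counts are divided by the total number of k-mers in each dataset so that
    datasets of different sequencing depth are comparable. The pseudocount
    (an assumption of this sketch) avoids division by zero for k-mers that
    are absent from one dataset.
    """
    counts_a = count_kmers(reads_a, k)
    counts_b = count_kmers(reads_b, k)
    total_a = sum(counts_a.values()) or 1
    total_b = sum(counts_b.values()) or 1

    result = {}
    for kmer in counts_a.keys() | counts_b.keys():
        freq_a = (counts_a[kmer] + pseudocount) / total_a
        freq_b = (counts_b[kmer] + pseudocount) / total_b
        log2fc = math.log2(freq_a / freq_b)
        if abs(log2fc) >= min_abs_log2fc:
            result[kmer] = log2fc
    return result


if __name__ == "__main__":
    # Toy reads, invented purely for illustration.
    dataset_a = ["AAAA", "ACGT"]
    dataset_b = ["ACGT", "ACGT"]
    for kmer, lfc in sorted(differential_kmers(dataset_a, dataset_b).items()):
        print(kmer, round(lfc, 3))
```

On real data, k would be much larger than 3 and a dedicated k-mer counter would replace the in-memory `Counter`; the small values here only keep the example readable.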