Alignment-free Transcriptomic and Metatranscriptomic Comparison Using Sequencing Signatures with Variable Length Markov Chains
- PMID: 27876823
- PMCID: PMC5120338
- DOI: 10.1038/srep37243
Abstract
Comparing microbial sequencing datasets is critical to understanding the dynamics of microbial communities. Alignment-based tools for analyzing metagenomic datasets require reference sequences and read alignments. The available alignment-free dissimilarity approaches model the background sequences with a Fixed Order Markov Chain (FOMC), yielding promising results for the comparison of microbial communities. However, in an FOMC the number of parameters grows exponentially with the order of the Markov Chain (MC). Under a fixed high order, the parameters might not be accurately estimated owing to limited sequencing depth. In our study, we investigate an alternative to FOMC: modeling background sequences in metatranscriptomic data with the data-driven Variable Length Markov Chain (VLMC). The VLMC, originally designed for long sequences, was extended to high-throughput sequencing reads, and strategies to estimate the corresponding parameters were developed. The flexible number of parameters in VLMC avoids estimating the vast number of parameters of a high-order MC under limited sequencing depth. Unlike FOMC, where the order is selected manually, VLMC determines the MC order adaptively. Several beta diversity measures based on VLMC were applied to compare bacterial RNA-Seq and metatranscriptomic datasets. Experiments show that VLMC outperforms FOMC in modeling the background sequences of transcriptomic and metatranscriptomic samples. A software pipeline is available at https://d2vlmc.codeplex.com.
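The exponential growth in FOMC parameters noted in the abstract can be illustrated with a short sketch. This is not the authors' pipeline: the function names `fomc_free_parameters` and `count_contexts` and the toy reads are hypothetical, chosen only to show how quickly the number of possible contexts outruns what a limited set of reads can cover.

```python
from collections import Counter, defaultdict

ALPHABET = "ACGT"


def fomc_free_parameters(order, alphabet_size=len(ALPHABET)):
    """Free transition parameters of a fixed-order Markov chain.

    Each of the alphabet_size**order contexts carries a distribution
    over the next symbol, which has alphabet_size - 1 free parameters.
    """
    return alphabet_size ** order * (alphabet_size - 1)


def count_contexts(reads, order):
    """For each length-`order` context seen in the reads, count the symbols that follow it."""
    counts = defaultdict(Counter)
    for read in reads:
        for i in range(len(read) - order):
            context = read[i:i + order]
            next_symbol = read[i + order]
            counts[context][next_symbol] += 1
    return counts


if __name__ == "__main__":
    # Made-up reads for illustration only; real data would come from FASTQ files.
    toy_reads = ["ACGTACGTAC", "CGTACGTTAC", "TTACGGACGT"]
    for order in (1, 2, 4, 8):
        observed = len(count_contexts(toy_reads, order))
        possible = len(ALPHABET) ** order
        print(
            f"order={order} free_parameters={fomc_free_parameters(order)} "
            f"contexts_observed={observed} of {possible}"
        )
```

At a high fixed order most contexts are never observed in a small sample, so their transition probabilities cannot be estimated reliably. A VLMC addresses this by keeping long contexts only where the data support them.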
Similar articles
- Comparison of metatranscriptomic samples based on k-tuple frequencies. PLoS One. 2014 Jan 2;9(1):e84348. doi: 10.1371/journal.pone.0084348. PMID: 24392128.
- A New Context Tree Inference Algorithm for Variable Length Markov Chain Model with Applications to Biological Sequence Analyses. J Comput Biol. 2022 Aug;29(8):839-856. doi: 10.1089/cmb.2021.0604. Epub 2022 Apr 22. PMID: 35451885.
- Gene finding in metatranscriptomic sequences. BMC Bioinformatics. 2014;15 Suppl 9(Suppl 9):S8. doi: 10.1186/1471-2105-15-S9-S8. Epub 2014 Sep 10. PMID: 25253067.
- Algorithms for variable length Markov chain modeling. Bioinformatics. 2004 Mar 22;20(5):788-9. doi: 10.1093/bioinformatics/btg489. Epub 2004 Jan 29. PMID: 14751999.
- Optimal choice of word length when comparing two Markov sequences using a χ²-statistic. BMC Genomics. 2017 Oct 3;18(Suppl 6):732. doi: 10.1186/s12864-017-4020-z. PMID: 28984181.
Cited by
- SCRAPT: an iterative algorithm for clustering large 16S rRNA gene data sets. Nucleic Acids Res. 2023 May 8;51(8):e46. doi: 10.1093/nar/gkad158. PMID: 36912074.
- Tomato RNA-seq Data Mining Reveals the Taxonomic and Functional Diversity of Root-Associated Microbiota. Microorganisms. 2019 Dec 24;8(1):38. doi: 10.3390/microorganisms8010038. PMID: 31878183.
- Alignment-free sequence comparison: benefits, applications, and tools. Genome Biol. 2017 Oct 3;18(1):186. doi: 10.1186/s13059-017-1319-7. PMID: 28974235. Review.
- Fast parallel construction of variable-length Markov chains. BMC Bioinformatics. 2021 Oct 9;22(1):487. doi: 10.1186/s12859-021-04387-y. PMID: 34627154.
- VirFinder: a novel k-mer based tool for identifying viral sequences from assembled metagenomic data. Microbiome. 2017 Jul 6;5(1):69. doi: 10.1186/s40168-017-0283-5. PMID: 28683828.