New developments of alignment-free sequence comparison: measures, statistics and next-generation sequencing
- PMID: 24064230
- PMCID: PMC4017329
- DOI: 10.1093/bib/bbt067
Abstract
With the development of next-generation sequencing (NGS) technologies, a large amount of short-read data has been generated. Assembling these short reads can be challenging for genomes and metagenomes without template sequences, which makes alignment-based genome sequence comparison difficult. In addition, sequence reads from NGS can come from different regions of various genomes, and they may not be alignable. Sequence signature-based methods, which compare genomes and metagenomes by the frequencies of word patterns, can potentially be useful for analyzing short-read data from NGS. Here we review recent developments in alignment-free genome and metagenome comparison based on the frequencies of word patterns, with emphasis on the dissimilarity measures between sequences, the statistical power of these measures when two sequences are related, and the applications of these measures to NGS data.
Keywords: Markov model; NGS data; alignment-free; genome comparison; statistical power; word patterns.
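To make the idea of comparing sequences by word-pattern frequencies concrete, the following is a minimal Python sketch, not the authors' implementation. It counts overlapping words of length k in two sequences, computes the classical D2 statistic (the inner product of the two word-count vectors), and turns it into a dissimilarity with a cosine-style normalization. The function names, the choice of normalization, and the decision to skip words containing characters other than A, C, G and T are assumptions made for illustration only.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k):
    """Count overlapping words of length k in seq.

    Assumption for this sketch: words containing characters other
    than A, C, G, T (e.g. N) are skipped.
    """
    seq = seq.upper()
    alphabet = set("ACGT")
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= alphabet:
            counts[word] += 1
    return counts


def d2_statistic(x, y):
    """D2: inner product of two word-count vectors."""
    # Counter returns 0 for words missing from y.
    return sum(count * y[word] for word, count in x.items())


def cosine_dissimilarity(x, y):
    """Illustrative normalization of D2 to the range [0, 0.5].

    0 means the count vectors point in the same direction;
    0.5 means the sequences share no words of length k.
    """
    norm_x = sqrt(sum(c * c for c in x.values()))
    norm_y = sqrt(sum(c * c for c in y.values()))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("a sequence has no valid words of length k")
    return 0.5 * (1.0 - d2_statistic(x, y) / (norm_x * norm_y))


if __name__ == "__main__":
    a = kmer_counts("ACGTACGTTAGC", 3)
    b = kmer_counts("ACGTTCGTAAGC", 3)
    print(d2_statistic(a, b), cosine_dissimilarity(a, b))
```

Because the counts depend only on which words occur and how often, the same functions can in principle be applied to pooled counts from unassembled reads; handling of reverse complements and background (for example Markov model) correction is deliberately left out of this sketch.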