Application of discrete Fourier inter-coefficient difference for assessing genetic sequence similarity
- PMID: 24991213
- PMCID: PMC4077688
- DOI: 10.1186/1687-4153-2014-8
Abstract
Digital signal processing (DSP) techniques for biological sequence analysis continue to grow in popularity because of the inherently digital nature of these sequences. DSP methods showed early success in detecting coding regions in genes. More recently, these methods have been used to establish similarity among DNA sequences. We present the inter-coefficient difference (ICD) transformation, a novel extension of the discrete Fourier transform that can be applied to any DNA sequence. The ICD method is a mathematical, alignment-free DNA comparison method: it generates a genetic signature for any DNA sequence, and these signatures are used to compute relative measures of similarity among DNA sequences. We demonstrate our method on a set of insulin genes obtained from an evolutionarily wide range of species, and on a set of avian influenza viral sequences, which are highly similar to one another. We compare phylogenetic trees generated using our technique against trees generated from traditional alignment-based measures of similarity, and demonstrate that the ICD method produces a highly accurate tree without requiring an alignment prior to establishing sequence similarity.
Keywords: Discrete Fourier transform; Sequence analysis; Sequence similarity.
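The abstract describes the ICD method only at a high level, so the Python sketch below illustrates the general shape of a DFT-based, alignment-free comparison under explicit assumptions, not the authors' published ICD definition. It assumes a Voss-style binary indicator mapping (one 0/1 track per nucleotide), zero-padding both sequences to a common length, a naive DFT, "inter-coefficient difference" taken as the difference between adjacent DFT magnitude coefficients, and Euclidean distance between signatures. The names `icd_signature` and `icd_distance`, and the toy sequences, are hypothetical.

```python
import cmath
import math

NUCLEOTIDES = "ACGT"


def indicator_sequences(seq: str, length: int) -> list[list[float]]:
    """Voss-style binary indicator mapping: one 0/1 track per nucleotide,
    zero-padded to `length` so sequences of different lengths share DFT bins
    (the padding strategy is an assumption)."""
    seq = seq.upper()
    return [
        [1.0 if t < len(seq) and seq[t] == base else 0.0 for t in range(length)]
        for base in NUCLEOTIDES
    ]


def dft_magnitudes(x: list[float]) -> list[float]:
    """Naive O(n^2) discrete Fourier transform; returns |X[k]| for k = 0..n-1."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
        for k in range(n)
    ]


def icd_signature(seq: str, length: int) -> list[float]:
    """Hypothetical ICD-style signature: differences between adjacent DFT
    magnitude coefficients for each nucleotide track, concatenated.
    Only k = 0..length//2 is kept because a real signal's spectrum is symmetric."""
    signature: list[float] = []
    for track in indicator_sequences(seq, length):
        mags = dft_magnitudes(track)[: length // 2 + 1]
        signature.extend(mags[k + 1] - mags[k] for k in range(len(mags) - 1))
    return signature


def icd_distance(a: str, b: str) -> float:
    """Euclidean distance between two signatures (assumed metric)."""
    n = max(len(a), len(b))
    sig_a, sig_b = icd_signature(a, n), icd_signature(b, n)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(sig_a, sig_b)))


if __name__ == "__main__":
    # Arbitrary toy strings for illustration, not data from the study.
    seqs = {"s1": "ACGTACGTAC", "s2": "ACGTACGTAT", "s3": "TTGCAAGCTA"}
    names = sorted(seqs)
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            print(x, y, round(icd_distance(seqs[x], seqs[y]), 3))
```

A matrix of such pairwise distances could then be passed to a standard distance-based tree method such as UPGMA or neighbor joining to build a phylogenetic tree. The naive DFT here is quadratic in sequence length; for gene-length inputs an FFT would normally be used instead.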
Similar articles
- An improved model for whole genome phylogenetic analysis by Fourier transform. J Theor Biol. 2015 Oct 7;382:99-110. doi: 10.1016/j.jtbi.2015.06.033. Epub 2015 Jul 4. PMID: 26151589.
- A measure of DNA sequence similarity by Fourier Transform with applications on hierarchical clustering. J Theor Biol. 2014 Oct 21;359:18-28. doi: 10.1016/j.jtbi.2014.05.043. Epub 2014 Jun 6. PMID: 24911780.
- ML-DSP: Machine Learning with Digital Signal Processing for ultrafast, accurate, and scalable genome classification at all taxonomic levels. BMC Genomics. 2019 Apr 3;20(1):267. doi: 10.1186/s12864-019-5571-y. PMID: 30943897.
- Phylogenetic inferences from molecular sequences: review and critique. Theor Popul Biol. 2001 Feb;59(1):27-40. doi: 10.1006/tpbi.2000.1485. PMID: 11243926. Review.
- Gene prediction based on DNA spectral analysis: a literature review. J Comput Biol. 2011 Apr;18(4):639-76. doi: 10.1089/cmb.2010.0184. Epub 2011 Mar 7. PMID: 21381961. Review.
Cited by
- Mathematical Approach to Protein Sequence Comparison Based on Physiochemical Properties. ACS Omega. 2022 Oct 17;7(43):39446-39455. doi: 10.1021/acsomega.2c06103. eCollection 2022 Nov 1. PMID: 36340165.
- A new gene tree algorithm employing DNA sequences of bovine genome using discrete Fourier transformation. PLoS One. 2023 Mar 9;18(3):e0277480. doi: 10.1371/journal.pone.0277480. eCollection 2023. PMID: 36893167.
- Uncovering Signals from the Coronavirus Genome. Genes (Basel). 2021 Jun 25;12(7):973. doi: 10.3390/genes12070973. PMID: 34202172.