EURASIP J Bioinform Syst Biol. 2014;2014(1):8.
doi: 10.1186/1687-4153-2014-8. Epub 2014 May 28.

Application of discrete Fourier inter-coefficient difference for assessing genetic sequence similarity

Brian R King et al. EURASIP J Bioinform Syst Biol. 2014.

Abstract

Digital signal processing (DSP) techniques for biological sequence analysis continue to grow in popularity because of the inherently digital nature of these sequences. DSP methods showed early success in detecting coding regions in a gene. More recently, they have been used to assess similarity among DNA sequences of genes. We present the inter-coefficient difference (ICD) transformation, a novel extension of the discrete Fourier transformation that can be applied to any DNA sequence. The ICD method is a mathematical, alignment-free DNA comparison method: it generates a genetic signature for any DNA sequence, and these signatures are then used to compute relative measures of similarity among DNA sequences. We demonstrate our method on a set of insulin genes obtained from an evolutionarily wide range of species, and on a set of avian influenza viral sequences, which are highly similar to one another. We compare phylogenetic trees generated using our technique against trees generated using traditional alignment-based similarity techniques and demonstrate that the ICD method produces a highly accurate tree without requiring an alignment prior to establishing sequence similarity.

Keywords: Discrete Fourier transform; Sequence analysis; Sequence similarity.
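
The abstract describes the ICD transformation only at a high level, so the following Python sketch is an illustration under stated assumptions, not the authors' published definition. It assumes that (1) each sequence is encoded as four binary base-indicator signals, (2) the power spectrum is sampled at a fixed number of frequencies so that sequences of different lengths yield signatures of equal length, and (3) the signature is the difference between adjacent spectral coefficients. The names `power_spectrum`, `icd_signature`, `signature_distance`, and the parameter `m` are hypothetical.

```python
import cmath
import math

BASES = "ACGT"


def power_spectrum(seq, m=16):
    """Summed power of the four base-indicator signals.

    Assumption: the spectrum is sampled at frequencies k/m for
    k = 1..m//2, so the output length does not depend on len(seq).
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("sequence must not be empty")
    spectrum = []
    for k in range(1, m // 2 + 1):
        total = 0.0
        for base in BASES:
            # Fourier coefficient of the 0/1 indicator signal for this base.
            coeff = sum(
                cmath.exp(-2j * math.pi * k * pos / m)
                for pos, ch in enumerate(seq)
                if ch == base
            )
            total += abs(coeff) ** 2
        spectrum.append(total / n)  # normalise by sequence length
    return spectrum


def icd_signature(seq, m=16):
    """Differences between adjacent power-spectrum coefficients (assumed form)."""
    p = power_spectrum(seq, m)
    return [b - a for a, b in zip(p, p[1:])]


def signature_distance(seq_a, seq_b, m=16):
    """Euclidean distance between two signatures; no alignment is performed."""
    return math.dist(icd_signature(seq_a, m), icd_signature(seq_b, m))


if __name__ == "__main__":
    # Toy sequences, not taken from the INS19 or FLU60 datasets.
    print(signature_distance("ATGGCCCTGTGGATGCGC", "ATGGCCCTGTGGATGCGC"))
    print(signature_distance("ATGGCCCTGTGGATGCGC", "ATGAAGGCAATACTAGTAGTT"))
```

Pairwise values of `signature_distance` could then be collected into a distance matrix and passed to a standard hierarchical clustering routine to draw a dendrogram.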

Figures

Figure 1
Histogram of observed sequence identity over all pairs of aligned sequences in the INS19 dataset. The percent identity is computed for all possible pairs of sequences in the INS19 dataset. Most pairs fall between 55% and 75% sequence identity.
Figure 2
ICD-based dendrogram for INS19. This figure shows the dendrogram generated by applying the ICD method to the INS19 dataset, which contains mRNA sequences taken from 19 different eukaryotic species for the insulin (INS) gene.
Figure 3
Alignment-based dendrogram for INS19. This figure shows the resulting dendrogram generated from phylogenetic relationships inferred from pairwise alignments computed over all pairs from the INS19 dataset, which contains mRNA sequences taken from 19 different eukaryotic species for the insulin (INS) gene.
Figure 4
Alignment-free dendrogram using the FFP method [2] for INS19. This figure shows the dendrogram generated from phylogenetic relationships inferred using the FFP method on the INS19 dataset, which contains mRNA sequences taken from 19 different eukaryotic species for the insulin (INS) gene.
Figure 5
ICD-based dendrogram for FLU60. This figure shows the dendrogram generated by applying the ICD method to the FLU60 dataset, which contains 60 sequences of the HA gene of different subtypes of avian influenza type A.
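
The pairwise percent identity summarised in Figure 1 can be sketched in Python as follows. This is a minimal illustration, assuming the sequences have already been aligned to equal length and that columns containing a gap in either sequence are excluded from the count; the source does not state which denominator the authors used. The function names `percent_identity` and `all_pairs_identity` are hypothetical.

```python
from itertools import combinations


def percent_identity(a, b, gap="-"):
    """Percent of gap-free aligned columns in which both sequences agree."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue  # assumption: gapped columns are not counted
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def all_pairs_identity(aligned):
    """Map each unordered pair of sequence names to its percent identity."""
    return {
        (i, j): percent_identity(aligned[i], aligned[j])
        for i, j in combinations(sorted(aligned), 2)
    }


if __name__ == "__main__":
    # Toy aligned sequences, not taken from the INS19 dataset.
    toy = {"a": "ACGT-A", "b": "ACGATA", "c": "TCGATA"}
    for pair, value in all_pairs_identity(toy).items():
        print(pair, round(value, 1))
```

A histogram of the values returned by `all_pairs_identity` would give a plot of the kind described in the Figure 1 caption.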

References

    1. Durbin R, Eddy SR, Krogh A, Mitchison G. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, Cambridge, UK; 1998. p. 356.
    2. Sims GE, Jun S-R, Wu GA, Kim S-H. Whole-genome phylogeny of mammals: evolutionary information in genic and nongenic regions. Proc Natl Acad Sci U S A. 2009;106:17077–82. doi: 10.1073/pnas.0909377106.
    3. Phillips A, Janies D, Wheeler W. Multiple sequence alignment in phylogenetic analysis. Mol Phylogenet Evol. 2000;16:317–30. doi: 10.1006/mpev.2000.0785.
    4. Samuelsson T. Genomics and bioinformatics: an introduction to programming tools for life scientists. 1. Cambridge University Press, Cambridge, UK; 2012. p. 356.
    5. Altschul S, Gish W, Miller W, Myers E, Lipman D. Basic local alignment search tool. J Mol Biol. 1990;215:403–410. doi: 10.1016/S0022-2836(05)80360-2.
