Comparative Study
Stat Appl Genet Mol Biol. 2012;11(1):Article 3.

Alignment-free sequence comparison for biologically realistic sequences of moderate length

PMID: 22624182

Conrad J Burden et al. Stat Appl Genet Mol Biol. 2012.

Abstract

The $D_2$ statistic, defined as the number of matches of words of some pre-specified length $k$ between two sequences, is a computationally fast, alignment-free measure of biological sequence similarity. However, there is some debate about its suitability for this purpose, as the variability in $D_2$ may be dominated by terms that reflect only the noise within each individual sequence. We examine the extent of this problem and the effectiveness of overcoming it with two mean-centred variants of the statistic, $D_2^*$ and $D_2^c$. We conclude that all three statistics are potentially useful measures of sequence similarity, for which reasonably accurate p-values can be estimated under a null hypothesis of sequences composed of independent and identically distributed (i.i.d.) letters. We show that $D_2$ and $D_2^c$, and to a somewhat lesser extent $D_2^*$, perform well in tests to classify moderate-length query sequences as putative cis-regulatory modules.
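One way to see where the "single-sequence noise" comes from is a short algebraic identity, given here under stated assumptions rather than as the paper's own derivation. Let $X_w$ and $Y_w$ be the overlapping counts of word $w$ in sequences of lengths $n$ and $m$, write $\bar n = n-k+1$ and $\bar m = m-k+1$, let $p_w$ be the probability of $w$ under the i.i.d. letter model, and assume the centred counts $\tilde X_w = X_w - \bar n p_w$ and $\tilde Y_w = Y_w - \bar m p_w$, with $D_2^c = \sum_w \tilde X_w \tilde Y_w$. Expanding the product gives:

```latex
\begin{aligned}
D_2 &= \sum_w X_w Y_w
     = \sum_w \left(\tilde X_w + \bar n p_w\right)\left(\tilde Y_w + \bar m p_w\right) \\
    &= D_2^c + \bar m \sum_w p_w X_w + \bar n \sum_w p_w Y_w - \bar n \bar m \sum_w p_w^2 .
\end{aligned}
```

The two middle sums each depend on only one of the sequences, so their fluctuations add variance to $D_2$ without reflecting shared words; under this convention, centring removes them exactly.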
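A minimal Python sketch of the three statistics follows; it is an illustration, not the authors' code. It assumes overlapping word counts, sequences written only in the alphabet given by `letter_probs`, strictly positive letter probabilities, centred counts $\tilde X_w = X_w - \bar n p_w$ with $\bar n = n-k+1$, and the normalisation $D_2^* = \sum_w \tilde X_w \tilde Y_w / (\sqrt{\bar n \bar m}\, p_w)$. The function names `kmer_counts`, `word_prob`, `d2` and `centred_stats` are hypothetical.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_counts(seq: str, k: int) -> Counter:
    """Counts of overlapping k-words in seq (missing words read as 0)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def word_prob(word: str, letter_probs: dict) -> float:
    """Probability of `word` when letters are i.i.d. with `letter_probs`."""
    p = 1.0
    for ch in word:
        p *= letter_probs[ch]
    return p


def d2(x: str, y: str, k: int) -> int:
    """D2 = sum over words w of X_w * Y_w (number of matching k-word pairs)."""
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    return sum(c * cy[w] for w, c in cx.items())


def centred_stats(x: str, y: str, k: int, letter_probs: dict) -> tuple:
    """Return (D2c, D2*) built from centred counts X_w - n_bar * p_w.

    Every word over the alphabet is visited, because a word absent from
    both sequences still contributes (n_bar * p_w) * (m_bar * p_w).
    Assumes all letter probabilities are > 0 and both sequences use only
    letters from `letter_probs`.
    """
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    n_bar, m_bar = len(x) - k + 1, len(y) - k + 1
    d2c = d2star = 0.0
    for letters in product(sorted(letter_probs), repeat=k):
        w = "".join(letters)
        p_w = word_prob(w, letter_probs)
        xt = cx[w] - n_bar * p_w  # centred count in x
        yt = cy[w] - m_bar * p_w  # centred count in y
        d2c += xt * yt
        d2star += xt * yt / (sqrt(n_bar * m_bar) * p_w)
    return d2c, d2star


if __name__ == "__main__":
    uniform = {a: 0.25 for a in "ACGT"}
    print(d2("ACGT", "ACGA", 2))                      # 2 (AC and CG match)
    print(centred_stats("AAAA", "AAAA", 1, uniform))  # (12.0, 12.0)
```

`d2` only needs words seen in `x`, while the centred statistics loop over all $|A|^k$ words, so this sketch is practical only for small $k$.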
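The abstract states that p-values can be estimated under an i.i.d. null but does not say how. As a stand-in, the sketch below uses plain Monte Carlo simulation, which is not necessarily the method used in the paper: draw many pairs of i.i.d. sequences with the same lengths and letter probabilities, and report the fraction whose $D_2$ is at least the observed value, with the usual +1 correction. The names `d2_pvalue_mc`, `n_sim` and `seed` are hypothetical.

```python
import random
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Counts of overlapping k-words in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(x: str, y: str, k: int) -> int:
    """Number of matching k-word pairs between x and y."""
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    return sum(c * cy[w] for w, c in cx.items())


def d2_pvalue_mc(x, y, k, letter_probs, n_sim=1000, seed=0):
    """Monte Carlo estimate of P(D2 >= observed) under an i.i.d. letter null.

    Null pairs have the same lengths as x and y, with letters drawn
    independently from `letter_probs`.
    """
    rng = random.Random(seed)
    letters = list(letter_probs)
    weights = [letter_probs[a] for a in letters]
    observed = d2(x, y, k)
    exceed = 0
    for _ in range(n_sim):
        xs = "".join(rng.choices(letters, weights=weights, k=len(x)))
        ys = "".join(rng.choices(letters, weights=weights, k=len(y)))
        if d2(xs, ys, k) >= observed:
            exceed += 1
    # The +1 terms count the observed pair itself, so the estimate is never 0.
    return (exceed + 1) / (n_sim + 1)


if __name__ == "__main__":
    uniform = {a: 0.25 for a in "ACGT"}
    x = "ACGT" * 50  # highly repetitive, so D2(x, x) is far above the null
    print(d2_pvalue_mc(x, x, 3, uniform, n_sim=200))  # 1/201, the minimum here
```

The smallest reportable value is $1/(n_{\text{sim}}+1)$, so very small p-values need many simulations; an analytic approximation to the null distribution would be cheaper in that regime.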
