Closest string with outliers
- PMID: 21342588
- PMCID: PMC3044313
- DOI: 10.1186/1471-2105-12-S1-S55
Closest string with outliers
Abstract
Background: Given n strings s1, …, sn each of length ℓ and a nonnegative integer d, the CLOSEST STRING problem asks to find a center string s such that none of the input strings has Hamming distance greater than d from s. Finding a common pattern in many--but not necessarily all--input strings is an important task that plays a role in many applications in bioinformatics.
Results: Although the closest string model is robust to the oversampling of strings in the input, it is severely affected by the existence of outliers. We propose a refined model, the closest string with outliers (CSWO) problem, to overcome this limitation. This new model asks for a center string s that is within Hamming distance d to at least n - k of the n input strings, where k is a parameter describing the maximum number of outliers. A CSWO solution not only provides the center string as a representative for the set of strings but also reveals the outliers of the set.We provide fixed parameter algorithms for CSWO when d and k are parameters, for both bounded and unbounded alphabets. We also show that when the alphabet is unbounded the problem is W[1]-hard with respect to n - k, ℓ, and d.
Conclusions: Our refined model abstractly models finding common patterns in several but not all input strings. We initialize the study of the computability of this model and show that it is sensitive to different parameterizations. Lastly, we conclude by suggesting several open problems which warrant further investigation.
Figures
. In bold we have the set of strings S* = {s121, s132, s234} that corresponds to the clique containing the vertices {v1,v2,v3}. We note that S* is the only set such that |S*| = 3, which have Hamming distance at most d from s*.References
-
- Dopazo J, Rodríguez A, Sáiz J, Sobrino F. Design of primers for PCR amplification of highly variable genomes. Computer Applications in the Biosciences. 1993;9:123–125. - PubMed
-
- Lanctot J, Li M, Ma B, Wang S, Zhang L. Distinguishing string selection problems. Information and Computation. 2003. pp. 41–55. - DOI
-
- Lucas K, Busch M, Össinger S. Thompson J: An improved microcomputer program for finding gene-and gene family-specific oligonucleotides suitable as primers for polymerase chain reactions or as probes. Computer Applications in the Biosciences. 1991;7:525–529. - PubMed
-
- Proutski V, Holme E. Primer master: A new program for the design and analyiss of PCR primers. Computer Applications in the Biosciences. 1996;12:253–255. - PubMed
-
- Deng X, Li G, Li Z, Ma B, Wang L. Genetic design of drugs without side-effects. SIAM Journal on Computing. 2003;32(4):1073–1090. doi: 10.1137/S0097539701397825. - DOI
Publication types
MeSH terms
LinkOut - more resources
Full Text Sources
