Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2006 Aug;5(8):1520-32.
doi: 10.1074/mcp.T600022-MCP200. Epub 2006 Jun 11.

DivergentSet, a tool for picking non-redundant sequences from large sequence collections

Affiliations
Free article

DivergentSet, a tool for picking non-redundant sequences from large sequence collections

Jeremy Widmann et al. Mol Cell Proteomics. 2006 Aug.
Free article

Abstract

DivergentSet addresses the important but so far neglected bioinformatics task of choosing a representative set of sequences from a larger collection. We found that using a phylogenetic tree to guide the construction of divergent sets of sequences can be up to 2 orders of magnitude faster than the naive method of using a full distance matrix. By providing a user-friendly interface (available online) that integrates the tasks of finding additional sequences, building and refining the divergent set, producing random divergent sets from the same sequences, and exporting identifiers, this software facilitates a wide range of bioinformatics analyses including finding significant motifs and covariations. As an example application of DivergentSet, we demonstrate that the motifs identified by the motif-finding package MEME (Motif Elicitation by Maximum Entropy) are highly unstable with respect to the specific choice of sequences. This instability suggests that the types of sensitivity analysis enabled by DivergentSet may be widely useful for identifying the motifs of biological significance.

PubMed Disclaimer

Publication types

LinkOut - more resources