CD-HIT Suite: a web server for clustering and comparing biological sequences

Ying Huang¹, Beifang Niu, Ying Gao, Limin Fu, Weizhong Li

PMID: 20053844
PMCID: PMC2828112
DOI: 10.1093/bioinformatics/btq003

Comparative Study

Ying Huang et al. Bioinformatics. 2010 Mar 1;26(5):680-2. doi: 10.1093/bioinformatics/btq003. Epub 2010 Jan 6.

Affiliation

¹ California Institute for Telecommunications and Information Technology, University of California San Diego, La Jolla, CA, USA.

Abstract

CD-HIT is a widely used program for clustering and comparing large biological sequence datasets. To further assist CD-HIT users, we significantly improved the program, adding more functions and improving its accuracy, scalability and flexibility. Most importantly, we developed a new web server, CD-HIT Suite, for clustering a user-uploaded sequence dataset, or comparing it with another dataset, at different identity levels. Users can now explore the resulting clusters interactively in a web browser. We also provide downloadable clusters at different identity levels for several public databases (NCBI NR, Swissprot and PDB).
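The abstract does not spell out how clustering "at different identity levels" works. CD-HIT is generally described as a greedy incremental algorithm: sequences are sorted longest first, the first becomes a cluster representative, and each later sequence joins the first representative it matches at or above the identity threshold (CD-HIT's default, non-"most similar" mode) or else founds a new cluster. Identity is taken as identical residues divided by the length of the shorter sequence. The Python sketch below illustrates that idea under simplifying assumptions: `identity`, `greedy_cluster` and `compare_datasets` are hypothetical names, not part of CD-HIT, and `identity` approximates an alignment with `difflib`, whereas CD-HIT itself uses a short-word filter and banded alignment. `compare_datasets` loosely mirrors what comparing one dataset with another means (as in cd-hit-2d): reporting which sequences of the second set are similar to the first.

```python
from difflib import SequenceMatcher


def identity(query: str, rep: str) -> float:
    """Fraction of identical residues relative to the shorter sequence.

    Hypothetical stand-in: difflib's matching blocks approximate an
    alignment. CD-HIT itself uses a short-word filter and banded alignment.
    """
    if not query or not rep:
        return 0.0
    matcher = SequenceMatcher(None, query, rep, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / min(len(query), len(rep))


def greedy_cluster(seqs: dict[str, str], threshold: float) -> list[list[str]]:
    """Greedy incremental clustering; clusters[i][0] is the representative."""
    # Process sequences longest first (stable sort keeps input order on ties).
    order = sorted(seqs, key=lambda name: len(seqs[name]), reverse=True)
    clusters: list[list[str]] = []
    for name in order:
        for members in clusters:
            # Join the first cluster whose representative is similar enough.
            if identity(seqs[name], seqs[members[0]]) >= threshold:
                members.append(name)
                break
        else:
            # No representative matched: this sequence founds a new cluster.
            clusters.append([name])
    return clusters


def compare_datasets(db1: dict[str, str], db2: dict[str, str],
                     threshold: float) -> dict[str, str]:
    """Map each db2 sequence to the first db1 sequence it matches at >= threshold."""
    hits: dict[str, str] = {}
    for name2, seq2 in db2.items():
        for name1, seq1 in db1.items():
            if identity(seq2, seq1) >= threshold:
                hits[name2] = name1
                break
    return hits


example = {
    "a": "MKTAYIAKQRQISFVKSHFSRQ",
    "b": "MKTAYIAKQRQISFVKSHFSR",  # "a" without its last residue
    "c": "GGGGWWWWPPPP",           # unrelated sequence
}
print(greedy_cluster(example, 0.9))  # [['a', 'b'], ['c']]
```

Comparing each sequence against every representative is quadratic in the worst case; CD-HIT's short-word filter is what lets it skip most of those alignments on large datasets.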

Availability: Free access at http://cd-hit.org

Figures

**Fig. 1.**
Screenshots of CD-HIT Suite. (a) Cluster Explorer for investigating clusters. (b) A cluster distribution plot to explore the global structure of a whole dataset.
