Comparative Study

Bioinformatics. 2002 Jul;18(7):908-21.

doi: 10.1093/bioinformatics/18.7.908.

Clustering of proximal sequence space for the identification of protein families

Federico Abascal¹, Alfonso Valencia

PMID: 12117788
DOI: 10.1093/bioinformatics/18.7.908

Federico Abascal et al. Bioinformatics. 2002 Jul.

Affiliation

¹ Protein Design Group, National Centre for Biotechnology, CNB-CSIC, Cantoblanco, Madrid E-28049, Spain.

Abstract

Motivation: The study of sequence space, and the deciphering of the structure of protein families and subfamilies, has up to now been required for work in comparative genomics and for the prediction of protein function. With the emergence of structural proteomics projects, it is becoming increasingly important to be able to select protein targets for structural studies that will appropriately cover the space of protein sequences, functions and genomic distribution. These problems are the motivation for the development of methods for clustering protein sequences and building families of potentially orthologous sequences, such as those proposed here.

Results: First, we developed a clustering strategy (the Ncut algorithm) that forms groups of related sequences by assessing their pairwise relationships. The results for the ras superfamily of proteins are similar to those produced by other clustering methods, but are obtained without clustering the full sequence space. The Ncut clusters are then used as input to a procedure that reconstructs groups of closely related sequences with equilibrated genomic composition. Applying this technique to the data set used to construct the COG database yields results very similar to those derived by the human experts responsible for that database.
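The abstract does not spell out how the Ncut algorithm works. As a rough illustration only, and assuming that "Ncut" refers to the normalized-cut criterion from graph partitioning, the sketch below scores a two-way split of a small similarity graph and finds the best split by brute force. The similarity values, the function names and the exhaustive search are all hypothetical choices made for this example; they are not taken from the paper, and a practical implementation would not enumerate every partition.

```python
from itertools import combinations


def ncut_value(weights, group_a, group_b):
    """Normalized cut of a two-way partition of a weighted graph.

    ncut(A, B) = cut(A, B) / assoc(A, V) + cut(A, B) / assoc(B, V)
    Assumes every node has at least one positive edge weight, so that
    neither association term is zero.
    """
    n = len(weights)
    # Total weight of the edges crossing between the two groups.
    cut = sum(weights[i][j] for i in group_a for j in group_b)
    # Total weight connecting each group to all nodes in the graph.
    assoc_a = sum(weights[i][j] for i in group_a for j in range(n))
    assoc_b = sum(weights[i][j] for i in group_b for j in range(n))
    return cut / assoc_a + cut / assoc_b


def best_bipartition(weights):
    """Exhaustively search for the two-way split with the lowest Ncut.

    Only feasible for tiny graphs; used here purely for illustration.
    """
    n = len(weights)
    nodes = set(range(n))
    best = None
    # It is enough to enumerate the smaller side of each split.
    for size in range(1, n // 2 + 1):
        for subset in combinations(range(n), size):
            a = set(subset)
            b = nodes - a
            score = ncut_value(weights, a, b)
            if best is None or score < best[0]:
                best = (score, sorted(a), sorted(b))
    return best


# Hypothetical symmetric similarity scores between five sequences:
# sequences 0-2 resemble each other, as do sequences 3-4.
toy_weights = [
    [0, 9, 8, 1, 0],
    [9, 0, 7, 0, 1],
    [8, 7, 0, 1, 0],
    [1, 0, 1, 0, 9],
    [0, 1, 0, 9, 0],
]

score, group_a, group_b = best_bipartition(toy_weights)
print(group_a, group_b, round(score, 4))
```

On this toy input the lowest-scoring split separates sequences 3 and 4 from sequences 0, 1 and 2. Dividing the cut by each group's total association penalizes splits that merely peel off a single weakly connected sequence.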

Availability: The analyses of different systems, including the COG-equivalent 21 genomes, are available at http://www.pdg.cnb.uam.es/GenoClustering.html.

Cited by

Functional classification using phylogenomic inference.
Brown D, Sjölander K. PLoS Comput Biol. 2006 Jun 30;2(6):e77. doi: 10.1371/journal.pcbi.0020077. PMID: 16846248. Review.
A Bayesian sampler for optimization of protein domain hierarchies.
Neuwald AF. J Comput Biol. 2014 Mar;21(3):269-86. doi: 10.1089/cmb.2013.0099. Epub 2014 Feb 4. PMID: 24494927.
clusterMaker: a multi-algorithm clustering plugin for Cytoscape.
Morris JH, Apeltsin L, Newman AM, Baumbach J, Wittkop T, Su G, Bader GD, Ferrin TE. BMC Bioinformatics. 2011 Nov 9;12:436. doi: 10.1186/1471-2105-12-436. PMID: 22070249.
Automated protein subfamily identification and classification.
Brown DP, Krishnamurthy N, Sjölander K. PLoS Comput Biol. 2007 Aug;3(8):e160. doi: 10.1371/journal.pcbi.0030160. PMID: 17708678.
OrthoMCL: identification of ortholog groups for eukaryotic genomes.
Li L, Stoeckert CJ Jr, Roos DS. Genome Res. 2003 Sep;13(9):2178-89. doi: 10.1101/gr.1224503. PMID: 12952885.
