On dimension reduction of clustering results in structural bioinformatics
- PMID: 25196235
- DOI: 10.1016/j.bbapap.2014.08.015
Abstract
OPTICS is a density-based clustering algorithm that performs well in a wide variety of applications. For a set of input objects, the algorithm creates a reachability plot that can either be used to produce cluster membership assignments or be interpreted directly as an expressive two-dimensional representation of the clustering structure of the input set, even if the input set is embedded in higher dimensions. The focus of this work is a visualization method for comparing two independent hierarchical clusterings by assigning colors to all entries of the input database. We give two applications related to macromolecular structural properties: the first is a sequence-based clustering of the SwissProt database that is evaluated using NCBI taxonomy identifiers, and the second involves clustering the locations of specific atoms in the serine protease enzyme family, where the clusters are evaluated using SCOP structural classifications.
Keywords: Clustering; OPTICS; Phylogenetics; Phylogenomics; Protein sequences; SCOP classification; SCOP tree; Sequence alignment; SwissProt; UniProt.
Copyright © 2014 Elsevier B.V. All rights reserved.
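To make the reachability plot mentioned in the abstract concrete, the following is a minimal sketch of the generic OPTICS ordering step in plain Python. It is not the authors' implementation: the function name `optics_order`, the parameters `eps` and `min_pts`, the Euclidean distance, and the toy 2-D points are all assumptions chosen for illustration, and the quadratic-time neighbor search is only suitable for tiny inputs.

```python
import math


def optics_order(points, eps=math.inf, min_pts=3):
    """Return (order, reachability) for a list of coordinate tuples.

    Illustrative O(n^2) sketch of OPTICS; names and defaults are assumptions.
    """
    n = len(points)
    reach = [math.inf] * n      # reachability distance per point
    processed = [False] * n
    order = []                  # order in which points are visited

    def dist(a, b):
        return math.dist(points[a], points[b])

    def neighbors(p):
        # All points within eps of p (p itself included).
        return [q for q in range(n) if dist(p, q) <= eps]

    def core_distance(p, nbrs):
        # Distance to the min_pts-th nearest neighbor, counting p itself;
        # None if p is not a core point.
        if len(nbrs) < min_pts:
            return None
        return sorted(dist(p, q) for q in nbrs)[min_pts - 1]

    for start in range(n):
        if processed[start]:
            continue
        seeds = {start}
        while seeds:
            # Visit the unprocessed seed with the smallest reachability.
            p = min(seeds, key=lambda q: (reach[q], q))
            seeds.remove(p)
            processed[p] = True
            order.append(p)
            nbrs = neighbors(p)
            core = core_distance(p, nbrs)
            if core is None:
                continue
            for q in nbrs:
                if processed[q]:
                    continue
                new_reach = max(core, dist(p, q))
                if new_reach < reach[q]:
                    reach[q] = new_reach
                    seeds.add(q)
    return order, [reach[i] for i in order]


# Toy input (assumed for illustration): two well-separated groups of four points.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
order, reach = optics_order(points, min_pts=3)

# Crude text rendering of the reachability plot: one bar per visited point.
for idx, r in zip(order, reach):
    bar = "inf" if math.isinf(r) else "#" * round(r)
    print(f"{idx:2d} {bar}")
```

In this toy run the bars stay short inside each group and a single tall bar marks the jump from one group to the other; valleys separated by peaks are what a reachability plot uses to expose cluster structure, regardless of the dimension of the input.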