Functional annotation by sequence-weighted structure alignments: statistical analysis and case studies from the Protein 3000 structural genomics project in Japan
- PMID: 18384072
- DOI: 10.1002/prot.22015
Abstract
A method to functionally annotate structural genomics targets, based on a novel structural alignment scoring function, is proposed. In the proposed score, position-specific scoring matrices are used to weight structurally aligned residue pairs to highlight evolutionarily conserved motifs. The functional form of the score is first optimized for discriminating domains belonging to the same Pfam family from domains belonging to different families but the same CATH or SCOP superfamily. In the optimization stage, we consider four standard weighting functions as well as our own, the "maximum substitution probability," and combinations of these functions. The optimized score achieves an area of 0.87 under the receiver-operating characteristic curve with respect to identifying Pfam families within a sequence-unique benchmark set of domain pairs. Confidence measures are then derived from the benchmark distribution of true-positive scores. The alignment method is next applied to the task of functionally annotating 230 query proteins released to the public as part of the Protein 3000 structural genomics project in Japan. Of these queries, 78 were found to align to templates with the same Pfam family as the query or had sequence identities ≥ 30%. Another 49 queries were found to match more distantly related templates. Within this group, the template predicted by our method to be the closest functional relative was often not the most structurally similar. Several nontrivial cases are discussed in detail. Finally, 103 queries matched templates at the fold level, but not the family or superfamily level, and remain functionally uncharacterized.
© 2008 Wiley-Liss, Inc.
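The abstract does not give the functional form of the score, so the following Python sketch is only an illustration of the two ideas it names: weighting structurally aligned residue pairs by a conservation-derived weight, and evaluating a score by the area under the ROC curve. The function names, the distance term `1 / (1 + (d / d0)^2)`, and the parameter `d0` are assumptions made for this example and are not taken from the paper. The weights stand in for any PSSM-derived quantity.

```python
from typing import Sequence


def weighted_alignment_score(
    distances: Sequence[float],
    weights: Sequence[float],
    d0: float = 3.0,
) -> float:
    """Hypothetical sequence-weighted structural score.

    distances: distance (in angstroms) between each structurally aligned residue pair.
    weights:   per-pair conservation weight, e.g. derived from a PSSM (assumed form).
    d0:        assumed distance scale; not a value from the paper.
    """
    if len(distances) != len(weights):
        raise ValueError("distances and weights must have the same length")
    # Each aligned pair contributes a similarity term in (0, 1], scaled by its weight.
    return sum(w / (1.0 + (d / d0) ** 2) for d, w in zip(distances, weights))


def roc_auc(positive_scores: Sequence[float], negative_scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the probability that a random
    positive (same-family pair) outscores a random negative; ties count half."""
    if not positive_scores or not negative_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))
```

Under these assumptions, scores for same-family domain pairs would be passed as `positive_scores` and scores for different-family, same-superfamily pairs as `negative_scores`. An AUC of 0.87, as reported in the abstract, would mean a same-family pair outscores a different-family pair about 87% of the time.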