Database of homology-derived protein structures and the structural meaning of sequence alignment
- PMID: 2017436
- DOI: 10.1002/prot.340090107
Abstract
The database of known protein three-dimensional structures can be significantly increased by the use of sequence homology, based on the following observations. (1) The database of known sequences, currently at more than 12,000 proteins, is two orders of magnitude larger than the database of known structures. (2) The currently most powerful method of predicting protein structures is model building by homology. (3) Structural homology can be inferred from the level of sequence similarity. (4) The threshold of sequence similarity sufficient for structural homology depends strongly on the length of the alignment. Here, we first quantify the relation between sequence similarity, structure similarity, and alignment length by an exhaustive survey of alignments between proteins of known structure and report a homology threshold curve as a function of alignment length. We then produce a database of homology-derived secondary structure of proteins (HSSP) by aligning to each protein of known structure all sequences deemed homologous on the basis of the threshold curve. For each known protein structure, the derived database contains the aligned sequences, secondary structure, sequence variability, and sequence profile. Tertiary structures of the aligned sequences are implied, but not modeled explicitly. The database effectively increases the number of known protein structures by a factor of five to more than 1800. The results may be useful in assessing the structural significance of matches in sequence database searches, in deriving preferences and patterns for structure prediction, in elucidating the structural role of conserved residues, and in modeling three-dimensional detail by homology.
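The length-dependent threshold described above can be sketched as a simple function. The abstract does not give the curve itself; the constants below (290.15, -0.562, and a 24.8% plateau for alignments of 80 residues or more) are the form in which the HSSP curve is commonly quoted and should be treated as an assumption here, as should the function names and the optional `margin` parameter, which are illustrative only.

```python
def homology_threshold(length: int) -> float:
    """Percent-identity threshold for an alignment of `length` residues.

    Assumed form of the HSSP curve (not stated in the abstract):
    t(L) = 290.15 * L**-0.562 for 10 <= L < 80, and 24.8 for L >= 80.
    """
    if length < 10:
        # The assumed curve is not defined for very short alignments.
        raise ValueError("alignment too short for the threshold curve")
    if length >= 80:
        return 24.8
    return 290.15 * length ** -0.562


def implies_structural_homology(percent_identity: float,
                                length: int,
                                margin: float = 0.0) -> bool:
    """True if the alignment lies on or above the threshold curve.

    `margin` (hypothetical parameter) adds a safety offset in
    percentage points on top of the curve.
    """
    return percent_identity >= homology_threshold(length) + margin


if __name__ == "__main__":
    for n in (20, 40, 80, 200):
        print(n, round(homology_threshold(n), 1))
```

Under these assumed constants, 30% identity over 100 aligned residues would pass the test, while the same 30% over only 40 residues would not, which illustrates why short alignments need much higher sequence similarity before structural homology can be inferred.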