A data bank merging related protein structures and sequences
- PMID: 1594567
- DOI: 10.1093/protein/5.2.121
Abstract
A data collection that merges protein structural and sequence information is described. Structural superpositions among proteins with similar main-chain folds were either performed or collected from the literature. Sequences taken from the protein primary structure databases were associated with the multiple structural alignments, provided that they shared at least 50% residue identity with one of the structural sequences and that at least 50% of the structural sequence residues could be aligned. These restrictions give reasonable confidence that the primary sequences share the conformation of the tertiary structural templates, except in the less conserved loop regions. Multiple structural superpositions were collected for 38 family groups containing a total of 209 tertiary structures; 45 structures had no superposable counterparts and were used individually. Other information is also provided, such as main-chain and side-chain conformational angles and secondary structure assignments. Merging the primary and tertiary structural data increased the number of sequence entries in the data bank 8-fold over those associated with the known three-dimensional structures alone.
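The sequence-inclusion rule described in the abstract can be sketched as a simple filter over aligned sequences. This is a minimal illustration, not the authors' procedure. It assumes that alignments are given as equal-length gapped strings with `-` as the gap character, that identity is computed over positions where both rows carry a residue, that "alignable" means a structural-sequence residue is paired with a residue in the candidate sequence, and that both thresholds must hold against the same structural sequence. The names `pair_stats` and `accept` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

GAP = "-"  # assumed gap character in the aligned rows


@dataclass
class PairStats:
    identity: float  # identical residues / positions where both rows have a residue
    coverage: float  # structural residues paired with a residue / all structural residues


def pair_stats(struct_aln: str, seq_aln: str) -> PairStats:
    """Compare a structural row and a candidate row of the same alignment."""
    if len(struct_aln) != len(seq_aln):
        raise ValueError("aligned rows must have equal length")
    struct_residues = sum(1 for c in struct_aln if c != GAP)
    aligned = identical = 0
    for a, b in zip(struct_aln, seq_aln):
        if a != GAP and b != GAP:  # both rows have a residue at this column
            aligned += 1
            if a == b:
                identical += 1
    identity = identical / aligned if aligned else 0.0
    coverage = aligned / struct_residues if struct_residues else 0.0
    return PairStats(identity, coverage)


def accept(seq_aln: str, structural_rows: list[str],
           min_identity: float = 0.5, min_coverage: float = 0.5) -> bool:
    """Accept the candidate if, against at least one structural row,
    both the identity and the coverage thresholds are met (assumption:
    both criteria refer to the same structural sequence)."""
    for row in structural_rows:
        s = pair_stats(row, seq_aln)
        if s.identity >= min_identity and s.coverage >= min_coverage:
            return True
    return False


if __name__ == "__main__":
    template = "ACDEFGHIK"
    print(accept("ACDQFGH--", [template]))  # True: 6/7 identity, 7/9 coverage
    print(accept("AC-------", [template]))  # False: only 2/9 residues aligned
```

Using aligned positions as the identity denominator is one common convention; dividing by the full structural-sequence length instead would make the identity threshold stricter for partially aligned sequences.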
Similar articles
- Identification and classification of protein fold families. Protein Eng. 1993 Jul;6(5):485-500. doi: 10.1093/protein/6.5.485. PMID: 8415576.
- An integrated approach to the analysis and modeling of protein sequences and structures. III. A comparative study of sequence conservation in protein structural families using multiple structural alignments. J Mol Biol. 2000 Aug 18;301(3):691-711. doi: 10.1006/jmbi.2000.3975. PMID: 10966778.
- Alignment and searching for common protein folds using a data bank of structural templates. J Mol Biol. 1993 Jun 5;231(3):735-52. doi: 10.1006/jmbi.1993.1323. PMID: 8515448.
- Multiple sequence alignments. Curr Opin Struct Biol. 2005 Jun;15(3):261-6. doi: 10.1016/j.sbi.2005.04.002. PMID: 15963889. Review.
- Protein sequence motifs. Curr Opin Struct Biol. 1996 Jun;6(3):366-76. doi: 10.1016/s0959-440x(96)80057-1. PMID: 8804823. Review.
Cited by
- De novo and inverse folding predictions of protein structure and dynamics. J Comput Aided Mol Des. 1993 Aug;7(4):397-438. doi: 10.1007/BF02337559. PMID: 8229093. Review.
- The EMBL Nucleotide Sequence Database. Nucleic Acids Res. 1997 Jan 1;25(1):7-14. doi: 10.1093/nar/25.1.7. PMID: 9016493.
- SAGA: sequence alignment by genetic algorithm. Nucleic Acids Res. 1996 Apr 15;24(8):1515-24. doi: 10.1093/nar/24.8.1515. PMID: 8628686.
- Comparative genomic analysis and phylogeny of NAC25 gene from cultivated and wild Coffea species. Front Plant Sci. 2022 Sep 16;13:1009733. doi: 10.3389/fpls.2022.1009733. PMID: 36186041.
- The European Bioinformatics Institute (EBI) databases. Nucleic Acids Res. 1996 Jan 1;24(1):6-12. doi: 10.1093/nar/24.1.6. PMID: 8594602.