Automatic generation of primary sequence patterns from sets of related protein sequences
- PMID: 2296575
- PMCID: PMC53211
- DOI: 10.1073/pnas.87.1.118
Abstract
We have developed a computer algorithm that can extract the pattern of conserved primary sequence elements common to all members of a homologous protein family. The method involves clustering the pairwise similarity scores among a set of related sequences to generate a binary dendrogram (tree). The tree is then reduced in a stepwise manner by progressively replacing the node connecting the two most similar termini by one common pattern until only a single common "root" pattern remains. A pattern is generated at a node by (i) performing a local optimal alignment on the sequence/pattern pair connected by the node with the use of an extended dynamic programming algorithm and then (ii) constructing a single common pattern from this alignment with a nested hierarchy of amino acid classes to identify the minimal inclusive amino acid class covering each paired set of elements in the alignment. Gaps within an alignment are created and/or extended using a "pay once" gap penalty rule, and gapped positions are converted into gap characters that function as 0 or 1 amino acid of any type during subsequent alignment. This method has been used to generate a library of covering patterns for homologous families in the National Biomedical Research Foundation/Protein Identification Resource protein sequence data base. We show that a covering pattern can be more diagnostic for sequence family membership than any of the individual sequences used to construct the pattern.
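Step (ii) above, building one common pattern from an aligned sequence/pattern pair, can be sketched as follows. This is a minimal illustration, not the authors' implementation. The abstract does not list the amino acid classes, so the nested hierarchy in `CLASSES`, the class symbols, and the use of `.` as the pattern gap character are all assumptions made for this example. The alignment itself (step i) is assumed to have been done already, with `-` marking gapped positions.

```python
AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")

# Hypothetical nested hierarchy of amino acid classes (an assumption for
# illustration; the paper's actual classes are not given in the abstract).
# Any two classes are either disjoint or one contains the other.
CLASSES = {
    "l": frozenset("ILV"),        # aliphatic
    "a": frozenset("FWY"),        # aromatic
    "h": frozenset("ILVMFWY"),    # hydrophobic (contains l and a)
    "b": frozenset("KRH"),        # basic
    "n": frozenset("DE"),         # acidic
    "c": frozenset("KRHDE"),      # charged (contains b and n)
    "p": frozenset("KRHDESTNQ"),  # polar (contains c)
    "X": AMINO_ACIDS,             # any amino acid
}

ALIGNMENT_GAP = "-"  # gap introduced by the pairwise alignment
PATTERN_GAP = "."    # assumed symbol for "0 or 1 amino acid of any type"


def members(symbol):
    """Return the set of amino acids that a residue or class symbol stands for."""
    if symbol in CLASSES:
        return CLASSES[symbol]
    if symbol in AMINO_ACIDS:
        return frozenset(symbol)
    raise ValueError(f"unknown symbol: {symbol!r}")


def covering_symbol(x, y):
    """Return the smallest class covering both aligned elements."""
    # A gapped position becomes a pattern gap character and stays one.
    if ALIGNMENT_GAP in (x, y) or PATTERN_GAP in (x, y):
        return PATTERN_GAP
    if x == y:
        return x
    needed = members(x) | members(y)
    # "X" covers everything, so the candidate list is never empty.
    candidates = [(len(s), name) for name, s in CLASSES.items() if needed <= s]
    return min(candidates)[1]


def common_pattern(aligned_a, aligned_b):
    """Collapse two aligned, equal-length strings into one covering pattern."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned inputs must have equal length")
    return "".join(covering_symbol(x, y) for x, y in zip(aligned_a, aligned_b))


# Reduce one node (two sequences), then merge the result with a third sequence.
node = common_pattern("ILKDA-F", "VLRESGW")
root = common_pattern(node, "MLDEGAY")
print(node)  # lLbnX.a
print(root)  # hLcnX.a
```

Because a pattern uses the same alphabet as its inputs (residues, class symbols, gap characters), the output of one node can be fed back in as an input at the next node up the tree, which is what allows the stepwise reduction to a single root pattern.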