Recognition of distantly related protein sequences using conserved motifs and neural networks
- PMID: 1469726
- DOI: 10.1016/0022-2836(92)90877-m
Abstract
A sensitive technique for protein sequence motif recognition based on neural networks has been developed. It involves three major steps. (1) At each appropriate alignment position of a set of N matched sequences, a set of N aligned oligopeptides is specified with preselected window length. N neural nets are subsequently and successively trained on N-1 amino acid spans after eliminating each ith oligopeptide. A test for recognition of each of the ith spans is performed. The average neural net recognition over N such trials is used as a measure of conservation for the particular windowed region of the multiple alignment. This process is repeated for all possible spans of given length in the multiple alignment. (2) The M most conserved regions are regarded as motifs and the oligopeptides within each are used to train intensively M individual neural networks. (3) The M networks are then applied in a search for related primary structures in a databank of known protein sequences. The oligopeptide spans in the database sequence with strongest neural net output for each of the M networks are saved and then scored according to the output signals and the proper combination that follows the expected N- to C-terminal sequence order. The motifs from the database with highest similarity scores can then be used to retrain the M neural nets, which can be subsequently utilized for further searches in the databank, thus providing even greater sensitivity to recognize distant familial proteins. This technique was successfully applied to the integrase, DNA-polymerase and immunoglobulin families.
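Step (1) can be illustrated with a minimal sketch of the leave-one-out conservation measure. The sketch makes several assumptions that are not in the abstract: a simple per-column residue-frequency profile stands in for the neural network, the toy alignment is gap-free, and all function names (`train_profile`, `recognise`, `window_conservation`, `most_conserved`) are hypothetical. It only shows the shape of the procedure: train on N-1 spans, test on the span left out, average over N trials, and rank the windows.

```python
from collections import Counter


def train_profile(spans):
    """Stand-in for network training (assumption): per-column residue counts."""
    width = len(spans[0])
    columns = [Counter(span[i] for span in spans) for i in range(width)]
    return columns, len(spans)


def recognise(model, span):
    """Stand-in for network output (assumption): mean column frequency in [0, 1]."""
    columns, n = model
    return sum(col[res] / n for col, res in zip(columns, span)) / len(span)


def window_conservation(alignment, start, width):
    """Average leave-one-out recognition for one window of the alignment."""
    spans = [seq[start:start + width] for seq in alignment]
    total = 0.0
    for i, held_out in enumerate(spans):
        model = train_profile(spans[:i] + spans[i + 1:])  # train on N-1 spans
        total += recognise(model, held_out)                # test the ith span
    return total / len(spans)


def most_conserved(alignment, width, m):
    """Return the m highest-scoring windows as (score, start) pairs."""
    length = len(alignment[0])
    scored = [(window_conservation(alignment, s, width), s)
              for s in range(length - width + 1)]
    return sorted(scored, reverse=True)[:m]


# Toy gap-free alignment (invented for illustration, not real family data).
alignment = ["MKHDGTLA", "AKHDGSLV", "TKHDGPMI", "LRHDGEWC"]
for score, start in most_conserved(alignment, width=3, m=2):
    print(start, round(score, 3))
```

In this toy case the fully conserved window ranks first. The sketch does not prevent the selected windows from overlapping, and it does not cover the intensive training of the M networks or the ordered database search in steps (2) and (3).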