Incorporating homologues into sequence embeddings for protein analysis
- PMID: 17688313
- DOI: 10.1142/s0219720007002734
Abstract
Statistical and learning techniques are becoming increasingly popular for different tasks in bioinformatics. Many of the most powerful statistical and learning techniques are applicable to points in a Euclidean space but not directly applicable to discrete sequences such as protein sequences. One way to apply these techniques to protein sequences is to embed the sequences into a Euclidean space and then apply these techniques to the embedded points. In this work we introduce a biologically motivated sequence embedding, the homology kernel, which takes into account intuitions from local alignment, sequence homology, and predicted secondary structure. This embedding allows us to directly apply learning techniques to protein sequences. We apply the homology kernel in several ways. We demonstrate that the homology kernel can be used for protein family classification and that it outperforms state-of-the-art methods for remote homology detection. We show that the homology kernel can be used for secondary structure prediction and is competitive with popular secondary structure prediction methods. Finally, we show how the homology kernel can be used to incorporate information from homologous sequences in local sequence alignment.
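The general idea of embedding discrete sequences as points in a Euclidean space, so that a dot product (a kernel) can feed standard learning methods, can be illustrated with a minimal sketch. It assumes the much simpler k-mer spectrum embedding as a stand-in; this is not the homology kernel, which additionally draws on local alignment and predicted secondary structure. The homologue-averaging step is a hypothetical illustration of "incorporating homologues", not the construction used in this work, and all function names are illustrative.

```python
from collections import Counter
import math


def spectrum_embedding(seq, k=3):
    """Map a protein sequence to a sparse vector of k-mer counts.

    Classic k-mer spectrum embedding, used here only as a simple
    stand-in for a biologically motivated embedding.
    """
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def homologue_embedding(seq, homologues, k=3):
    """Hypothetical variant: average the spectrum vectors of a sequence
    and its homologues, so k-mers shared across the family reinforce
    each other while sequence-specific k-mers are down-weighted."""
    group = [seq] + list(homologues)
    total = Counter()
    for s in group:
        total.update(spectrum_embedding(s, k))  # adds counts k-mer by k-mer
    return {kmer: count / len(group) for kmer, count in total.items()}


def kernel(u, v):
    """Dot product of two sparse embeddings.

    Because it is an inner product of explicit feature vectors, it is a
    valid positive semi-definite kernel.
    """
    if len(u) > len(v):
        u, v = v, u  # iterate over the smaller vector
    return sum(val * v.get(key, 0) for key, val in u.items())


def normalized_kernel(u, v):
    """Cosine-normalised kernel, so sequence length does not dominate."""
    denom = math.sqrt(kernel(u, u) * kernel(v, v))
    return kernel(u, v) / denom if denom else 0.0


if __name__ == "__main__":
    a = spectrum_embedding("MKVLAAGIV")
    b = spectrum_embedding("MKVLSAGIV")
    # 4 of the 7 distinct 3-mers are shared, so the score is 4/7
    print(round(normalized_kernel(a, b), 3))  # prints 0.571
```

A matrix of such kernel values between all pairs of sequences can then be handed to any kernel method (for example, a support vector machine for family classification) without the learner ever seeing the raw sequences.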
Similar articles
- Iterative sequence/secondary structure search for protein homologs: comparison with amino acid sequence alignments and application to fold recognition in genome databases. Bioinformatics. 2000 Nov;16(11):988-1002. doi: 10.1093/bioinformatics/16.11.988. PMID: 11159310
- Word correlation matrices for protein sequence analysis and remote homology detection. BMC Bioinformatics. 2008 Jun 3;9:259. doi: 10.1186/1471-2105-9-259. PMID: 18522726. Free PMC article.
- Optimal pairwise alignment of fixed protein structures in subquadratic time. J Bioinform Comput Biol. 2011 Jun;9(3):367-82. doi: 10.1142/s0219720011005562. PMID: 21714130
- Protein secondary structure prediction. Methods Mol Biol. 2010;609:327-48. doi: 10.1007/978-1-60327-241-4_19. PMID: 20221928. Review.
- Integrating protein secondary structure prediction and multiple sequence alignment. Curr Protein Pept Sci. 2004 Aug;5(4):249-66. doi: 10.2174/1389203043379675. PMID: 15320732. Review.
Cited by
- Protein threading using context-specific alignment potential. Bioinformatics. 2013 Jul 1;29(13):i257-65. doi: 10.1093/bioinformatics/btt210. PMID: 23812991. Free PMC article.
- A conditional neural fields model for protein threading. Bioinformatics. 2012 Jun 15;28(12):i59-66. doi: 10.1093/bioinformatics/bts213. PMID: 22689779. Free PMC article.