Optimizing amino acid substitution matrices with a local alignment kernel
- PMID: 16677385
- PMCID: PMC1513605
- DOI: 10.1186/1471-2105-7-246
Abstract
Background: Detecting remote homologies by direct comparison of protein sequences remains a challenging task. We had previously developed a similarity score between sequences, called a local alignment kernel, that performs well on this task when combined with a support vector machine. The local alignment kernel depends on an amino acid substitution matrix. Because the commonly used BLOSUM and PAM matrices for scoring amino acid matches were optimized for use with the Smith-Waterman algorithm, the matrices that are optimal for the local alignment kernel may be different.
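To illustrate how the kernel depends on the substitution matrix, the sketch below computes a local alignment kernel by dynamic programming as the sum of exp(β × score) over all local alignments of two sequences, plus 1 for the empty alignment. This is a minimal sketch under stated assumptions, not the authors' implementation. The function name `la_kernel`, the dictionary representation `sub` of the matrix, and the affine gap convention (a gap of length n costs `gap_open + (n - 1) * gap_extend`) are illustrative choices. The toy two-letter matrix is made up and does not come from BLOSUM or PAM.

```python
import math


def la_kernel(x, y, sub, beta, gap_open, gap_extend):
    """Sum of exp(beta * score) over all local alignments of x and y, plus 1 for the empty one.

    Illustrative sketch: `sub` maps residue pairs to scores, and a gap of length n is
    assumed to cost gap_open + (n - 1) * gap_extend.
    """
    n, m = len(x), len(y)
    # M[i][j]: alignments whose last aligned pair is (x[i-1], y[j-1]).
    # X[i][j]: last aligned pair is followed by unaligned residues of x ending at x[i-1].
    # Y[i][j]: last aligned pair is followed by unaligned residues of y ending at y[j-1],
    #          optionally preceded by an x gap, so each alignment is counted exactly once.
    M = [[0.0] * (m + 1) for _ in range(n + 1)]
    X = [[0.0] * (m + 1) for _ in range(n + 1)]
    Y = [[0.0] * (m + 1) for _ in range(n + 1)]
    w_open = math.exp(-beta * gap_open)
    w_ext = math.exp(-beta * gap_extend)
    total = 1.0  # contribution of the empty alignment
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            w = math.exp(beta * sub[(x[i - 1], y[j - 1])])
            # (i, j) either starts the alignment or follows a match or a gap run.
            M[i][j] = w * (1.0 + M[i - 1][j - 1] + X[i - 1][j - 1] + Y[i - 1][j - 1])
            X[i][j] = w_open * M[i - 1][j] + w_ext * X[i - 1][j]
            Y[i][j] = w_open * (M[i][j - 1] + X[i][j - 1]) + w_ext * Y[i][j - 1]
            total += M[i][j]  # every nonempty local alignment ends at exactly one aligned pair
    return total


# Toy example with a made-up two-letter matrix (match +1, mismatch -1).
toy_sub = {(p, q): (1.0 if p == q else -1.0) for p in "AC" for q in "AC"}
k = la_kernel("ACA", "AA", toy_sub, beta=1.0, gap_open=1.0, gap_extend=0.5)
```

Here `k` equals 3 + 5e + 2/e. This total counts the empty alignment, six single aligned pairs, and three two-pair alignments, one of which skips the C with a gap of length 1. Because the kernel sums over alignments instead of keeping only the best one, it varies smoothly with every entry of `sub`.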
Results: Unlike the local alignment score computed by the Smith-Waterman algorithm, the local alignment kernel is differentiable with respect to the amino acid substitution matrix, and its derivative can be computed efficiently by dynamic programming. We optimized the substitution matrix by classical gradient descent on an objective function that measures how well the local alignment kernel discriminates homologs from non-homologs in the COG database. The local alignment kernel performs better with the matrices and gap parameters optimized by this procedure than with the matrices optimized for the Smith-Waterman algorithm. Furthermore, the matrices and gap parameters optimized for the local alignment kernel can also be used successfully by the Smith-Waterman algorithm.
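Forward-mode differentiation offers one way to compute the kernel's derivative by dynamic programming. Each table in the kernel recursion gets a companion table that holds its derivative with respect to one symmetric substitution score s(a, b) = s(b, a), and the product rule updates that table in the same pass. The abstract does not say which differentiation scheme the authors used, so treat this as one possible approach. Several parts of the sketch are hypothetical stand-ins: the kernel recursion, the gap convention (a gap of length n costs `gap_open + (n - 1) * gap_extend`), and the names `la_kernel_and_grad` and `ascent_step`. The toy objective log K(homolog pair) - log K(non-homolog pair) is especially hypothetical, because the paper's actual COG-based objective is not specified here. Maximizing this toy score is equivalent to gradient descent on its negative.

```python
import math


def la_kernel_and_grad(x, y, sub, beta, gap_open, gap_extend, pair):
    """Return (K, dK/ds) for the symmetric score s = sub[(a, b)] = sub[(b, a)], pair = (a, b).

    Forward-mode sketch: each DP table has a companion table of derivatives.
    """
    a, b = pair
    n, m = len(x), len(y)

    def table():
        return [[0.0] * (m + 1) for _ in range(n + 1)]

    M, X, Y, dM, dX, dY = (table() for _ in range(6))
    w_open = math.exp(-beta * gap_open)
    w_ext = math.exp(-beta * gap_extend)
    K, dK = 1.0, 0.0  # the empty alignment contributes 1 and does not depend on s
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            xi, yj = x[i - 1], y[j - 1]
            w = math.exp(beta * sub[(xi, yj)])
            # d/ds exp(beta * s) = beta * exp(beta * s), only for cells that use s(a, b).
            dw = beta * w if {xi, yj} == {a, b} else 0.0
            inner = 1.0 + M[i - 1][j - 1] + X[i - 1][j - 1] + Y[i - 1][j - 1]
            d_inner = dM[i - 1][j - 1] + dX[i - 1][j - 1] + dY[i - 1][j - 1]
            M[i][j] = w * inner
            dM[i][j] = dw * inner + w * d_inner  # product rule
            # Gap tables are linear in earlier entries, so derivatives follow the same recursion.
            X[i][j] = w_open * M[i - 1][j] + w_ext * X[i - 1][j]
            dX[i][j] = w_open * dM[i - 1][j] + w_ext * dX[i - 1][j]
            Y[i][j] = w_open * (M[i][j - 1] + X[i][j - 1]) + w_ext * Y[i][j - 1]
            dY[i][j] = w_open * (dM[i][j - 1] + dX[i][j - 1]) + w_ext * dY[i][j - 1]
            K += M[i][j]
            dK += dM[i][j]
    return K, dK


def ascent_step(pos, neg, sub, beta, gap_open, gap_extend, pair, lr):
    """One gradient step on the hypothetical objective log K(pos) - log K(neg)."""
    k_pos, dk_pos = la_kernel_and_grad(pos[0], pos[1], sub, beta, gap_open, gap_extend, pair)
    k_neg, dk_neg = la_kernel_and_grad(neg[0], neg[1], sub, beta, gap_open, gap_extend, pair)
    grad = dk_pos / k_pos - dk_neg / k_neg  # d/ds log K = (dK/ds) / K
    a, b = pair
    new_sub = dict(sub)
    new_sub[(a, b)] = new_sub[(b, a)] = sub[(a, b)] + lr * grad  # keep the matrix symmetric
    return new_sub, grad


toy_sub = {(p, q): (1.0 if p == q else -1.0) for p in "AC" for q in "AC"}
K, dK = la_kernel_and_grad("ACA", "AA", toy_sub, 1.0, 1.0, 0.5, ("A", "A"))
new_sub, grad = ascent_step(("ACA", "AA"), ("AC", "CA"), toy_sub, 1.0, 1.0, 0.5, ("A", "C"), lr=0.1)
```

For this toy pair, dK/ds(A, A) works out to 6e + 2, and a finite-difference check on `sub[("A", "A")]` should reproduce it. Forward mode costs one extra pass per matrix entry. A reverse-mode (backpropagation) pass over the same tables would yield all entries at once, which is likely preferable for a full 20 × 20 matrix.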
Conclusion: This optimization procedure leads to useful substitution matrices, both for the local alignment kernel and the Smith-Waterman algorithm. The best performance for homology detection is obtained by the local alignment kernel.
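One matrix can serve both methods because the Smith-Waterman algorithm consumes the same ingredients, a substitution matrix and gap parameters, but keeps only the single best-scoring local alignment. Below is a minimal sketch of Smith-Waterman with affine gaps using a Gotoh-style recursion. The name `sw_score`, the dictionary matrix `sub`, the gap convention (a gap of length n costs `gap_open + (n - 1) * gap_extend`), and the toy two-letter matrix are illustrative assumptions, not details taken from the paper.

```python
def sw_score(x, y, sub, gap_open, gap_extend):
    """Smith-Waterman local alignment score with affine gaps (Gotoh-style recursion).

    Illustrative sketch: a gap of length n is assumed to cost gap_open + (n - 1) * gap_extend.
    """
    neg_inf = float("-inf")
    n, m = len(x), len(y)
    # H[i][j]: best local alignment ending at (i, j), floored at 0 (start a new alignment).
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    # E[i][j]: best alignment ending in a gap that consumes y[j-1].
    E = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    # F[i][j]: best alignment ending in a gap that consumes x[i-1].
    F = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(H[i][j - 1] - gap_open, E[i][j - 1] - gap_extend)
            F[i][j] = max(H[i - 1][j] - gap_open, F[i - 1][j] - gap_extend)
            H[i][j] = max(0.0, H[i - 1][j - 1] + sub[(x[i - 1], y[j - 1])], E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


# Toy example with a made-up two-letter matrix (match +2, mismatch -1).
toy_sub = {(p, q): (2.0 if p == q else -1.0) for p in "AC" for q in "AC"}
best = sw_score("ACA", "AA", toy_sub, gap_open=1.0, gap_extend=0.5)
```

Here `best` is 3.0: A-A, a one-residue gap over C (cost 1.0), then A-A again. The `max` in this recursion makes the score piecewise linear in `sub`, with kinks wherever the best alignment changes. Suppose instead that the kernel is a sum of exp(β × score) over the same local alignments plus the empty one. Then (1/β) log of that sum tends to this score as β grows, because a log-sum-exp approaches the maximum.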
Similar articles
- Robust sequence alignment using evolutionary rates coupled with an amino acid substitution matrix. BMC Bioinformatics. 2015 Aug 14;16:255. doi: 10.1186/s12859-015-0688-8. PMID: 26269100. Free PMC article.
- Probabilistic multi-class multi-kernel learning: on protein fold recognition and remote homology detection. Bioinformatics. 2008 May 15;24(10):1264-70. doi: 10.1093/bioinformatics/btn112. Epub 2008 Mar 31. PMID: 18378524.
- Optimizing substitution matrices by separating score distributions. Bioinformatics. 2004 Apr 12;20(6):863-73. doi: 10.1093/bioinformatics/btg494. Epub 2004 Jan 29. PMID: 14752003.
- Substitution scoring matrices for proteins - An overview. Protein Sci. 2020 Nov;29(11):2150-2163. doi: 10.1002/pro.3954. Epub 2020 Oct 12. PMID: 32954566. Free PMC article. Review.
- Pairwise statistical significance and empirical determination of effective gap opening penalties for protein local sequence alignment. Int J Comput Biol Drug Des. 2008;1(4):347-67. doi: 10.1504/ijcbdd.2008.022207. PMID: 20063463. Review.
Cited by
- High-performance integrated virtual environment (HIVE): a robust infrastructure for next-generation sequence data analysis. Database (Oxford). 2016 Mar 17;2016:baw022. doi: 10.1093/database/baw022. Print 2016. PMID: 26989153. Free PMC article.
- End-to-end learning of multiple sequence alignments with differentiable Smith-Waterman. Bioinformatics. 2023 Jan 1;39(1):btac724. doi: 10.1093/bioinformatics/btac724. PMID: 36355460. Free PMC article.
- Developing similarity matrices for antibody-protein binding interactions. PLoS One. 2023 Oct 26;18(10):e0293606. doi: 10.1371/journal.pone.0293606. eCollection 2023. PMID: 37883504. Free PMC article.
- Predicting DNA-binding specificities of eukaryotic transcription factors. PLoS One. 2010 Nov 30;5(11):e13876. doi: 10.1371/journal.pone.0013876. PMID: 21152420. Free PMC article.
- SubVis: an interactive R package for exploring the effects of multiple substitution matrices on pairwise sequence alignment. PeerJ. 2017 Jun 27;5:e3492. doi: 10.7717/peerj.3492. eCollection 2017. PMID: 28674656. Free PMC article.