Optimizing substitution matrices by separating score distributions
- PMID: 14752003
- DOI: 10.1093/bioinformatics/btg494
Abstract
Motivation: Homology search is one of the most fundamental tools in bioinformatics. Typical alignment algorithms use substitution matrices and gap costs, so better substitution matrices improve the accuracy of homology searches. Generally, substitution matrices are derived from aligned sequences whose relationships are known, and gap costs are determined by trial and error. To discriminate relationships more clearly, we optimize substitution matrices from a statistical viewpoint, using both positive and negative examples and Bayesian decision theory.
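For context, the conventional route mentioned above (deriving a matrix from aligned sequences whose relationships are known) is usually a log-odds calculation in the style of BLOSUM. The sketch below is a minimal illustration of that baseline, not the optimization proposed in this paper. The function name `log_odds_matrix`, the half-bit `scale`, the pseudocount, and the toy input are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations_with_replacement


def log_odds_matrix(aligned_pairs, scale=2.0, pseudocount=0.5):
    """Hypothetical helper: BLOSUM-style log-odds scores from ungapped,
    equal-length aligned sequence pairs (a baseline, not this paper's method).

    s(a, b) = round(scale * log2(q_ab / e_ab)), where q_ab is the observed
    frequency of the unordered pair {a, b}, p_a = q_aa + sum_{b != a} q_ab / 2,
    and e_ab = p_a**2 if a == b else 2 * p_a * p_b.
    """
    pair_counts = Counter()
    for seq1, seq2 in aligned_pairs:
        if len(seq1) != len(seq2):
            raise ValueError("aligned sequences must have equal length")
        for a, b in zip(seq1, seq2):
            # Count unordered pairs so the resulting matrix is symmetric.
            pair_counts[tuple(sorted((a, b)))] += 1

    alphabet = sorted({r for pair in pair_counts for r in pair})
    pairs = list(combinations_with_replacement(alphabet, 2))
    # Pseudocounts keep unobserved pairs from producing log2(0).
    counts = {pr: pair_counts[pr] + pseudocount for pr in pairs}
    total = sum(counts.values())
    q = {pr: n / total for pr, n in counts.items()}

    # Background frequency of each residue, derived from the pair frequencies.
    p = {a: 0.0 for a in alphabet}
    for (a, b), freq in q.items():
        if a == b:
            p[a] += freq
        else:
            p[a] += freq / 2
            p[b] += freq / 2

    matrix = {}
    for (a, b), freq in q.items():
        expected = p[a] ** 2 if a == b else 2 * p[a] * p[b]
        score = round(scale * math.log2(freq / expected))
        matrix[(a, b)] = matrix[(b, a)] = score
    return matrix


m = log_odds_matrix([("AAAC", "AAAC"), ("CCCA", "CCCA")])
print(m[("A", "A")], m[("A", "C")])  # 2 -6
```

Because the matrix is fitted only to pairs that are known to be related, nothing in this baseline explicitly pushes unrelated pairs toward low scores; that gap is what training with negative examples is meant to address.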
Results: Using the Clusters of Orthologous Groups (COG) database, we optimized substitution matrices. On the COG database, the obtained matrix classifies sequences more accurately than conventional substitution matrices, and it also performs well when classifying sequences from other databases.
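One way to read "classification accuracy" with positive and negative examples: score every pair with a given matrix, then call a pair related when its score clears a threshold. Assuming higher scores are monotonically more likely to come from related pairs, and using 0/1 loss with class priors taken from the sample sizes, the Bayes decision rule on the score axis reduces to a single threshold, which can be estimated empirically. This is a hedged illustration of the decision step only; the helper `best_threshold` and the toy scores are hypothetical, and the paper's actual objective for separating the score distributions may differ.

```python
def best_threshold(pos_scores, neg_scores):
    """Hypothetical helper: choose the threshold t that maximizes accuracy
    when a pair with score >= t is called related (positive).

    Under 0/1 loss with priors equal to the sample proportions, this is the
    empirical counterpart of a threshold-form Bayes decision rule.
    Returns (threshold, accuracy).
    """
    labeled = [(s, True) for s in pos_scores] + [(s, False) for s in neg_scores]
    if not labeled:
        raise ValueError("need at least one score")
    # Candidate thresholds: every observed score, plus one value above the
    # maximum (which calls every pair unrelated).
    candidates = sorted({s for s, _ in labeled})
    candidates.append(candidates[-1] + 1)

    best_t, best_acc = None, -1.0
    for t in candidates:
        correct = sum((s >= t) == is_pos for s, is_pos in labeled)
        acc = correct / len(labeled)
        if acc > best_acc:  # strict '>' keeps the lowest threshold among ties
            best_t, best_acc = t, acc
    return best_t, best_acc


# Toy alignment scores for related (positive) and unrelated (negative) pairs.
print(best_threshold([10, 12, 15, 9], [3, 5, 8, 11]))  # (9, 0.875)
```

Running the same procedure on scores produced by two different matrices gives a simple way to compare them: the matrix whose positive and negative score distributions overlap less reaches a higher accuracy.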
Similar articles
- Optimizing amino acid substitution matrices with a local alignment kernel. BMC Bioinformatics. 2006 May 5;7:246. doi: 10.1186/1471-2105-7-246. PMID: 16677385. Free PMC article.
- Improved pairwise alignments of proteins in the Twilight Zone using local structure predictions. Bioinformatics. 2006 Feb 15;22(4):413-22. doi: 10.1093/bioinformatics/bti828. Epub 2005 Dec 13. PMID: 16352653.
- A metric model of amino acid substitution. Bioinformatics. 2004 May 22;20(8):1214-21. doi: 10.1093/bioinformatics/bth065. Epub 2004 Feb 10. PMID: 14871874.
- Pairwise statistical significance and empirical determination of effective gap opening penalties for protein local sequence alignment. Int J Comput Biol Drug Des. 2008;1(4):347-67. doi: 10.1504/ijcbdd.2008.022207. PMID: 20063463. Review.
- Protein database searches using compositionally adjusted substitution matrices. FEBS J. 2005 Oct;272(20):5101-9. doi: 10.1111/j.1742-4658.2005.04945.x. PMID: 16218944. Free PMC article. Review.
Cited by
- A weighted string kernel for protein fold recognition. BMC Bioinformatics. 2017 Aug 25;18(1):378. doi: 10.1186/s12859-017-1795-5. PMID: 28841820. Free PMC article.
- Optimizing amino acid substitution matrices with a local alignment kernel. BMC Bioinformatics. 2006 May 5;7:246. doi: 10.1186/1471-2105-7-246. PMID: 16677385. Free PMC article.
- Revisiting amino acid substitution matrices for identifying distantly related proteins. Bioinformatics. 2014 Feb 1;30(3):317-25. doi: 10.1093/bioinformatics/btt694. Epub 2013 Nov 26. PMID: 24281694. Free PMC article.
- Protein sequence-similarity search acceleration using a heuristic algorithm with a sensitive matrix. J Struct Funct Genomics. 2016 Dec;17(4):147-154. doi: 10.1007/s10969-016-9210-4. Epub 2017 Jan 12. PMID: 28083762. Free PMC article.
- UProC: tools for ultra-fast protein domain classification. Bioinformatics. 2015 May 1;31(9):1382-8. doi: 10.1093/bioinformatics/btu843. Epub 2014 Dec 23. PMID: 25540185. Free PMC article.