The compositional adjustment of amino acid substitution matrices
- PMID: 14663142
- PMCID: PMC307629
- DOI: 10.1073/pnas.2533904100
Abstract
Amino acid substitution matrices are central to protein-comparison methods. In most commonly used matrices, the substitution scores take a log-odds form, involving the ratio of "target" to "background" frequencies derived from large, carefully curated sets of protein alignments. However, such matrices often are used to compare protein sequences with amino acid compositions that differ markedly from the background frequencies used for the construction of the matrices. Of course, the target frequencies should be adjusted in such cases, but the lack of an appropriate way to do this has been a long-standing problem. This article shows that if one demands consistency between target and background frequencies, then a log-odds substitution matrix implies a unique set of target and background frequencies as well as a unique scale. Standard substitution matrices therefore are truly appropriate only for the comparison of proteins with standard amino acid composition. Accordingly, we present and evaluate a rationale for transforming the target frequencies implicit in a standard matrix to frequencies appropriate for a nonstandard context. This rationale yields asymmetric matrices for the comparison of proteins with divergent compositions. Earlier approaches are unable to deal with this case in a fully consistent manner. Composition-specific substitution matrix adjustment is shown to be of utility for comparing compositionally biased proteins, including those of organisms with nucleotide-biased, and therefore codon-biased, genomes or isochores.
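The log-odds form described in the abstract can be illustrated with a small numerical sketch. The code below is not the authors' method and does not perform their compositional adjustment. It only shows, on a made-up two-letter alphabet, how scores of the form s_ij = ln(q_ij / (p_i p'_j)) / lambda are built from target frequencies q_ij, and how a scale lambda and a set of target frequencies can be recovered from the scores once the background frequencies are given. The function names, the toy frequencies, and the bisection bounds are all assumptions chosen for illustration.

```python
import math

def log_odds_scores(q, lam=1.0):
    """Build s_ij = ln(q_ij / (p_i * p'_j)) / lam from target frequencies q.

    The backgrounds p and p' are taken as the row and column marginals of q,
    so target and background frequencies are consistent by construction.
    """
    n = len(q)
    p_row = [sum(q[i][j] for j in range(n)) for i in range(n)]
    p_col = [sum(q[i][j] for i in range(n)) for j in range(n)]
    s = [[math.log(q[i][j] / (p_row[i] * p_col[j])) / lam for j in range(n)]
         for i in range(n)]
    return s, p_row, p_col

def implied_scale(s, p_row, p_col, lo=1e-3, hi=10.0, iters=200):
    """Find lam > 0 with sum_ij p_i * p'_j * exp(lam * s_ij) = 1 by bisection.

    The bounds lo and hi are illustrative assumptions; they must bracket the root.
    """
    n = len(s)

    def f(x):
        total = sum(p_row[i] * p_col[j] * math.exp(x * s[i][j])
                    for i in range(n) for j in range(n))
        return total - 1.0

    if not (f(lo) < 0.0 < f(hi)):
        raise ValueError("bounds do not bracket a positive root")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def implied_targets(s, p_row, p_col, lam):
    """Recover q_ij = p_i * p'_j * exp(lam * s_ij)."""
    n = len(s)
    return [[p_row[i] * p_col[j] * math.exp(lam * s[i][j]) for j in range(n)]
            for i in range(n)]

# Toy target frequencies on a hypothetical two-letter alphabet (they sum to 1).
q_toy = [[0.30, 0.10],
         [0.10, 0.50]]
scores, p_row, p_col = log_odds_scores(q_toy, lam=0.5)
lam = implied_scale(scores, p_row, p_col)   # should recover 0.5 up to rounding
q_back = implied_targets(scores, p_row, p_col, lam)
print(lam, q_back)
```

In this sketch the background frequencies are supplied as input. The abstract's stronger statement, that a log-odds matrix by itself implies unique target and background frequencies when consistency is demanded, is a result of the article and is not demonstrated here.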