Proc Natl Acad Sci U S A. 2003 Dec 23;100(26):15688-93.
doi: 10.1073/pnas.2533904100. Epub 2003 Dec 8.

The compositional adjustment of amino acid substitution matrices

Yi-Kuo Yu et al. Proc Natl Acad Sci U S A. 2003.

Abstract

Amino acid substitution matrices are central to protein-comparison methods. In most commonly used matrices, the substitution scores take a log-odds form, involving the ratio of "target" to "background" frequencies derived from large, carefully curated sets of protein alignments. However, such matrices often are used to compare protein sequences with amino acid compositions that differ markedly from the background frequencies used for the construction of the matrices. Of course, the target frequencies should be adjusted in such cases, but the lack of an appropriate way to do this has been a long-standing problem. This article shows that if one demands consistency between target and background frequencies, then a log-odds substitution matrix implies a unique set of target and background frequencies as well as a unique scale. Standard substitution matrices therefore are truly appropriate only for the comparison of proteins with standard amino acid composition. Accordingly, we present and evaluate a rationale for transforming the target frequencies implicit in a standard matrix to frequencies appropriate for a nonstandard context. This rationale yields asymmetric matrices for the comparison of proteins with divergent compositions. Earlier approaches are unable to deal with this case in a fully consistent manner. Composition-specific substitution matrix adjustment is shown to be of utility for comparing compositionally biased proteins, including those of organisms with nucleotide-biased, and therefore codon-biased, genomes or isochores.
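The log-odds relationship described above can be illustrated with a minimal Python sketch. It is not the authors' procedure: the two-letter alphabet, the toy target frequencies `q`, and the helper names `log_odds_scores` and `solve_scale` are all hypothetical, chosen only for illustration. The sketch assumes the standard log-odds form s_ij = ln(q_ij / (p_i p_j)) / λ, takes the background frequencies to be the marginals of the target frequencies (the consistency condition), and then recovers the scale λ from the scores alone by solving Σ p_i p_j exp(λ s_ij) = 1 for its positive root.

```python
import math

def log_odds_scores(q, lam):
    """Build log-odds scores from symmetric target frequencies q (summing to 1).

    Background frequencies are taken as the marginals of q, so that target
    and background frequencies are mutually consistent.
    """
    n = len(q)
    p = [sum(row) for row in q]
    s = [[math.log(q[i][j] / (p[i] * p[j])) / lam for j in range(n)]
         for i in range(n)]
    return p, s

def solve_scale(s, p, tol=1e-12):
    """Find the positive lam with sum_ij p_i p_j exp(lam * s_ij) = 1 by bisection.

    Assumes a negative expected score and at least one positive score,
    which guarantees exactly one positive root.
    """
    n = len(p)

    def f(x):
        total = sum(p[i] * p[j] * math.exp(x * s[i][j])
                    for i in range(n) for j in range(n))
        return total - 1.0

    hi = 1.0
    while f(hi) <= 0.0:      # grow until we are past the root
        hi *= 2.0
    lo = hi
    while f(lo) >= 0.0:      # shrink until we are below the root
        lo /= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Toy two-letter example (hypothetical numbers, not taken from the paper).
q = [[0.4, 0.1],
     [0.1, 0.4]]
true_lam = math.log(2) / 2           # half-bit units

p, s = log_odds_scores(q, true_lam)
lam = solve_scale(s, p)

# Target frequencies implied by the scores, background frequencies, and scale.
q_implied = [[p[i] * p[j] * math.exp(lam * s[i][j]) for j in range(2)]
             for i in range(2)]

print("recovered scale:", lam)
print("implied target frequencies:", q_implied)
```

In this toy case the recovered scale and implied target frequencies match the ones used to build the scores, which is the sense in which a log-odds matrix carries its own frequencies and scale. The sketch uses a symmetric `q` with a single background distribution; the asymmetric matrices the abstract mentions for proteins of divergent composition would need separate row and column marginals.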

Figures

Fig. 1.
Example of an alignment extension yielded by compositional adjustment of the scoring system. The sequences compared are P. falciparum putative asparagine synthase (NCBI gi 16805184) (top lines) and M. tuberculosis PurF protein (NCBI gi 15607948) (bottom lines). In the central lines, aligned identical residues are echoed, and aligned residues with positive substitution score are indicated by + symbols. (a) The alignment yielded by a scaled version of the standard BLOSUM-62 substitution matrix (see * footnote in Table 1). The alignment has a normalized score of 29.7 bits. (b) The alignment yielded by a composition-adjusted matrix derived from BLOSUM-62 (see * and ‡ footnotes in Table 1). The normalized score of the alignment is 31.8 bits. The alignment in b corresponds very closely to the three-dimensional structural superposition of the entire domain fold (NCBI CDD 9909, COG 0034) that is shared between the PurF and asparagine synthase families. Secondary structure elements were assigned by using the known crystal structures of E. coli asparagine synthetase B (PDB ID 1CT9 chain A) and B. subtilis PurF protein (PDB ID 1GPH chain 3). β-strands (straight bars) and α-helices (zig-zags) are indicated above and below their respective homologous sequences.
