Protein Sci. 2002 Feb;11(2):350-60.
doi: 10.1110/ps.18602.

Persistently conserved positions in structurally similar, sequence dissimilar proteins: roles in preserving protein fold and function

Iddo Friedberg et al. Protein Sci. 2002 Feb.

Abstract

Many protein pairs that share the same fold do not have any detectable sequence similarity, providing a valuable source of information for studying sequence-structure relationships. In this study, we use a stringent data set of structurally similar, sequence-dissimilar protein pairs to characterize residues that may play a role in the determination of protein structure and/or function. For each protein in the database, we identify amino-acid positions that show residue conservation within both close and distant family members. These positions are termed "persistently conserved". We then proceed to determine the "mutually" persistently conserved (MPC) positions: those structurally aligned positions in a protein pair that are persistently conserved in both pair mates. Because of their intra- and interfamily conservation, these positions are good candidates for determining protein fold and function. We find that 45% of the persistently conserved positions are mutually conserved. A significant fraction of them are located at positions critical for secondary structure determination; they are mostly buried, and many of them form spatial clusters within their protein structures. A substitution matrix based on the subset of MPC positions shows two distinct characteristics: (i) it is different from other available matrices, even those that are derived from structural alignments; (ii) its relative entropy is high, emphasizing the special residue restrictions imposed on these positions. Such a substitution matrix should be valuable for protein design experiments.
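
The MPC definition in the abstract amounts to an intersection over structurally aligned positions. The following is a minimal Python sketch, not the authors' implementation. It assumes the structural alignment is available as a list of (i, j) residue-index pairs and that the persistently conserved positions of each protein have already been identified. The function name and the toy data are hypothetical.

```python
def mutually_persistently_conserved(aligned_pairs, pc_a, pc_b):
    """Return the aligned (i, j) pairs in which position i is persistently
    conserved in protein A and position j is persistently conserved in
    protein B. All inputs are hypothetical, precomputed data."""
    pc_a, pc_b = set(pc_a), set(pc_b)
    return [(i, j) for i, j in aligned_pairs if i in pc_a and j in pc_b]


# Toy input (illustrative only, not data from the paper).
aligned_pairs = [(3, 10), (4, 11), (5, 12), (8, 15), (9, 16)]
pc_a = {3, 5, 9, 20}   # persistently conserved positions in protein A
pc_b = {10, 12, 15}    # persistently conserved positions in protein B

mpc = mutually_persistently_conserved(aligned_pairs, pc_a, pc_b)
print(mpc)
```

Positions that are persistently conserved in only one pair mate, such as 9 in protein A or 15 in protein B in the toy data, are excluded by construction.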

Figures

Fig. 1.
A schematic flowchart describing the identification of mutually persistently conserved positions. See text for details.
Fig. 2.
Distribution of residue types in mutually persistently conserved (MPC) positions expressed as the log-odds ratio between the frequency of a residue in MPC positions (obs) and its frequency in the entire database of SSSD proteins (exp). All frequency differences were found to be statistically significant by a χ² test, except for Leucine, Asparagine, and Valine (marked with ‘∧’).
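
The log-odds ratio in the Fig. 2 caption can be illustrated with a short Python sketch. The caption does not state the logarithm base, so base 2 is an assumption here. The function name and the toy residue strings are hypothetical and are not data from the paper.

```python
import math
from collections import Counter


def residue_log_odds(mpc_residues, all_residues):
    """log2(obs / exp) per residue type, where obs is the residue frequency
    in MPC positions and exp is its frequency in the whole data set.
    Base 2 is an assumption; the caption does not specify the base."""
    obs = Counter(mpc_residues)
    exp = Counter(all_residues)
    n_obs, n_exp = sum(obs.values()), sum(exp.values())
    return {
        aa: math.log2((obs[aa] / n_obs) / (exp[aa] / n_exp))
        for aa in obs
        if exp[aa] > 0
    }


# Toy data: glycine is enriched and alanine depleted in the "MPC" sample.
scores = residue_log_odds("GGGA", "GGAAAAAA")
print(scores)
```

A positive value means the residue type is over-represented in MPC positions relative to the background, and a negative value means it is under-represented.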
Fig. 3.
Amino-acid residue substitution matrices derived from (a) mutually persistently conserved positions and (b) all structurally aligned positions. Values are scaled to 1/10 bit.
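
As an illustration of how scores "scaled to 1/10 bit" and the relative entropy mentioned in the abstract can be computed, here is a minimal Python sketch of the standard log-odds construction. Whether the paper used exactly this construction is an assumption. The function names and the toy pair list are hypothetical.

```python
import math
from collections import Counter


def pair_frequencies(aligned_residue_pairs):
    """Symmetric joint frequencies q[(a, b)] over ordered residue pairs."""
    counts = Counter()
    for a, b in aligned_residue_pairs:
        counts[(a, b)] += 1
        counts[(b, a)] += 1  # count both orders so that q is symmetric
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def marginals(q):
    """Background frequency of each residue, p[a] = sum over b of q[(a, b)]."""
    p = Counter()
    for (a, _), f in q.items():
        p[a] += f
    return p


def log_odds_scores(q, scale=10):
    """Scores round(scale * log2(q_ab / (p_a * p_b))); scale=10 gives 1/10 bit.
    Pairs that were never observed are simply absent from the result."""
    p = marginals(q)
    return {
        (a, b): round(scale * math.log2(f / (p[a] * p[b])))
        for (a, b), f in q.items()
    }


def relative_entropy(q):
    """Relative entropy in bits: sum of q_ab * log2(q_ab / (p_a * p_b))."""
    p = marginals(q)
    return sum(f * math.log2(f / (p[a] * p[b])) for (a, b), f in q.items())


# Toy aligned residue pairs (illustrative only, not data from the paper).
pairs = [("L", "L"), ("L", "L"), ("L", "I"), ("G", "G")]
q = pair_frequencies(pairs)
scores = log_odds_scores(q)
H = relative_entropy(q)
print(scores[("G", "G")], scores[("L", "I")], round(H, 3))
```

A higher relative entropy indicates that the observed pair distribution departs more strongly from what independent background frequencies would give, which is the sense in which the abstract describes the MPC-derived matrix as restrictive.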
Fig. 4.
Comparison between sequence-derived and structure-derived substitution matrices. The amino-acid pair frequency distributions that were used for the derivation of the substitution matrices were compared by the Jensen-Shannon divergence. A series of BLOSUM matrices were compared with the mutually persistently conserved-derived matrix (filled squares) and with the structurally derived matrix (open circles).
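
For readers unfamiliar with the Jensen-Shannon divergence used in Fig. 4, the following is a minimal Python sketch of its standard definition for two discrete distributions. Base-2 logarithms are an assumption here, giving values between 0 and 1. The toy distributions are hypothetical and much smaller than a real amino-acid pair frequency distribution.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon(p, q):
    """Jensen-Shannon divergence: the average KL divergence of p and q
    from their midpoint distribution m = (p + q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Toy distributions over three outcomes (illustrative only).
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
print(jensen_shannon(p, q))
```

Unlike the Kullback-Leibler divergence, this measure is symmetric and stays finite even when one distribution assigns zero probability to an outcome that the other does not.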
Fig. 5.
Frequency of mutually persistently conserved (MPC) positions in secondary structure elements. The X-axis shows the positions in and flanking the secondary structure element (nomenclature after Aurora and Rose 1998). The flanking regions are marked with apostrophes, the in-element residues with digits, and the initial and terminal (capping) residues with a "c." The Y-axis is the logarithm of the ratio between the actual frequency of MPC residues in a position and that expected at random, based on the overall frequency of MPC positions in the data. The positions in which MPCs were found to be significantly over- or under-represented are marked with an "*." (a) α helices; (b) β strands.
Fig. 6.
Distribution of mutually persistently conserved positions (white bars) by solvent accessibility compared to all aligned residues (black bars). Residues were defined as buried when the solvent accessibility was <30% and exposed otherwise.
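
The buried/exposed rule in the Fig. 6 caption is a simple threshold. A minimal Python sketch follows, assuming solvent accessibility is given as a percentage per residue. The function names and the toy values are hypothetical and are not data from the paper.

```python
def burial_class(accessibility_pct, threshold=30.0):
    """Classify a residue as buried when its solvent accessibility is
    below the threshold (30% in the caption) and as exposed otherwise."""
    return "buried" if accessibility_pct < threshold else "exposed"


def buried_fraction(accessibilities, threshold=30.0):
    """Fraction of residues classified as buried."""
    classes = [burial_class(a, threshold) for a in accessibilities]
    return classes.count("buried") / len(classes)


# Toy accessibility values in percent (illustrative only).
mpc_accessibility = [5.0, 12.5, 29.9, 30.0]
all_accessibility = [5.0, 45.0, 60.0, 30.0]
print(buried_fraction(mpc_accessibility), buried_fraction(all_accessibility))
```

A residue at exactly 30% counts as exposed, because the caption defines buried with a strict inequality.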

References

    1. Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25 3389–3402.
    2. Aurora, R. and Rose, G.D. 1998. Helix capping. Protein Sci. 7 21–38.
    3. Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N., and Bourne, P.E. 2000. The protein data bank. Nucleic Acids Res. 28 235–242.
    4. Blake, J.D. and Cohen, F.E. 2001. Pairwise sequence alignment below the twilight zone. J. Mol. Biol. 307 721–735.
    5. Bowie, J.U., Reidhaar-Olson, J.F., Lim, W.A., and Sauer, R.T. 1990. Deciphering the message in protein sequences: Tolerance to amino acid substitutions. Science 247 1306–1310.
