PLoS One. 2011;6(12):e28206.
doi: 10.1371/journal.pone.0028206. Epub 2011 Dec 5.

Structural position correlation analysis (SPCA) for protein family

Qi-Shi Du et al. PLoS One. 2011.

Abstract

Background: Proteins in a family that perform similar biological functions may have very different amino acid compositions, but they must share similar 3D structures and keep a stable central region. In the conserved structural region, similar biological functions are performed by two or three catalytic residues in collaboration with several functional residues at key positions. Communication signals are conducted through a network of positions, adjusting the biological functions in the protein family.

Methodology: A computational approach, structural position correlation analysis (SPCA), is developed to analyze the correlation between structural segments (or positions). The basic hypothesis of SPCA is that in a protein family structural conservation is more important than sequence conservation, and that local structural changes may contain information about the evolution of biological function. A standard protein P(0) is defined for a protein family; it consists of the most frequent amino acids and takes the average structure of the family. The foundational variables of SPCA are the structural position displacements between the standard protein P(0) and the individual proteins P(i) of the family. The structural positions are organized into segments, which are the stable units in the structural displacements of the protein family. The differences in biological function among family members are determined by the structural position displacements of each individual protein P(i) relative to the standard protein P(0). Correlation analysis is used to analyze the communication network among segments.
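The quantities named in the methodology can be illustrated with a minimal sketch. It assumes the proteins have already been structurally aligned and superimposed, that every protein has one (x, y, z) coordinate per aligned position with no gaps, and that Pearson correlation is an acceptable stand-in for the correlation measure; the abstract does not specify these details, and all function names here are hypothetical.

```python
import math
from collections import Counter


def consensus_sequence(alignment, gap="-"):
    """Most frequent residue at each aligned column (gaps are ignored)."""
    consensus = []
    for column in zip(*alignment):
        counts = Counter(res for res in column if res != gap)
        consensus.append(counts.most_common(1)[0][0] if counts else gap)
    return "".join(consensus)


def average_structure(coords):
    """Mean (x, y, z) at each position over all superimposed proteins."""
    n = len(coords)
    return [
        tuple(sum(protein[j][k] for protein in coords) / n for k in range(3))
        for j in range(len(coords[0]))
    ]


def displacement_matrix(coords):
    """D[i][j] = distance of position j in protein i from the average structure."""
    p0 = average_structure(coords)
    return [
        [math.dist(protein[j], p0[j]) for j in range(len(p0))]
        for protein in coords
    ]


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def position_correlation(D):
    """L x L matrix of Pearson correlations between the columns of D."""
    columns = [[row[j] for row in D] for j in range(len(D[0]))]
    return [[pearson(a, b) for b in columns] for a in columns]
```

Here `consensus_sequence` and `average_structure` together play the role of the standard protein P(0), `displacement_matrix` gives an N×L matrix of displacements of each P(i) from P(0), and `position_correlation` shows which positions tend to move together across the family.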

Conclusions: Structural position correlation analysis (SPCA) is able to find correlations among the structural segments (or positions) in a protein family that cannot be detected by amino acid sequence and frequency-based methods. The functional communication network among the structural segments (or positions) in a protein family, revealed by the SPCA approach, illustrates distant allosteric interactions well and contains valuable information for protein engineering studies.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Structure of PDZ domain 1BE9 and multiple structural alignment (MSA) of 186 PDZ domains.
(A) The structure of PDZ domain 1BE9 and its peptide ligand. Target C-terminal ligands bind in a surface groove formed between the α2 helix and the β2 strand at a number of binding sites that determine both ligand affinity and sequence-specific recognition. Blue indicates hydrophilic surface and green indicates hydrophobic surface. (B) The multiple structural alignment (MSA) database of 186 PDZ crystal structures. PDZ domains consist of 90–100 residues that adopt a six-stranded β sandwich configuration with two flanking α helices. The MSA database contains 117 residue positions, including gaps inserted during structural alignment. After deletion of unnecessary gaps, the length of the MSA database is 96 positions. (C) The locations of the 6 structural segments and the secondary structural units of PDZ protein domains. The four PDZ protein domains (2QKT, 2F5Y, 1G9O, and 1BE9) are taken from the MSA database of 186 PDZ domains. The six structural segments (S1 to S6) are indicated by green frames, and the secondary structural units (α-helices, β-strands, and loops) are indicated by color bars (blue for loops, yellow for β-strands, and red for α-helices). The structural segments are stable units in the structural changes of the protein family.
Figure 2. The most frequent amino acids at sequence positions and the average position displacements between the standard protein and the proteins of PDZ domains.
(A) The percent frequencies of the most frequent amino acids at sequence positions of the MSA PDZ domain database. A higher frequency means higher conservation, and a lower frequency means more amino acid mutation at that sequence position. (B) The average structural displacement between the standard protein P(0) and the proteins of the PDZ domain database. A higher displacement represents a larger structural change at a position, and a lower displacement indicates a structurally stable position. A partially complementary relationship between amino acid frequency and structural displacement is found: the higher the amino acid frequency, the lower the position displacement.
Figure 3. The displacement correlation relationships between structural segments and positions.
(A) The displacement correlation between segments S2 (in β2) and S5 (in α2). The correlation of S2 and S5 in effect represents the structural correlation between α2 and β2. (B) The displacement correlation between position 37 (in β3) and position 78 (in α2). The correlation of positions 37 and 78 causes a distant allosteric interaction in the PDZ domain.
Figure 4. Information for PDZ protein domain from the SPCA calculation.
(A) The residues at the controlling positions for ligand affinity. Tyr79 and Leu81 of 2QKT (blue) are much larger than Ala76 and Ala78 of 1BE9 (green). (B) The disulfide bond between Cys37 in β3 and Cys78 in α2 of 2QKT. The interaction between positions 37 and 78 indirectly conducts the controlling signal for the ligand's preferred binding location in the α2-β2 groove.
Figure 5. The flowchart of structural position correlation analysis (SPCA).
The displacement matrices D(α) and D(m), each of size N×L, hold the distance differences between the standard protein P(0) and the proteins of the evolutionary family. The superscripts ‘α’ and ‘m’ indicate the α-carbon and the mass center, respectively. Based on statistical correlation analysis of the residue position displacements D(α), the residue positions are reorganized into structural segments. Statistical correlation analysis is then applied to the structural segment displacement matrix D(s), of size N×K, revealing the segment correlation information of functional evolution in the protein family.
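The reduction from an N×L position displacement matrix to an N×K segment displacement matrix in the Figure 5 flowchart can be sketched as follows. The grouping rule used here (merge neighbouring positions while their displacement correlation stays at or above a threshold) and the choice of the mean as the segment displacement are assumptions made purely for illustration; the caption does not state how the authors define segments, and the function names and the `threshold` parameter are hypothetical.

```python
def contiguous_segments(corr, threshold=0.8):
    """Split positions 0..L-1 into half-open ranges (start, end).

    A new segment starts wherever the correlation between two
    neighbouring positions falls below the threshold.
    """
    if not corr:
        return []
    segments = []
    start = 0
    for j in range(1, len(corr)):
        if corr[j - 1][j] < threshold:
            segments.append((start, j))
            start = j
    segments.append((start, len(corr)))
    return segments


def segment_displacements(D, segments):
    """Average the position displacements in each segment: N x L -> N x K."""
    return [
        [sum(row[a:b]) / (b - a) for a, b in segments]
        for row in D
    ]
```

The resulting N×K matrix can then be passed to any column-wise correlation routine to obtain a K×K segment correlation matrix, which is the kind of quantity the caption describes as revealing segment correlation information.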
