PLoS Comput Biol. 2015 Feb 27;11(2):e1004091.
doi: 10.1371/journal.pcbi.1004091. eCollection 2015 Feb.

Protein sectors: statistical coupling analysis versus conservation

Tiberiu Teşileanu et al. PLoS Comput Biol.

Abstract

Statistical coupling analysis (SCA) is a method for analyzing multiple sequence alignments that was used to identify groups of coevolving residues termed "sectors". The method applies spectral analysis to a matrix obtained by combining correlation information with sequence conservation. It has been asserted that the protein sectors identified by SCA are functionally significant, with different sectors controlling different biochemical properties of the protein. Here we reconsider the available experimental data and note that it involves almost exclusively proteins with a single sector. We show that in this case sequence conservation is the dominating factor in SCA, and can alone be used to make statistically equivalent functional predictions. Therefore, we suggest shifting the experimental focus to proteins for which SCA identifies several sectors. Correlations in protein alignments, which have been shown to be informative in a number of independent studies, would then be less dominated by sequence conservation.
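
The construction described in the abstract, a correlation matrix weighted by conservation followed by spectral analysis, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a binary approximation (each column is reduced to "matches the most common residue or not"), a uniform background frequency `background`, and a weight given by the derivative of the relative entropy. The names `sca_like_matrix` and `top_eigenvector` and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def sca_like_matrix(alignment, background=0.05, eps=1e-3):
    """Conservation-weighted covariance matrix under a binary approximation.

    Assumption: `background` is a uniform background frequency; real analyses
    use per-amino-acid backgrounds and sequence weighting, omitted here.
    """
    n_seq, n_pos = len(alignment), len(alignment[0])

    # Binary approximation: 1 where a sequence carries the column's most common residue.
    binary = []
    for j in range(n_pos):
        column = [seq[j] for seq in alignment]
        consensus = Counter(column).most_common(1)[0][0]
        binary.append([1.0 if aa == consensus else 0.0 for aa in column])
    freq = [sum(col) / n_seq for col in binary]

    def weight(f):
        # Derivative of the relative entropy D(f || background) with respect to f,
        # with f clipped away from 0 and 1 to keep the logarithm finite.
        f = min(max(f, eps), 1.0 - eps)
        return math.log(f * (1.0 - background) / ((1.0 - f) * background))

    phi = [weight(f) for f in freq]

    matrix = [[0.0] * n_pos for _ in range(n_pos)]
    for i in range(n_pos):
        for j in range(n_pos):
            joint = sum(a * b for a, b in zip(binary[i], binary[j])) / n_seq
            matrix[i][j] = abs(phi[i] * phi[j] * (joint - freq[i] * freq[j]))
    return matrix


def top_eigenvector(matrix, iterations=500):
    """Power iteration; adequate for a small symmetric matrix with non-negative entries."""
    n = len(matrix)
    vec = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        new = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in new))
        if norm == 0.0:
            return vec
        vec = [x / norm for x in new]
    return vec


# Toy alignment (hypothetical data, for illustration only).
alignment = ["ACDEF", "ACDEY", "ACNEF", "GCDKF", "ASDEF", "TCNKW"]
matrix = sca_like_matrix(alignment)
vec = top_eigenvector(matrix)
# Square root of the diagonal, the quantity compared with the top eigenvector in Fig 1A.
diag_sqrt = [math.sqrt(matrix[i][i]) for i in range(len(matrix))]
```

Comparing `vec` with `diag_sqrt` position by position mirrors the kind of comparison shown in Fig 1A, although a six-sequence toy alignment says nothing about real protein families.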

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Relation between the components of the top eigenvector of the SCA matrix and its diagonal elements, or conservation.
A. Comparison to the square root of the diagonal elements of the SCA matrix. B. Comparison to conservation. This was obtained for the PDZ alignment, but the results are similar for other alignments.
Fig 2. Contingency tables testing whether a PSD95pdz3 residue belonging to a sector or being highly-conserved is associated with significant functional effect upon mutagenesis.
The tables are identical although only 57% of the residues are shared between the sector and the conserved positions.
Fig 3. Histograms showing the effect of mutations on binding affinity of PSD95pdz3 with cognate ligand, for all mutations (gray), and for mutations to selected positions (black).
Each of the histograms in black contains 21 positions, with A. the largest SCA scores, or B. the largest conservation levels. A Mann-Whitney U test cannot find a statistically significant difference between the distribution of mutational effects for sector positions and the one for conserved positions (p = 0.9). The mutational effect ΔE_i^x is a dimensionless quantity calculated as in McLaughlin Jr. et al. [25].
Fig 4. Comparison of the ability of the SCA sector and conservation to predict the functional effect of mutation of PSD95pdz3 residues for various sector sizes.
The vertical axis shows the p-value for a two-sample, two-tailed Mann-Whitney U test comparing the distribution of mutational effects for sector residues vs. conserved residues.
Fig 5. Contingency tables testing whether sector residues or conserved residues are more likely to “touch” functionally-significant LOV2 insertion points for the DHFR protein analyzed in Reynolds et al. [24].
A χ² test cannot reject the hypothesis that the two contingency tables are drawn from the same distribution (p = 0.2).
Fig 6. Comparison of the ability of the SCA sector and conserved residues to “touch” the functionally-significant sites of DHFR identified by Reynolds et al. [24].
The vertical axis shows the p-value for a two-tailed χ² test comparing the contingency tables obtained for the sector and for conservation (cf. Fig. 5).
Fig 7. Dependence of conservation level on distance from the center of mass of DHFR protein.
Fig 8. Contingency tables testing whether belonging to a sector or being highly conserved is associated with significant functional effect upon mutagenesis for an alignment of voltage-sensing domains of potassium channels.
Experimental data from Li-Smerin et al. [33]. The two contingency tables are identical although less than 80% of the residues are common between the SCA sector and the conserved positions.
Fig 9. Contingency tables testing whether a lac repressor residue belonging to a sector or being highly-conserved is associated with significant functional effect upon mutagenesis.
There is about 67% overlap between the two sets of residues. A two-tailed χ² test cannot reject the hypothesis that the two tables are drawn from the same distribution (p = 0.2).
Fig 10. Histograms showing the effect of mutations on repression ability of lacI for all mutations (gray), and for mutations to selected positions (black).
Each of the histograms in black contains 82 positions, with A. the largest SCA scores, or B. the largest conservation levels. While a Mann-Whitney U test finds the difference between the distribution of mutational effects for sector positions and the one for conserved positions bordering on statistical significance (p ≈ 0.08), note that it is conservation that better matches the functional data.
Fig 11. Comparison of the ability of the SCA sector and conservation to predict the functional effect of mutation of lac repressor residues for various sector sizes.
The vertical axis shows the p-value for a two-sample, two-tailed Mann-Whitney U test comparing the distribution of mutational effects for sector residues vs. conserved residues. At large sector sizes, where the p-value hovers around 0.1, it is the conserved residues that better match the functional data, rather than the SCA sector residues.
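
The two kinds of test named in the captions above can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' analysis. The Mann-Whitney U test uses the normal approximation with a tie correction and no continuity correction. The χ² function is a Pearson test of independence on a single 2×2 table without Yates correction, which is a simpler setting than the table-versus-table comparison of Figs 5, 6 and 9. The function names and all numbers are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, with tied values sharing their mean rank, and the tie-group sizes."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    tie_sizes = []
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        tie_sizes.append(end - start + 1)
        start = end + 1
    return ranks, tie_sizes


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation (assumption: no continuity correction)."""
    x, y = list(x), list(y)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks, tie_sizes = average_ranks(x + y)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    tie_term = sum(t ** 3 - t for t in tie_sizes) / (n * (n - 1))
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term)
    if var_u == 0.0:
        return u1, 1.0
    z = (u1 - mean_u) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2.0))


def chi2_2x2(table):
    """Pearson chi-squared test of independence for a 2x2 table (one degree of freedom)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 0.0, 1.0
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With one degree of freedom, chi2 is the square of a standard normal variable.
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical mutational effects for "sector" and "conserved" positions.
sector_effects = [-1.2, -0.8, -0.3, -1.5, -0.1, -0.9]
conserved_effects = [-1.1, -0.7, -0.4, -1.4, -0.2, -1.0]
u_stat, p_mw = mann_whitney_u(sector_effects, conserved_effects)

# Hypothetical table: rows = in set / not in set, columns = functional effect / no effect.
chi2_stat, p_chi2 = chi2_2x2([[12, 8], [9, 54]])
```

For samples as small as these the normal approximation is rough; an exact test or an established statistics library would be the safer choice in practice.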

References

    1. Do CB, Katoh K (2008) Protein multiple sequence alignment. Methods in Molecular Biology 484: 379–413.
    2. Notredame C (2002) Recent progress in multiple sequence alignment: a survey. Pharmacogenomics 3: 131–144. doi: 10.1517/14622416.3.1.131
    3. Schneider TD, Stormo GD, Gold L, Ehrenfeucht A (1986) Information content of binding sites on nucleotide sequences. Journal of Molecular Biology 188: 415–431. doi: 10.1016/0022-2836(86)90165-8
    4. Zvelebil MJ, Barton GJ, Taylor WR, Sternberg MJE (1987) Prediction of protein secondary structure and active sites using the alignment of homologous sequences. Journal of Molecular Biology 195: 957–961. doi: 10.1016/0022-2836(87)90501-8
    5. Hollstein M, Sidransky D, Vogelstein B, Harris CC (1991) p53 Mutations in Human Cancers. Science 253: 49–53. doi: 10.1126/science.1905840
