PLoS Comput Biol. 2015 Feb 27;11(2):e1004091.

doi: 10.1371/journal.pcbi.1004091. eCollection 2015 Feb.

Protein sectors: statistical coupling analysis versus conservation

Tiberiu Teşileanu¹, Lucy J Colwell², Stanislas Leibler³

Affiliations

¹ The Simons Center for Systems Biology and The School of Natural Sciences, Institute for Advanced Study, Einstein Drive, Princeton, New Jersey, United States of America; Initiative for the Theoretical Sciences, CUNY Graduate Center, 365 Fifth Avenue, New York, New York, United States of America.
² The Simons Center for Systems Biology and The School of Natural Sciences, Institute for Advanced Study, Einstein Drive, Princeton, New Jersey, United States of America; Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, United Kingdom.
³ The Simons Center for Systems Biology and The School of Natural Sciences, Institute for Advanced Study, Einstein Drive, Princeton, New Jersey, United States of America; Center for Studies in Physics and Biology and Laboratory of Living Matter, The Rockefeller University, 1230 York Avenue, New York, New York, United States of America.

PMID: 25723535
PMCID: PMC4344308
DOI: 10.1371/journal.pcbi.1004091

Tiberiu Teşileanu et al. PLoS Comput Biol. 2015.

Abstract

Statistical coupling analysis (SCA) is a method for analyzing multiple sequence alignments that was used to identify groups of coevolving residues termed "sectors". The method applies spectral analysis to a matrix obtained by combining correlation information with sequence conservation. It has been asserted that the protein sectors identified by SCA are functionally significant, with different sectors controlling different biochemical properties of the protein. Here we reconsider the available experimental data and note that it involves almost exclusively proteins with a single sector. We show that in this case sequence conservation is the dominating factor in SCA, and can alone be used to make statistically equivalent functional predictions. Therefore, we suggest shifting the experimental focus to proteins for which SCA identifies several sectors. Correlations in protein alignments, which have been shown to be informative in a number of independent studies, would then be less dominated by sequence conservation.
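
The construction described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes a binary ("consensus residue or not") encoding of the alignment, a single uniform background frequency `q`, a relative-entropy conservation score, and a plain power iteration in place of a full spectral decomposition. The toy alignment and every function name are hypothetical.

```python
import math

def column_stats(alignment):
    """Per column: the most common residue and its frequency."""
    n_seq, n_pos = len(alignment), len(alignment[0])
    consensus, freqs = [], []
    for i in range(n_pos):
        counts = {}
        for seq in alignment:
            counts[seq[i]] = counts.get(seq[i], 0) + 1
        residue, count = max(counts.items(), key=lambda kv: kv[1])
        consensus.append(residue)
        freqs.append(count / n_seq)
    return consensus, freqs

def conservation(f, q=0.05):
    """Relative entropy of frequency f against background q (q is an assumption)."""
    return f * math.log(f / q) + (1 - f) * math.log((1 - f) / (1 - q))

def weight(f, q=0.05):
    """Derivative of conservation(f, q) with respect to f."""
    return math.log(f * (1 - q) / ((1 - f) * q))

def sca_matrix(alignment, q=0.05, eps=0.01):
    """Conservation-weighted absolute covariance between alignment columns."""
    consensus, freqs = column_stats(alignment)
    n_seq, n_pos = len(alignment), len(consensus)
    # Binary encoding: 1 where a sequence carries the consensus residue.
    x = [[1.0 if seq[i] == consensus[i] else 0.0 for i in range(n_pos)]
         for seq in alignment]
    # Clip frequencies so the weight stays finite for fully conserved columns.
    phi = [weight(min(max(f, eps), 1 - eps), q) for f in freqs]
    mat = [[0.0] * n_pos for _ in range(n_pos)]
    for i in range(n_pos):
        for j in range(n_pos):
            f_ij = sum(row[i] * row[j] for row in x) / n_seq
            mat[i][j] = phi[i] * phi[j] * abs(f_ij - freqs[i] * freqs[j])
    return mat, freqs

def top_eigenvector(mat, n_iter=500):
    """Leading eigenvector by power iteration (adequate for a small symmetric matrix)."""
    n = len(mat)
    v = [1.0] * n
    for _ in range(n_iter):
        w = [sum(mat[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm == 0:
            return v
        v = [wi / norm for wi in w]
    return v

# Made-up toy alignment, for illustration only.
toy_alignment = ["ACDE", "ACDE", "ACDK", "ACEK", "AWDE", "GCDE", "GWEK", "ACDE"]
mat, freqs = sca_matrix(toy_alignment)
v = top_eigenvector(mat)
scores = [conservation(f) for f in freqs]
```

Comparing the components of `v` with `scores` position by position is the kind of check summarized in Fig 1; a real analysis would use amino-acid-specific background frequencies and sequence weighting, which this sketch omits.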

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

**Fig 1. Relation between the components of the top eigenvector of the SCA matrix and its diagonal elements, or conservation.**
A. Comparison to the square root of the diagonal elements of the SCA matrix. B. Comparison to conservation. These results were obtained for the PDZ alignment; results for other alignments are similar.

**Fig 2. Contingency tables testing whether a PSD95^pdz3 residue belonging to a sector or being highly-conserved is associated with significant functional effect upon mutagenesis.**
The tables are identical although only 57% of the residues are shared between the sector and the conserved positions.

**Fig 3. Histograms showing the effect of mutations on binding affinity of PSD95^pdz3 with cognate ligand, for all mutations (gray), and for mutations to selected positions (black).**
Each of the histograms in black contains 21 positions, with A. the largest SCA scores, or B. the largest conservation levels. A Mann-Whitney U test cannot find a statistically significant difference between the distribution of mutational effects for sector positions and the one for conserved positions (p = 0.9). The mutational effect $\langle \Delta E_i^x \rangle_x$ is a dimensionless quantity calculated as in McLaughlin Jr. et al. [25].

**Fig 4. Comparison of the ability of the SCA sector and conservation to predict the functional effect of mutation of PSD95^pdz3 residues for various sector sizes.**
The vertical axis shows the p-value for a two-sample, two-tailed Mann-Whitney U test comparing the distribution of mutational effects for sector residues vs. conserved residues.

**Fig 5. Contingency tables testing whether sector residues or conserved residues are more likely to “touch” functionally-significant LOV2 insertion points for the DHFR protein analyzed in Reynolds et al. [24].**
A χ² test cannot reject the hypothesis that the two contingency tables are drawn from the same distribution (p = 0.2).

**Fig 6. Comparison of the ability of the SCA sector and conserved residues to “touch” the functionally-significant sites of DHFR identified by Reynolds et al. [24].**
The vertical axis shows the p-value for a two-tailed χ² test comparing the contingency tables obtained for the sector and for conservation (cf. Fig. 5).

**Fig 7. Dependence of conservation level on distance from the center of mass of DHFR protein.**

**Fig 8. Contingency tables testing whether belonging to a sector or being highly conserved is associated with significant functional effect upon mutagenesis for an alignment of voltage-sensing domains of potassium channels.**
Experimental data from Li-Smerin et al. [33]. The two contingency tables are identical although less than 80% of the residues are common between the SCA sector and the conserved positions.

**Fig 9. Contingency tables testing whether a *lac* repressor residue belonging to a sector or being highly-conserved is associated with significant functional effect upon mutagenesis.**
There is about 67% overlap between the two sets of residues. A two-tailed χ² test cannot reject the hypothesis that the two tables are drawn from the same distribution (p = 0.2).

**Fig 10. Histograms showing the effect of mutations on repression ability of *lacI* for all mutations (gray), and for mutations to selected positions (black).**
Each of the histograms in black contains 82 positions, with A. the largest SCA scores, or B. the largest conservation levels. While a Mann-Whitney U test finds the difference between the distribution of mutational effects for sector positions and the one for conserved positions bordering on statistical significance (p ≈ 0.08), note that it is conservation that better matches the functional data.

**Fig 11. Comparison of the ability of the SCA sector and conservation to predict the functional effect of mutation of *lac* repressor residues for various sector sizes.**
The vertical axis shows the p-value for a two-sample, two-tailed Mann-Whitney U test comparing the distribution of mutational effects for sector residues vs. conserved residues. At large sector sizes, where the p value hovers around 0.1, it is the conserved residues that better match the functional data, rather than the SCA sector residues.
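
Several of the captions above compare the mutational effects at the top-ranked SCA positions with those at the top-ranked conserved positions using a two-tailed Mann-Whitney U test. The sketch below shows that comparison under stated assumptions: the scores and effects are made-up numbers, the helper names are hypothetical, and the p-value uses the large-sample normal approximation without tie or continuity corrections, so it is only indicative for small groups. In practice a library routine such as `scipy.stats.mannwhitneyu` would be the usual choice.

```python
import math

def top_k(scores, k):
    """Indices of the k positions with the largest scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]

def mann_whitney_u(a, b):
    """Two-sided Mann-Whitney U test, normal approximation (no tie correction)."""
    pooled = sorted([(value, 0) for value in a] + [(value, 1) for value in b])
    ranks = [0.0] * len(pooled)
    start = 0
    while start < len(pooled):
        end = start
        # Extend over a run of tied values and give each the average rank.
        while end + 1 < len(pooled) and pooled[end + 1][0] == pooled[start][0]:
            end += 1
        midrank = (start + end) / 2 + 1
        for idx in range(start, end + 1):
            ranks[idx] = midrank
        start = end + 1
    n_a, n_b = len(a), len(b)
    rank_sum_a = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u_a - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u_a, p_value

# Hypothetical per-position values, for illustration only.
sca_scores = [0.9, 0.1, 0.8, 0.3, 0.7, 0.2]
conservation_scores = [0.8, 0.2, 0.9, 0.1, 0.3, 0.7]
effects = [1.2, 0.1, 1.5, 0.2, 0.9, 0.8]

k = 3  # group size, analogous to the sector size varied in Fig 4 and Fig 11
sector = top_k(sca_scores, k)
conserved = top_k(conservation_scores, k)
u, p = mann_whitney_u([effects[i] for i in sector],
                      [effects[i] for i in conserved])
```

Repeating the last four lines over a range of `k` values gives a p-value curve of the kind plotted in Fig 4 and Fig 11.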
