Disentangling direct from indirect co-evolution of residues in protein alignments
- PMID: 20052271
- PMCID: PMC2793430
- DOI: 10.1371/journal.pcbi.1000633
Abstract
Predicting protein structure from primary sequence is one of the ultimate challenges in computational biology. Given the large amount of available sequence data, the analysis of co-evolution, i.e., statistical dependency, between columns in multiple alignments of protein domain sequences remains one of the most promising avenues for predicting residues that are in contact in the structure. A key impediment to this approach is that strong statistical dependencies are also observed for many residue pairs that are distal in the structure. Using a comprehensive analysis of protein domains with available three-dimensional structures, we show that co-evolving contacts very commonly form chains that percolate through the protein structure, inducing indirect statistical dependencies between many distal pairs of residues. We characterize the distributions of length and spatial distance traveled by these co-evolving contact chains and show that they explain a large fraction of observed statistical dependencies between structurally distal pairs. We adapt a recently developed Bayesian network model into a rigorous procedure for disentangling direct from indirect statistical dependencies, and we demonstrate that this method not only accomplishes this task, but also allows contacts with weak statistical dependency to be detected. To illustrate how additional information can be incorporated into our method, we add a phylogenetic correction and develop an informative prior that accounts for the strong dependence of the contact probability of a residue pair on its primary-sequence distance and on the degree of conservation of the corresponding alignment columns. We show that our model including these extensions dramatically improves the accuracy of contact prediction from multiple sequence alignments.
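To make "statistical dependency between columns" concrete, the following is a minimal sketch that scores a pair of alignment columns by their plain mutual information. This is only a common baseline measure of co-variation, not the Bayesian network model described in the abstract; the function name `column_mutual_information` and the four-sequence `toy_msa` are hypothetical and chosen purely for illustration.

```python
from collections import Counter
from math import log


def column_mutual_information(msa, i, j):
    """Mutual information (in nats) between columns i and j of an alignment.

    `msa` is a list of equal-length aligned sequences (strings).
    Frequencies are raw counts, with no pseudocounts and no phylogenetic
    correction, so this is a baseline score only.
    """
    n = len(msa)
    pair_counts = Counter((seq[i], seq[j]) for seq in msa)
    counts_i = Counter(seq[i] for seq in msa)
    counts_j = Counter(seq[j] for seq in msa)

    mi = 0.0
    for (a, b), c in pair_counts.items():
        p_ab = c / n
        # p_ab / (p_a * p_b), written with counts to avoid extra divisions
        mi += p_ab * log(p_ab * n * n / (counts_i[a] * counts_j[b]))
    return mi


# Hypothetical toy alignment: columns 0 and 1 co-vary perfectly,
# column 2 is fully conserved, column 3 varies independently of column 0.
toy_msa = ["ADKA", "ADKC", "CEKA", "CEKC"]

mi_covarying = column_mutual_information(toy_msa, 0, 1)
mi_conserved = column_mutual_information(toy_msa, 0, 2)
mi_independent = column_mutual_information(toy_msa, 0, 3)
```

A pairwise score of this kind cannot tell whether two columns depend on each other directly or only through a chain of intermediate co-evolving columns, which is the limitation the abstract sets out to address.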
Conflict of interest statement
The authors have declared that no competing interests exist.