Engineering Proteins Using Statistical Models of Coevolutionary Sequence Information
- PMID: 38110247
- PMCID: PMC10982702
- DOI: 10.1101/cshperspect.a041463
Abstract
Homologous protein sequences are wonderfully diverse, indicating many possible evolutionary "solutions" to the encoding of function. Consequently, one can construct statistical models of protein sequence by analyzing amino acid frequency across a large multiple sequence alignment. A central premise is that covariance between amino acid positions reflects coevolution due to a shared functional or biophysical constraint. In this review, we describe the implementation and discuss the advantages, limitations, and recent progress on two coevolution-based modeling approaches: (1) Potts models of protein sequence (direct coupling analysis [DCA]-like), and (2) the statistical coupling analysis (SCA). Each approach detects interesting features of protein sequence and structure: the former emphasizes local physical contacts throughout the structure, while the latter identifies larger evolutionarily coupled networks of residues. Recent advances in large-scale gene synthesis and high-throughput functional selection now motivate additional work to benchmark model performance across quantitative function prediction and de novo design tasks.
Copyright © 2024 Cold Spring Harbor Laboratory Press; all rights reserved.
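The premise stated in the abstract, that covariance between alignment columns carries a coevolutionary signal, can be illustrated with a minimal sketch. It is not the article's implementation. The toy alignment `TOY_MSA` and the function names are hypothetical, and the score is only the raw covariance C_ij(a,b) = f_ij(a,b) - f_i(a) f_j(b) summarized by its Frobenius norm. Practical DCA-like and SCA pipelines add steps omitted here, such as sequence reweighting, pseudocounts, gap handling, Potts-model fitting (DCA), and conservation weighting (SCA).

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# Hypothetical toy alignment (not data from the article).
# Columns 0 and 2 covary perfectly (A with D, E with R), column 1 is
# fully conserved, and column 3 varies independently of the others.
TOY_MSA = [
    "AKDG", "AKDS", "AKDG", "AKDS",
    "EKRG", "EKRS", "EKRG", "EKRS",
]

def site_frequencies(msa, i):
    """Frequency f_i(a) of each amino acid a observed in column i."""
    counts = Counter(seq[i] for seq in msa)
    return {a: n / len(msa) for a, n in counts.items()}

def pair_frequencies(msa, i, j):
    """Joint frequency f_ij(a, b) of amino acid pairs in columns i and j."""
    counts = Counter((seq[i], seq[j]) for seq in msa)
    return {ab: n / len(msa) for ab, n in counts.items()}

def covariance_score(msa, i, j):
    """Frobenius norm of C_ij(a, b) = f_ij(a, b) - f_i(a) * f_j(b)."""
    fi = site_frequencies(msa, i)
    fj = site_frequencies(msa, j)
    fij = pair_frequencies(msa, i, j)
    total = 0.0
    for a in fi:
        for b in fj:
            c = fij.get((a, b), 0.0) - fi[a] * fj[b]
            total += c * c
    return sqrt(total)

def all_pair_scores(msa):
    """Covariance score for every pair of alignment columns."""
    length = len(msa[0])
    return {
        (i, j): covariance_score(msa, i, j)
        for i, j in combinations(range(length), 2)
    }

if __name__ == "__main__":
    ranked = sorted(all_pair_scores(TOY_MSA).items(), key=lambda kv: -kv[1])
    for (i, j), score in ranked:
        print(f"columns {i} and {j}: score {score:.3f}")
```

On this toy alignment only the column pair (0, 2) receives a nonzero score. Raw covariance of this kind cannot separate direct from indirect correlations, which is the problem that DCA-like models are designed to address.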
Similar articles
- Sequence-Based Protein Design: A Review of Using Statistical Models to Characterize Coevolutionary Traits for Developing Hybrid Proteins as Genetic Sensors. Int J Mol Sci. 2024 Jul 30;25(15):8320. doi: 10.3390/ijms25158320. PMID: 39125888. Free PMC article. Review.
- Direct coevolutionary couplings reflect biophysical residue interactions in proteins. J Chem Phys. 2016 Nov 7;145(17):174102. doi: 10.1063/1.4966156. PMID: 27825220.
- Coevolution-based inference of amino acid interactions underlying protein function. Elife. 2018 Jul 20;7:e34300. doi: 10.7554/eLife.34300. PMID: 30024376. Free PMC article.
- Constructing sequence-dependent protein models using coevolutionary information. Protein Sci. 2016 Jan;25(1):111-22. doi: 10.1002/pro.2758. Epub 2015 Aug 10. PMID: 26223372. Free PMC article.
- Correlated positions in protein evolution and engineering. J Ind Microbiol Biotechnol. 2017 May;44(4-5):687-695. doi: 10.1007/s10295-016-1811-1. Epub 2016 Aug 11. PMID: 27514664. Review.
Cited by
- Using AlphaFold2 to Predict the Conformations of Side Chains in Folded Proteins. bioRxiv [Preprint]. 2025 Feb 14:2025.02.10.637534. doi: 10.1101/2025.02.10.637534. PMID: 39990457. Free PMC article. Preprint.
- Sequence-Based Protein Design: A Review of Using Statistical Models to Characterize Coevolutionary Traits for Developing Hybrid Proteins as Genetic Sensors. Int J Mol Sci. 2024 Jul 30;25(15):8320. doi: 10.3390/ijms25158320. PMID: 39125888. Free PMC article. Review.
- Protein stability is determined by single-site bias rather than pairwise covariance. bioRxiv [Preprint]. 2025 Jan 14:2025.01.09.632118. doi: 10.1101/2025.01.09.632118. PMID: 39868188. Free PMC article. Preprint.
- Considering Metabolic Context in Enzyme Evolution and Design. Biochemistry. 2025 Aug 19;64(16):3495-3507. doi: 10.1021/acs.biochem.5c00165. Epub 2025 Aug 5. PMID: 40763921. Review.