Review
Curr Opin Struct Biol. 2021 Aug;69:63-69.
doi: 10.1016/j.sbi.2021.03.009. Epub 2021 Apr 25.

Data-driven computational protein design

Vincent Frappier et al. Curr Opin Struct Biol. 2021 Aug.

Abstract

Computational protein design can generate proteins not found in nature that adopt desired structures and perform novel functions. Although proteins could, in theory, be designed with ab initio methods, practical success has come from using large amounts of data that describe the sequences, structures, and functions of existing proteins and their variants. We present recent creative uses of multiple-sequence alignments, protein structures, and high-throughput functional assays in computational protein design. Approaches range from enhancing structure-based design with experimental data to building regression models to training deep neural nets that generate novel sequences. Looking ahead, deep learning will be increasingly important for maximizing the value of data for protein design.

Conflict of interest statement

Declaration of Interests

V. Frappier is employed by Generate Biomedicines, a protein design company.

Figures

Figure 1.
Overview of data-driven design. (A) Design can be based on protein structures, sequences, or experimental properties. (B) Raw data are sometimes processed to give derived constructs such as TERMs, van der Mers, single-residue frequencies, residue-residue covariation strengths, or single-mutant fitness values. (C) Raw or processed data are used to generate different types of models, e.g. models based on statistical analysis, regression, or neural networks, which can describe first-order, second-order, or possibly higher-order contributions from protein residues. (D) Models can be used to generate one or more new protein sequences or a quantitative description of a fitness landscape that can be used for design. Data-based methods can also guide library screening (purple) or be combined with methods that involve all-atom modeling (not shown here).
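
As a rough illustration of the derived quantities named in panel (B), the sketch below computes single-residue frequencies (a first-order description) and a pairwise covariation score (a second-order description) from a multiple-sequence alignment. The toy alignment and the function names are hypothetical, and mutual information is used only as one simple stand-in for "covariation strength"; the methods covered in the review may use other estimators.

```python
import math
from collections import Counter

# Hypothetical toy alignment (not from the paper); all rows have equal length.
msa = ["ACDE", "ACDE", "GCKE", "GCKE"]


def column_frequencies(msa):
    """Single-residue frequencies for each alignment column (first-order)."""
    n = len(msa)
    return [
        {aa: count / n for aa, count in Counter(column).items()}
        for column in zip(*msa)
    ]


def mutual_information(msa, i, j):
    """Mutual information in bits between columns i and j (second-order)."""
    n = len(msa)
    counts_i = Counter(seq[i] for seq in msa)
    counts_j = Counter(seq[j] for seq in msa)
    counts_ij = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), count in counts_ij.items():
        p_ab = count / n
        p_a = counts_i[a] / n
        p_b = counts_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


freqs = column_frequencies(msa)
covarying = mutual_information(msa, 0, 2)   # columns 0 and 2 change together
conserved = mutual_information(msa, 0, 1)   # column 1 is fully conserved
```

In this toy case, columns 0 and 2 always change together and so score higher than the pair involving the fully conserved column 1. Real alignments would typically also need sequence reweighting, pseudocounts, and gap handling, none of which is attempted here.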
