De novo protein design by deep network hallucination
- PMID: 34853475
- PMCID: PMC9293396
- DOI: 10.1038/s41586-021-04184-w
Abstract
There has been considerable recent progress in protein structure prediction using deep neural networks to predict inter-residue distances from amino acid sequences (refs 1-3). Here we investigate whether the information captured by such networks is sufficiently rich to generate new folded proteins with sequences unrelated to those of the naturally occurring proteins used in training the models. We generate random amino acid sequences, and input them into the trRosetta structure prediction network to predict starting residue-residue distance maps, which, as expected, are quite featureless. We then carry out Monte Carlo sampling in amino acid sequence space, optimizing the contrast (Kullback-Leibler divergence) between the inter-residue distance distributions predicted by the network and background distributions averaged over all proteins. Optimization from different random starting points resulted in novel proteins spanning a wide range of sequences and predicted structures. We obtained synthetic genes encoding 129 of the network-'hallucinated' sequences, and expressed and purified the proteins in Escherichia coli; 27 of the proteins yielded monodisperse species with circular dichroism spectra consistent with the hallucinated structures. We determined the three-dimensional structures of three of the hallucinated proteins, two by X-ray crystallography and one by NMR, and these closely matched the hallucinated models. Thus, deep networks trained to predict native protein structures from their sequences can be inverted to design new proteins, and such networks and methods should contribute alongside traditional physics-based models to the de novo design of proteins with new functions.
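The sampling loop described in the abstract can be illustrated with a minimal sketch. Everything here other than the overall idea (mutate a sequence, score the Kullback-Leibler divergence between predicted and background distance distributions, accept or reject) is an assumption made for illustration: `toy_predictor` is a hypothetical stand-in for the trRosetta network, the background is taken to be uniform rather than an average over proteins, and the bin count, temperature, and step count are arbitrary.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
N_BINS = 4                              # toy number of distance bins (assumption)
BACKGROUND = [1.0 / N_BINS] * N_BINS    # uniform stand-in for the background (assumption)


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def toy_predictor(seq):
    """Hypothetical stand-in for a structure prediction network.

    Returns {(i, j): distribution over N_BINS} for every residue pair i < j.
    The mapping from residues to distributions is arbitrary, not physical.
    """
    maps = {}
    for i in range(len(seq)):
        for j in range(i + 1, len(seq)):
            a, b = AMINO_ACIDS.index(seq[i]), AMINO_ACIDS.index(seq[j])
            preferred = (7 * a + 13 * b) % N_BINS
            strength = ((a + b) % 5) / 2.0
            logits = [strength if k == preferred else 0.0 for k in range(N_BINS)]
            z = sum(math.exp(x) for x in logits)
            maps[(i, j)] = [math.exp(x) / z for x in logits]
    return maps


def contrast(seq):
    """Mean KL divergence between predicted and background distributions."""
    maps = toy_predictor(seq)
    return sum(kl_divergence(p, BACKGROUND) for p in maps.values()) / len(maps)


def hallucinate(length=12, steps=500, temperature=0.01, seed=0):
    """Metropolis Monte Carlo in sequence space, maximizing the contrast."""
    rng = random.Random(seed)
    seq = [rng.choice(AMINO_ACIDS) for _ in range(length)]  # random start
    score = contrast(seq)
    start_score = score
    best_seq, best_score = list(seq), score
    for _ in range(steps):
        candidate = list(seq)
        candidate[rng.randrange(length)] = rng.choice(AMINO_ACIDS)  # point mutation
        new_score = contrast(candidate)
        delta = new_score - score
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            seq, score = candidate, new_score
            if score > best_score:
                best_seq, best_score = list(seq), score
    return "".join(best_seq), start_score, best_score


if __name__ == "__main__":
    designed, start, best = hallucinate()
    print(designed, round(start, 3), round(best, 3))
```

Running `hallucinate` with different `seed` values mirrors the idea of optimizing from different random starting points; with a real predictor in place of the toy one, each run would be expected to sharpen a different distance map.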
© 2021. The Author(s), under exclusive licence to Springer Nature Limited.
Conflict of interest statement
Competing interests
G.T.M. is a co-founder of Nexomics Biosciences, Inc.
Comment in
- Dreaming ideal protein structures. Nat Biotechnol. 2022 Feb;40(2):171-172. doi: 10.1038/s41587-021-01196-9. PMID: 35075248.
- Scientists are using AI to dream up revolutionary new proteins. Nature. 2022 Sep;609(7928):661-662. doi: 10.1038/d41586-022-02947-7. PMID: 36109683.
Similar articles
- CNNcon: improved protein contact maps prediction using cascaded neural networks. PLoS One. 2013 Apr 23;8(4):e61533. doi: 10.1371/journal.pone.0061533. PMID: 23626696.
- The trRosetta server for fast and accurate protein structure prediction. Nat Protoc. 2021 Dec;16(12):5634-5651. doi: 10.1038/s41596-021-00628-9. Epub 2021 Nov 10. PMID: 34759384. Review.
- Cyclic peptide structure prediction and design using AlphaFold2. Nat Commun. 2025 May 21;16(1):4730. doi: 10.1038/s41467-025-59940-7. PMID: 40399308.
- SeqPredNN: a neural network that generates protein sequences that fold into specified tertiary structures. BMC Bioinformatics. 2023 Oct 3;24(1):373. doi: 10.1186/s12859-023-05498-4. PMID: 37789284.
- Protein sequence design by conformational landscape optimization. Proc Natl Acad Sci U S A. 2021 Mar 16;118(11):e2017228118. doi: 10.1073/pnas.2017228118. PMID: 33712545.
Cited by
- SPIN-CGNN: Improved fixed backbone protein design with contact map-based graph construction and contact graph neural network. PLoS Comput Biol. 2023 Dec 7;19(12):e1011330. doi: 10.1371/journal.pcbi.1011330. PMID: 38060617.
- Accelerating the design of pili-enabled living materials using an integrative technological workflow. Nat Chem Biol. 2024 Feb;20(2):201-210. doi: 10.1038/s41589-023-01489-x. Epub 2023 Nov 27. PMID: 38012344.
- Scientists are using AI to dream up revolutionary new proteins. Nature. 2022 Sep;609(7928):661-662. doi: 10.1038/d41586-022-02947-7. PMID: 36109683.
- Incremental Inverse Design of Desired Soybean Phenotypes. ACS Omega. 2024 Sep 30;9(40):41208-41216. doi: 10.1021/acsomega.4c01704. PMID: 39398153.
- Machine learning for functional protein design. Nat Biotechnol. 2024 Feb;42(2):216-228. doi: 10.1038/s41587-024-02127-0. Epub 2024 Feb 15. PMID: 38361074. Review.
References
- Senior AW et al. Improved protein structure prediction using potentials from deep learning. Nature 577, 706–710 (2020).
- Biswas S, Khimulya G, Alley EC, Esvelt KM & Church GM. Low-N protein engineering with data-efficient deep learning. Nat. Methods 18, 389–396 (2021).
- Madani A et al. ProGen: Language Modeling for Protein Generation. bioRxiv (2020) doi:10.1101/2020.03.07.982272.