Science. 2020 Aug 28;369(6507):1132-1136.
doi: 10.1126/science.abc0881.

Expanding the space of protein geometries by computational design of de novo fold families

Xingjie Pan et al. Science.

Abstract

Naturally occurring proteins vary the precise geometries of structural elements to create distinct shapes optimal for function. We present a computational design method, loop-helix-loop unit combinatorial sampling (LUCS), that mimics nature's ability to create families of proteins with the same overall fold but precisely tunable geometries. Through near-exhaustive sampling of loop-helix-loop elements, LUCS generates highly diverse geometries encompassing those found in nature but also surpassing known structure space. Biophysical characterization showed that 17 (38%) of 45 tested LUCS designs encompassing two different structural topologies were well folded, including 16 with designed non-native geometries. Four experimentally solved structures closely matched the designs. LUCS greatly expands the designable structure space and offers a new paradigm for designing proteins with tunable geometries that may be customizable for novel functions.
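The reported success rate is the ratio of well-folded designs to tested designs. Here is a minimal Python check of that arithmetic. The helper `percent_folded` is hypothetical and not part of LUCS.

```python
def percent_folded(n_folded: int, n_tested: int) -> float:
    """Percentage of experimentally tested designs that were well folded."""
    if n_tested <= 0:
        raise ValueError("n_tested must be positive")
    if not 0 <= n_folded <= n_tested:
        raise ValueError("n_folded must lie between 0 and n_tested")
    return 100.0 * n_folded / n_tested


# Counts taken from the abstract: 17 well-folded designs out of 45 tested.
rate = percent_folded(17, 45)
print(f"{rate:.1f}% (reported as {round(rate)}%)")  # 37.8% (reported as 38%)
```

Rounding 37.8% to the nearest whole number gives the 38% quoted in the abstract.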

Conflict of interest statement

Competing interests:

The authors declare no competing interests.

Figures

Figure 1. LUCS sampling strategy to create de novo designed protein fold families with tunable geometries.
A. In nature, protein fold topologies (left) are diversified to create families of proteins with distinct geometries (right) optimized for function. Alpha-helices are shown as cylinders and beta-strands as arrows. The box shows schematic representations of common types of geometric variation. B. The LUCS computational design protocol seeks to mimic the ability of evolution to diversify protein geometries, generating de novo designed fold families. C. Schematic of the LUCS protocol for sampling LHL geometries. The reshaped LHL units are colored in red and blue. Typical numbers of models generated at major stages of the protocol are indicated. D. Designed fold families. Schematic shows fold topologies and design problems (Rossmann fold with 1 or 2 reshaped LHL units, and NTF2 fold with 2 reshaped LHL units). Also shown, for each of the three design problems, are the numbers of geometries generated by LUCS, of designed models that passed quality filters, and of experimentally characterized designs. % folded indicates the fraction of experimentally tested designs that adopted folded structures.
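Panels C and D describe combinatorial sampling of one or two reshaped LHL units. The toy sketch below shows only why the search space grows multiplicatively when a second unit is reshaped. The fragment names, helix lengths and function names are hypothetical placeholders. They are not the LUCS fragment libraries or quality filters.

```python
from itertools import product

# Hypothetical building blocks for a single loop-helix-loop (LHL) unit.
# These are placeholders, not the fragments actually sampled by LUCS.
N_LOOPS = ["nloop_a", "nloop_b", "nloop_c"]  # loop entering the helix
HELIX_LENGTHS = range(10, 16)                # helix lengths of 10..15 residues
C_LOOPS = ["cloop_a", "cloop_b"]             # loop leaving the helix


def lhl_units():
    """Yield every (N-loop, helix length, C-loop) combination for one unit."""
    yield from product(N_LOOPS, HELIX_LENGTHS, C_LOOPS)


def designs(n_units: int):
    """Yield every combination of `n_units` independently reshaped LHL units."""
    yield from product(list(lhl_units()), repeat=n_units)


one_unit = sum(1 for _ in designs(1))
two_units = sum(1 for _ in designs(2))
print(one_unit, two_units)  # 36 1296
```

With k choices for a single unit, reshaping two units independently gives k² combinations before any quality filtering is applied.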
Figure 2. Close agreement between models and experimentally determined structures of designed proteins.
A–C, designs for the Rossmann fold topology and D–F, a design for the NTF2 fold topology. Experimentally determined structures are shown in yellow and design models in grey, with the reshaped LHL elements highlighted in red and blue. A–C. Comparison between computational models and NMR structures for designs RO2_1 (A), RO2_20 (B) and RO2_25 (C). Also shown are the backbone heavy atom RMSDs calculated using the lowest energy structure from the NMR ensemble. D. The binding pocket of a phosphatidylethanolamine ligand. The 2Fo − Fc electron density map (cyan) for the ligand molecule is shown at the 1.0 σ level. E. Comparison between the computational model and the X-ray crystal structure for design NT_9. The phosphatidylethanolamine ligand is shown in space-filling representation (carbon atoms in yellow, oxygen atoms in red, phosphorus atoms in orange, and nitrogen atoms in blue). Also shown are the backbone heavy atom RMSDs calculated including and excluding the terminal helices, respectively. F. Alignment between the designed helices in the computational model and the experimentally solved structure for design NT_9. The hydrophobic residues at the packing interface are shown in stick representation. The RMSD shown includes the helix backbone heavy atoms and the side chain heavy atoms displayed as sticks.
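The panels report backbone heavy-atom RMSDs between design models and experimental structures. The caption does not state the superposition procedure. A common approach is to superimpose matched atoms with the Kabsch algorithm before computing the RMSD, and the sketch below assumes that approach. It also assumes both (N, 3) arrays list the same atoms (for example N, CA, C and O of aligned residues) in the same order.

```python
import numpy as np


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between two matched (N, 3) coordinate sets after optimal superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")
    # Remove translation by centering both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # SVD of the 3x3 covariance matrix gives the optimal rotation.
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    # Flip the last axis if needed so R is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) > 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Rotate P onto Q and average the squared per-atom deviations.
    diff = Pc @ R.T - Qc
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))


# Toy check: a rotated and translated copy of the same coordinates.
rng = np.random.default_rng(0)
model = rng.normal(size=(40, 3))
theta = np.radians(30.0)
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta), np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
experiment = model @ rot_z.T + np.array([5.0, -2.0, 1.0])
print(f"{kabsch_rmsd(model, experiment):.6f}")  # rigid-body copy, so the RMSD is ~0
```

The reflection check matters because an unconstrained fit could mirror the model, and a mirror image of a protein is not a valid superposition.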
Figure 3. Geometry space sampled by de novo designed fold families.
In A and B, the columns show the three design problems: left, Rossmann fold with one designed LHL unit (RO1); middle, Rossmann fold with two designed LHL units (RO2); right, NTF2 fold with two designed LHL units (NT). A. Heatmaps showing backbone RMSDs between the reshaped LHL regions of well-folded designs, comparing design models (x axis) with experimentally determined structures (_exp) or lowest-scoring models from Rosetta structure prediction (y axis). Green boxes show RMSDs calculated using experimentally solved structures. Red boxes (right columns) show the RMSDs between designs and the closest known structures found by TM-align. B. Projection of centers and directions of designed helices (arrows) onto the underlying beta sheets. For the RO2 (middle) and NT (right) columns, panels show distributions in designable models (Fig. 1D) on the left (helices colored red and blue) and in known naturally occurring structures on the right (corresponding helices in orange and cyan). The two rows show helices on two z-level planes defined by their distances from the beta-sheet projection plane. For planes with more than 1000 sampled structures, only 1000 randomly selected helices are shown. For the designs, experimentally confirmed folded designs are represented as bold arrows with yellow boundaries, and designs with experimentally solved structures as bold arrows with green boundaries. For the natural proteins, the Rossmann fold structures are from CATH superfamily 3.40.50.1980 and the NTF2 fold structures are from CATH superfamily 3.10.450.50. C. Number of structure bins occupied by known structures (orange, cyan) and sampled by designable models generated by LUCS (red, blue). D. Structure bins occupied by well-folded designs. E. Classification of the well-folded structures by the number of novel structure bins they occupy.
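Panels B to E project designed helices onto the beta-sheet plane and count occupied structure bins. The caption gives neither the exact reference frame nor the bin definitions, so the sketch below rests on stated assumptions. It fits the sheet plane by SVD of the sheet CA coordinates and represents each helix by the centroid and first principal axis of its CA atoms. It then discretizes the projection with hypothetical bin sizes (`xy_step`, `angle_step`, `z_step`).

```python
import numpy as np


def sheet_frame(sheet_ca):
    """Fit a plane to beta-sheet CA coordinates of shape (N, 3).

    Returns the centroid, two in-plane unit axes (e1, e2) and the unit normal.
    The sign of each SVD axis is arbitrary; a real analysis would fix it with a
    consistent reference (e.g., strand order), which this sketch omits.
    """
    sheet_ca = np.asarray(sheet_ca, dtype=float)
    centroid = sheet_ca.mean(axis=0)
    # Rows of vt are principal axes sorted by decreasing variance;
    # the last one (least variance) is normal to the best-fit plane.
    _, _, vt = np.linalg.svd(sheet_ca - centroid)
    return centroid, vt[0], vt[1], vt[2]


def project_helix(helix_ca, centroid, e1, e2, normal):
    """Return (x, y, angle_deg, z) for a helix in the sheet frame.

    (x, y) is the in-plane position of the helix center, angle_deg the in-plane
    direction of the helix axis, and z the signed height above the plane.
    """
    helix_ca = np.asarray(helix_ca, dtype=float)
    center = helix_ca.mean(axis=0)
    # Approximate the helix axis by the first principal axis of its CA atoms,
    # oriented from the first residue toward the last.
    _, _, vt = np.linalg.svd(helix_ca - center)
    axis = vt[0]
    if np.dot(axis, helix_ca[-1] - helix_ca[0]) < 0:
        axis = -axis
    rel = center - centroid
    x, y, z = rel @ e1, rel @ e2, rel @ normal
    angle = np.degrees(np.arctan2(axis @ e2, axis @ e1))
    return float(x), float(y), float(angle), float(z)


def structure_bin(x, y, angle, z, xy_step=2.0, angle_step=30.0, z_step=5.0):
    """Map a projected helix to a discrete bin key (hypothetical bin sizes)."""
    return (
        int(np.floor(x / xy_step)),
        int(np.floor(y / xy_step)),
        int(np.floor((angle % 360.0) / angle_step)),
        int(np.floor(z / z_step)),
    )


def count_occupied_bins(projections, **bin_sizes):
    """Number of distinct bins occupied by a collection of (x, y, angle, z)."""
    return len({structure_bin(*p, **bin_sizes) for p in projections})
```

Comparing the set of bins reached by designed models with the set reached by natural structures then gives counts of the kind summarized in panel C.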
Figure 4. Structural features encoding distinct protein geometries.
A. Sequence patterns of the hydrophobic cores in three designed models for the Rossmann fold, aligned by corresponding secondary structure elements (top). Hydrophobic residues are shown as letters in rainbow colors, ordered by position in the primary protein sequence and scaled by side chain size. Grey underlines indicate positions of surface-exposed polar residues. The boxed residues are the knob residues shown in (C). B. Atomic packing of hydrophobic cores in the three experimentally determined structures for the Rossmann fold (Fig. 2). The hydrophobic side chains in the designed cores are shown as spheres. C. Knob-socket packing motifs found in the designs. Three residues on a helix (grey sticks and surfaces) form a socket accommodating a knob residue shown as colored spheres. D. Examples of tertiary motifs matching the designed LHL structures. The designed structures are shown in grey and the matched motifs in magenta. Side chains of the best-matched tertiary motifs and design models are shown as sticks. Insets indicate the location of the tertiary motif in the structure, in the same orientation as in B.
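Panel C describes knob-socket packing, in which a knob residue sits in a socket formed by three residues on a helix. The sketch below is a simplified screen for such contacts that uses distance alone. It assumes side-chain centroids are already computed and takes candidate socket triples as input rather than deriving them. It also uses a hypothetical 7 Å cutoff. It is not presented as the analysis used for the figure.

```python
import numpy as np


def knob_socket_contacts(knobs, socket_residues, sockets, cutoff=7.0):
    """Find knob residues lying close to all three residues of a socket.

    knobs: dict mapping knob residue id -> side-chain centroid (x, y, z)
    socket_residues: dict mapping helix residue id -> side-chain centroid
    sockets: iterable of 3-tuples of helix residue ids (candidate sockets)
    cutoff: hypothetical distance threshold in angstroms
    """
    hits = []
    for knob_id, knob_xyz in knobs.items():
        k = np.asarray(knob_xyz, dtype=float)
        for socket in sockets:
            # Distance from the knob centroid to each socket residue centroid.
            dists = [np.linalg.norm(k - np.asarray(socket_residues[r], dtype=float))
                     for r in socket]
            if max(dists) <= cutoff:
                hits.append((knob_id, tuple(socket)))
    return hits


# Toy example with made-up coordinates (angstroms) and hypothetical residue labels.
helix = {10: (0.0, 0.0, 0.0), 13: (4.0, 0.0, 0.0), 14: (2.0, 3.0, 0.0)}
knobs = {"L45": (2.0, 1.0, 4.0), "V80": (20.0, 20.0, 20.0)}
print(knob_socket_contacts(knobs, helix, sockets=[(10, 13, 14)]))  # [('L45', (10, 13, 14))]
```

A fuller analysis would typically also check geometry beyond simple distances, so this check is best read as a first-pass filter.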
