Nature. 2015 Dec 24;528(7583):580-4.
doi: 10.1038/nature16162. Epub 2015 Dec 16.

Exploring the repeat protein universe through computational protein design


T J Brunette et al. Nature. 2015.

Abstract

A central question in protein evolution is the extent to which naturally occurring proteins sample the space of folded structures accessible to the polypeptide chain. Repeat proteins composed of multiple tandem copies of a modular structure unit are widespread in nature and have critical roles in molecular recognition, signalling, and other essential biological processes. Naturally occurring repeat proteins have been re-engineered for molecular recognition and modular scaffolding applications. Here we use computational protein design to investigate the space of folded structures that can be generated by tandem repeating a simple helix-loop-helix-loop structural motif. Eighty-three designs with sequences unrelated to known repeat proteins were experimentally characterized. Of these, 53 are monomeric and stable at 95 °C, and 43 have solution X-ray scattering spectra consistent with the design models. Crystal structures of 15 designs spanning a broad range of curvatures are in close agreement with the design models with root mean square deviations ranging from 0.7 to 2.5 Å. Our results show that existing repeat proteins occupy only a small fraction of the possible repeat protein sequence and structure space and that it is possible to design novel repeat proteins with precisely specified geometries, opening up a wide array of new possibilities for biomolecular engineering.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Extended Data Figure 1. Computational protocol for designing de novo repeat proteins
a, flowchart of the design protocol. The green box indicates user-controlled inputs, the grey boxes represent steps where protein structure is created or modified, and the white boxes indicate where structures are filtered. b, low resolution backbone build. c, quick full-atom design (grey) improves the backbone model (red). The superposition in the middle highlights the structural changes introduced. d, structural profile: a 9-residue fragment is matched against the PDB repository for structures within 0.5 Å RMSD. The sequences from these structures are used to generate a sequence profile that influences design. e, packing filters were used to discard designs with cavities in the core, illustrated as grey spheres.
Extended Data Figure 2. Repeat space explored and model discrimination across design stages
Percentage of models accepted at the backbone building or centroid (a), design (b) and ab initio (c) stages. Models are divided according to secondary structure length. The combination of loop1 and loop2 lengths is indicated on top. The x and y axes indicate helix1 and helix2 lengths, respectively. The fraction of models in each bin that passed the selection stage is indicated in the side bar. Generally, one-residue loops and large differences between helix lengths reduce the number of selected models. d, distribution of radius and twist of models in the three stages. e, number of models passing design stages (log scale). From ~2.8 million structures, 761 are accepted.
Extended Data Figure 3. Model validation by in silico folding
To assess folding robustness, seven sequence variants were made for each design. a-g illustrate the energy landscape explored by Rosetta ab initio. Protein models produced by ab initio search are in red, and those produced by side chain repacking and minimization (relax) are in green. Models in deep global energy minima near the relaxed structures are considered folded. The variant with the highest density of ab initio models near the relax region was chosen for experimental characterization (blue box). h, Jalview sequence alignment of the first 100 residues of the variants. The yellow bar height indicates sequence conservation, while the black bar height indicates how often the consensus sequence occurs.
Extended Data Figure 4. Distribution of DHR axial displacement (z) and twist (ω)
Parameters for repeat protein family representatives were extracted as described in the Supplementary Information. The DHR-models are the 761 proteins validated by in silico folding.
Extended Data Figure 5. Superposition between single internal repeats (second repeat) of designs (grey) and crystal structures (yellow)
Aliphatic and aromatic side chains are in red and cysteines are in orange. DHR7 and DHR18 show intra-repeat disulphide bonds while DHR4 and DHR81 form inter-repeat cystines. DHR5 does not form the expected S-S bond. Core side chains in the designs recapitulate the conformations observed in the crystal structures. Even when the backbone is shifted (e.g. DHR5, 8, 15), rotamers are by and large correctly predicted.
Extended Data Figure 6. Structural validation by SAXS
a, Vr values for the fit of SAXS profiles to design models, in dark grey, and crystal structures, in yellow. For 43 designs, models are within the range defined by crystal structures. DHR49 and DHR76 form dimers in solution and the models employed the configuration observed in the crystal structures. Designs showing aggregation in the scattering profiles, including DHR5 for which the structure was solved, were not included in this figure. b and c, pairwise Vr similarity maps of 43 design models. b, experimental to model profile similarity, and c, model to model profile similarity. Models that are similar to each other show off-diagonal correlation in c, and the same pattern is observed when compared to experimental data in b. The order of display was obtained by clustering the original designed models by structural similarity. The ability to reproduce characteristic patterns within a large set of designs indicates that the models are capturing the relative structural similarities between proteins in solution. The scores are color-coded, with red indicating best agreement and white indicating lack of agreement.
Extended Data Figure 7. Designs are stable to chemical denaturation by guanidine HCl (GuHCl)
GuHCl denaturation experiments monitored by circular dichroism were carried out for two designs for which crystal structures were solved (DHR4 and DHR14), two with overall shapes confirmed by SAXS (DHR21 and DHR62), and two with overall shapes inconsistent with SAXS (DHR17 and DHR67). In contrast to almost all native proteins, four of the six proteins do not denature at GuHCl concentrations up to 7.5 M. Both designs not confirmed by SAXS were extremely stable to GuHCl denaturation and hence are very well folded proteins; the discrepancies between the computed and experimental SAXS profiles may be due to small amounts of oligomeric species or variation in overall twist.
Extended Data Figure 8. Structural similarity between DHRs and repeat protein families
DHRs cluster separately from existing repeat proteins. In terms of repeat handedness, DHRs are equally distributed between right-handed and left-handed repeats, in contrast to known alpha helical repeat proteins, which are mostly right-handed. This result indicates that the handedness observed in known families is not an intrinsic limitation of repeat protein structures. Repeat handedness, as defined by Kobe and Kajava, indicates the rotation of the main chain going from the N- to the C-terminus around the axis connecting the repeat centers of mass. The structural similarity tree was built using pairwise comparisons as measured by TM-score.
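For readers unfamiliar with TM-score, the following is a minimal sketch of the standard TM-score formula evaluated for one fixed superposition. It is not the authors' pipeline: the function names `tm_score_d0` and `tm_score_fixed` are hypothetical, the per-residue distances are assumed to be supplied already, and the full TM-score additionally maximizes over superpositions, which this sketch does not attempt.

```python
import numpy as np

def tm_score_d0(target_length):
    """Length-dependent distance scale d0 (in Å) of the TM-score formula."""
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8 if target_length > 15 else -1.0
    if d0 <= 0:
        # Assumption of this sketch: very short chains are rejected rather than clamped.
        raise ValueError("target_length too short for the d0 formula")
    return d0

def tm_score_fixed(distances, target_length):
    """TM-score term for ONE given superposition (no search over superpositions).

    distances: distances in Å between aligned residue pairs.
    target_length: number of residues in the target (normalizing) structure.
    """
    d = np.asarray(distances, dtype=float)
    if d.size > target_length:
        raise ValueError("more aligned pairs than target residues")
    d0 = tm_score_d0(target_length)
    return float(np.sum(1.0 / (1.0 + (d / d0) ** 2)) / target_length)
```

A score of 1 corresponds to a perfect match over the full target length; unaligned residues contribute nothing, so a perfect match over half of the target gives 0.5.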
Extended Data Figure 9. Extended versions of models validated by SAXS and crystallography
DHRs were characterized as containing four repeats but the number of internal repeats can be increased without additional design steps. Extended models highlight the differences in twist and radius between the validated designs.
Figure 1. Schematic overview of the computational design method
The lengths of each helix and loop were systematically enumerated. For each choice of helix and loop lengths, individual repeat units (red boxes on right) were built up from fragments of proteins of known structure, and then propagated to generate extended repeating structures (gray) with right-handed or left-handed twist.
Figure 2. The helical repeat protein universe
a, the geometry of a repeat protein can be described by axial displacement (z), radius of the helix (r) and angular displacement or twist (ω) between repeat units. b, designed helical repeat (DHR) proteins (grey) cover radius and twist ranges not found in native repeat protein families (colors). Designs forming right-handed superhelices have positive ω values; left-handed, negative ω values. Native family abbreviations: ANK, ankyrin; ARM, armadillo; TPR, tetratricopeptide repeat; HAT, half TPR; PPR, pentatricopeptide repeat; HEAT, heat repeat; PUM, pumilio homology domain; mTERF, mitochondrial termination factor; TAL, transcription activator-like effector; OTHER, alpha helical repeat proteins not in the other families. Designs structurally validated by small angle X-ray scattering (SAXS) (black) or crystallography (black with red circle) are distributed throughout the space. On top, representative experimentally validated designs with a variety of shapes.
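As an illustration of the three parameters in panel a, the sketch below extracts z, r and ω from the rigid-body transform that maps one repeat unit onto the next. This is a generic screw-axis decomposition, not the procedure described in the paper's Supplementary Information. The function name `screw_parameters` is hypothetical, and the sign convention (positive ω for a right-handed superhelix, with z reported as non-negative) is an assumption chosen to match the caption.

```python
import numpy as np

def screw_parameters(R, t, centroid):
    """Return (z, r, omega) for the transform x -> R @ x + t between adjacent repeats.

    R: 3x3 rotation matrix; t: translation vector; centroid: center of one repeat.
    z is the rise along the screw axis, r the distance of the centroid from
    the axis, omega the signed twist in radians.
    """
    R = np.asarray(R, dtype=float)
    t = np.asarray(t, dtype=float)
    c = np.asarray(centroid, dtype=float)

    # Rotation angle from the trace, in [0, pi].
    omega = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    if np.isclose(np.sin(omega), 0.0):
        raise ValueError("axis is ill-defined for rotations of 0 or 180 degrees")

    # Unit rotation axis from the antisymmetric part of R.
    axis = np.array([R[2, 1] - R[1, 2],
                     R[0, 2] - R[2, 0],
                     R[1, 0] - R[0, 1]]) / (2.0 * np.sin(omega))

    z = float(t @ axis)            # rise along the axis
    t_perp = t - z * axis          # translation within the plane normal to the axis

    # A point p on the screw axis satisfies (I - R) p = t_perp.
    p, *_ = np.linalg.lstsq(np.eye(3) - R, t_perp, rcond=None)

    offset = c - p
    radial = offset - (offset @ axis) * axis
    r = float(np.linalg.norm(radial))

    # Assumed convention: report z >= 0 and carry the handedness in the sign of omega.
    if z < 0:
        z, omega = -z, -omega
    return z, r, float(omega)
```

Applying the same transform repeatedly to one repeat unit traces out the superhelix, so a single (z, r, ω) triple is enough to describe an idealized repeat protein of any length.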
Figure 3. Characterization of designed repeat proteins
a, Design success rate. Values for the subset with disulfide bonds are in parentheses. b, results on six representative designs. Top row: design models. Second row: computed energy landscapes. Energy is on the y axis (REU, Rosetta energy unit) and RMSD from the design model on the x axis. All six landscapes are strongly funneled into the designed energy minimum. Third row: CD spectra collected at 25 °C (red), 95 °C (blue) and back to 25 °C (black). The proteins do not denature within this temperature range (MRE, mean residue ellipticity; deg·cm²·dmol⁻¹·residue⁻¹). Bottom row: SEC elution profile directly after affinity chromatography purification. The designs are mostly monodisperse. The maximum absorbance at 280 nm was normalized to 1.
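The MRE unit quoted above follows from the usual conversion of a raw CD signal to mean residue ellipticity. The sketch below shows that conversion under stated assumptions: the function name and argument names are hypothetical, the signal is taken in millidegrees, the concentration in mol/L and the path length in cm, and the normalization is per residue as in the caption.

```python
def mean_residue_ellipticity(theta_mdeg, conc_molar, path_cm, n_residues):
    """Convert observed ellipticity (mdeg) to MRE in deg·cm²·dmol⁻¹·residue⁻¹."""
    if conc_molar <= 0 or path_cm <= 0 or n_residues <= 0:
        raise ValueError("concentration, path length and residue count must be positive")
    # Molar ellipticity is theta_mdeg / (10 * c * l); dividing by the residue
    # count gives the per-residue value.
    return theta_mdeg / (10.0 * conc_molar * path_cm * n_residues)
```

For example, a signal of -20 mdeg from a 200-residue protein at 10 µM in a 0.1 cm cuvette corresponds to an MRE of -10,000 deg·cm²·dmol⁻¹·residue⁻¹.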
Figure 4. Crystal structures of fifteen designs are in close agreement with the design models
Crystal structures are in yellow, and the design models in grey. Insets in circles show the overall shape of the repeat protein. The RMSD values across all backbone heavy atoms are: 1.50 Å (DHR4), 1.73 Å (DHR5), 1.30 Å (DHR7), 2.28 Å (DHR8), 1.79 Å (DHR10), 2.38 Å (DHR14), 1.21 Å (DHR18), 0.87 Å (DHR49), 1.33 Å (DHR53), 0.93 Å (DHR54), 1.54 Å (DHR64), 0.67 Å (DHR71), 1.73 Å (DHR76), 1.04 Å (DHR79), 0.65 Å (DHR81). Hydrophobic side chains in the crystal structures (in red) are largely captured by the designs (Extended Data Fig. 5).
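The RMSD values above compare two sets of matched backbone atoms after optimal superposition. A minimal sketch of that calculation using the Kabsch algorithm is given below. It assumes the two coordinate arrays are already matched atom for atom; the function name `kabsch_rmsd` is hypothetical, and this is not claimed to be the software the authors used.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD (same units as the input) between N x 3 coordinate sets after optimal superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")

    # Remove translation by centering both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)

    # Optimal rotation from the SVD of the covariance matrix.
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    # Guard against an improper rotation (reflection).
    d = -1.0 if np.linalg.det(Vt.T @ U.T) < 0 else 1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))
```

Because the rotation and translation are optimized away, the result reflects only differences in shape, which is why a model and a crystal structure in different coordinate frames can be compared directly.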

