Nat Struct Mol Biol. 2023 Aug;30(8):1132-1140.

doi: 10.1038/s41594-023-01029-0. Epub 2023 Jul 3.

Exploration of novel αβ-protein folds through de novo design

Shintaro Minami^#¹, Naohiro Kobayashi^#^{2,3}, Toshihiko Sugiki², Toshio Nagashima³, Toshimichi Fujiwara², Rie Tatsumi-Koga¹, George Chikenji⁴, Nobuyasu Koga^{5,6,7,8}

Affiliations

¹ Protein Design Group, Exploratory Research Center on Life and Living Systems (ExCELLS), National Institutes of Natural Sciences (NINS), Okazaki, Japan.
² Institute for Protein Research (IPR), Osaka University, Osaka, Japan.
³ RIKEN Center for Biosystems Dynamics Research, RIKEN, Yokohama, Japan.
⁴ Department of Applied Physics, Graduate School of Engineering, Nagoya University, Nagoya, Japan.
⁵ Protein Design Group, Exploratory Research Center on Life and Living Systems (ExCELLS), National Institutes of Natural Sciences (NINS), Okazaki, Japan. nkoga@protein.osaka-u.ac.jp.
⁶ SOKENDAI, The Graduate University for Advanced Studies, Hayama, Japan. nkoga@protein.osaka-u.ac.jp.
⁷ Research Center of Integrative Molecular Systems, Institute for Molecular Science (IMS), National Institutes of Natural Sciences (NINS), Okazaki, Japan. nkoga@protein.osaka-u.ac.jp.
⁸ Laboratory for Protein Design, Institute for Protein Research (IPR), Osaka University, Osaka, Japan. nkoga@protein.osaka-u.ac.jp.

^# Contributed equally.

PMID: 37400653
PMCID: PMC10442233
DOI: 10.1038/s41594-023-01029-0

Shintaro Minami et al. Nat Struct Mol Biol. 2023 Aug.

Abstract

A fundamental question in protein evolution is whether nature has exhaustively sampled nearly all possible protein folds throughout evolution, or whether a large fraction of the possible folds remains unexplored. To address this question, we defined a set of rules for β-sheet topology to predict novel αβ-folds and carried out a systematic de novo protein design exploration of the novel αβ-folds predicted by the rules. The designs for all eight of the predicted novel αβ-folds with a four-stranded β-sheet, including a knot-forming one, folded into structures close to the design models. Further, the rules predicted more than 10,000 novel αβ-folds with five- to eight-stranded β-sheets; this number far exceeds the number of αβ-folds observed in nature so far. This result suggests that a vast number of αβ-folds are possible, but have not emerged or have become extinct due to evolutionary bias.

Conflict of interest statement

The authors declare no competing interests.

Figures

**Fig. 1. Observed and unobserved β-sheet topologies in nature.**
a, αβ-Folds defined on the basis of the β-sheet topology and Richardson’s right-handed strand connections shown in b. The upper panel shows a β-sheet topology frequently observed in nature and its corresponding ferredoxin-like fold; the lower panel shows a β-sheet topology unobserved in nature and its corresponding fold. Each β-strand is numbered according to its order along the linear chain. Gray-colored β-strand connections are on the front side of the β-sheet; black-colored ones are on the back side. b, Richardson’s rule on the connection handedness of para-β-X-β motifs. The right-handed strand connection (dark gray bar) rather than the left-handed one (light gray bar) is predominantly observed in naturally occurring proteins. c, Numbers of observed, unobserved and theoretically possible β-sheet topologies for each number of constituent β-strands in a β-sheet (see Fig. 3 and Methods for the definition of observed and unobserved topologies).

**Fig. 2. Rules for β-sheet topology.**
a, Connection jump-distance rule. The jump distance is the number of intervening β-strands between the two β-strands of β-X-β motifs. Para-β-X-β motifs with jump distances of three or less and anti-β-X-β motifs with jump distances of one or less are observed more frequently than β-X-β motifs with larger jump distances. The same preferences have been reported previously; we revisited them using current PDB data. b, Connection overlap rule. D-type β-sheet topologies (loops are located on different sides) are more frequently observed than S-type topologies (loops are located on the same side). Blue- and red-colored motifs indicate two different β-X-β motifs. Similar rules have been reported for para-para-β-X-β motifs. For anti-anti-β-X-β motifs, a rule termed ‘pretzels’ has been reported, but this rule prohibits both S- and D-types. c, Connection ending rule. S- and D-types of β-sheet topologies for pairs of para-β-X-β motifs, in which the second strands of the two motifs are adjacent and parallel-aligned, are shown. S-type β-sheet topologies are more frequently observed than D-type topologies.
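The jump-distance rule in panel a can be sketched in a few lines of code. The following is a minimal illustration, not the authors' implementation: the function name `classify_connections`, the input encoding (strand order left to right plus a +1/-1 orientation per strand), and the example sheet are all assumptions made here for illustration. The thresholds (three for para, one for anti) are taken from the caption.

```python
def classify_connections(order, orient):
    """Classify each beta-X-beta connection between sequence-consecutive strands.

    order  : strand numbers as they appear in the sheet, left to right
             (hypothetical encoding, e.g. (4, 1, 3, 2)).
    orient : dict mapping strand number -> +1 (up) or -1 (down).

    Returns a list of (strand_i, strand_j, kind, jump, within_rule) tuples.
    """
    # Position of every strand within the sheet.
    pos = {strand: index for index, strand in enumerate(order)}
    results = []
    for s in range(1, len(order)):
        # Jump distance: number of strands lying between s and s + 1.
        jump = abs(pos[s] - pos[s + 1]) - 1
        # Same orientation -> parallel motif, otherwise antiparallel.
        kind = "para" if orient[s] == orient[s + 1] else "anti"
        # Thresholds from the caption: para <= 3, anti <= 1.
        limit = 3 if kind == "para" else 1
        results.append((s, s + 1, kind, jump, jump <= limit))
    return results


# Hypothetical four-stranded sheet used only as an example input.
example = classify_connections((4, 1, 3, 2), {1: 1, 2: 1, 3: -1, 4: -1})
for row in example:
    print(row)
```

For this example input, all three connections fall within the frequently observed jump distances.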

**Fig. 3. Distributions of frustration-free and frustrated β-sheet topologies in nature.**
a, Numbers of frustration-free and frustrated β-sheet topologies in each observed or unobserved topology in nature for each number of constituent β-strands in a β-sheet. ^a The number in brackets indicates the percentage of unobserved topologies among frustration-free topologies. b, Observation frequencies of all 96 possible topologies for four-stranded β-sheets, sorted by frequency. The observation frequency of a topology in nature is represented by the number of homologous groups (superfamilies) having the topology (see Methods for details). We regarded topologies with an observation frequency of less than 1/4, at which the slope changes substantially, as unobserved. c, Ratios of frustration-free and frustrated β-sheet topologies depending on the observation frequency for each number of constituent β-strands in a β-sheet. The number in each band indicates the number of each topology. The observation frequency is presented as the logarithm to base 4. d, Distributions of frustration-free and frustrated topologies in nature for all 96 possible topologies of four-stranded β-sheets. β-Strand order indicates the order in which the β-strands, numbered along the sequence, are aligned in a β-sheet from left to right; β-strand orientation indicates the orientations of the β-strands. In each grid cell, a β-sheet topology is illustrated with its observation frequency in nature indicated by the number below the topology and by the background color gradient from white (low frequency) to yellow (high frequency). Frustration-free and frustrated topologies are represented in dark gray and light gray, respectively. β-Sheet topologies corresponding to the Greek key and its circular permutations are marked with an asterisk. Red-colored loops represent topologies including at least one frustration. Topologies enclosed in a bold black square and numbered from one to eight are unobserved frustration-free β-sheet topologies.
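The figure of 96 possible four-stranded topologies can be reproduced by a simple enumeration. The sketch below rests on an assumption that is not stated in the caption: a topology is taken to be a strand order plus an up/down orientation per strand, and two descriptions are treated as the same topology when they differ only by a rigid rotation of the sheet (reversing the left-to-right order, inverting every orientation, or both). The function names are hypothetical.

```python
from itertools import permutations, product


def canonical(order, orient):
    """Pick one representative among the symmetry-equivalent descriptions.

    order  : tuple of strand numbers, left to right in the sheet.
    orient : tuple of 0/1 orientations, one per sheet position.
    """
    flipped = tuple(1 - o for o in orient)
    rev_order = order[::-1]
    rev_orient = orient[::-1]
    rev_flipped = flipped[::-1]
    # Assumed equivalences: identity, all strands inverted,
    # order reversed, order reversed with all strands inverted.
    return min(
        (order, orient),
        (order, flipped),
        (rev_order, rev_orient),
        (rev_order, rev_flipped),
    )


def count_topologies(n_strands):
    """Count distinct topologies for a sheet with n_strands strands."""
    seen = set()
    for order in permutations(range(1, n_strands + 1)):
        for orient in product((0, 1), repeat=n_strands):
            seen.add(canonical(order, orient))
    return len(seen)


print(count_topologies(4))  # 96 under the assumed equivalences
```

Under these assumed equivalences the count equals n! × 2^n / 4, which gives 96 for n = 4, in agreement with the number quoted in the caption. Whether this convention matches the one used for larger sheets in the paper is not confirmed here.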

**Fig. 4. Characterization of the designs for all eight novel αβ-folds.**
a, Identified novel β-sheet topologies. b, Backbone blueprints used for de novo design of the novel αβ-fold structures. Strand lengths are represented by filled and empty boxes that represent pleats coming out and going into the page, respectively. Letter strings next to the loops indicate their ABEGO torsion patterns. c, Backbone structures generated from the blueprints. Each residue color represents its ABEGO torsion angle (red, A; blue, B; green, G). d, Energy landscapes obtained from Rosetta ab initio structure prediction simulations. Each dot represents the lowest energy structure obtained in an independent trajectory starting from an extended chain (black) or the design model (red) for each sequence; the x axis shows the Cα r.m.s.d. from the design model and the y axis shows the Rosetta all-atom energy. e, Far-ultraviolet CD spectra at various temperatures (30–170 °C). f, Thermal denaturation monitored at 222 nm. g, Two-dimensional ¹H-¹⁵N HSQC spectra at 25 °C and 600 MHz.

**Fig. 5. Comparison of computational models with experimentally determined structures.**
Top, the top two rows show designed novel αβ-folds from NF1 to NF8. The tertiary arrangement of α-helices (circles) and β-strands (triangles) and their connections are shown at the top, the β-sheet topologies below. Middle, computational design models. Bottom, the NMR structures. The r.m.s.d. between the design model and NMR structure for backbone heavy atoms is indicated. The design models are available in Supplementary Data 1, the NMR structures are available in the PDB: NF1-14 (PDB 7BPL), NF2-02 (7BPM), NF3-03 (7BQE), NF4-04 (7BQC), NF5-03 (7BPP), NF6-02 (7BQB), NF7-04 (7BPN) and NF8-01 (7BQD).

**Extended Data Fig. 1. Bending orientation preference of anti-β-X-β motifs with a connection jump-distance number of one.**
a, Left: the bending angle, α, for anti-β₁-X-β₂ motifs, identified as the angle between the β-sheet normal vector v_p and the vector from the midpoint O of the terminal β-strand backbone atoms, C1 (carbonyl carbon of the first strand) and N2 (amide nitrogen of the second strand), to the average coordinate A over the loop Cα atoms. Right: v_p calculated by averaging the normal vectors to the two planes defined by the N1-C1-N2 and C1-N2-C2 backbone atoms, respectively. b, Distribution of the angle α for naturally occurring protein structures with a jump-distance number of one. Anti-β-X-β motifs with a bending angle < 90° are more frequently observed than those with > 90°, indicating the right-handed bending orientation preference of anti-β₁-X-β₂ motifs with a jump-distance number of one. This preference may arise from the intrinsic chirality and geometrical preferences of the polypeptide chain. c, Distributions of the angle α for naturally occurring protein structures with a jump-distance number of 0 (top), 2 (middle), and ≥3 (bottom), respectively. No bending angle preferences were observed.
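The geometric definition of α in panel a translates directly into vector arithmetic. The following sketch is only an illustration of that definition: the function name `bending_angle`, the choice to normalize each plane normal before averaging, and the cross-product order (which fixes the sign of v_p and hence which side counts as α < 90°) are assumptions made here, not details confirmed by the caption.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _unit(v):
    length = math.sqrt(sum(x * x for x in v))
    return tuple(x / length for x in v)


def bending_angle(n1, c1, n2, c2, loop_ca):
    """Angle (degrees) between the sheet normal v_p and the vector O -> A.

    n1, c1, n2, c2 : 3D coordinates of the backbone atoms N1, C1, N2, C2.
    loop_ca        : list of 3D coordinates of the loop C-alpha atoms.
    """
    # Normals to the planes N1-C1-N2 and C1-N2-C2 (orientation is an assumption).
    normal_a = _unit(_cross(_sub(c1, n1), _sub(n2, c1)))
    normal_b = _unit(_cross(_sub(n2, c1), _sub(c2, n2)))
    v_p = _unit(tuple((a + b) / 2 for a, b in zip(normal_a, normal_b)))
    # O: midpoint of C1 and N2; A: average position of the loop C-alpha atoms.
    o = tuple((a + b) / 2 for a, b in zip(c1, n2))
    a_avg = tuple(sum(axis) / len(loop_ca) for axis in zip(*loop_ca))
    oa = _unit(_sub(a_avg, o))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_alpha = max(-1.0, min(1.0, sum(p * q for p, q in zip(v_p, oa))))
    return math.degrees(math.acos(cos_alpha))


# Synthetic coordinates chosen only to exercise the function.
print(bending_angle((0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0), [(2, 0.5, 0)]))
```

With real structures, the four backbone atoms and loop Cα coordinates would come from a parsed coordinate file; the synthetic input above places the loop in the sheet plane, so the angle is a right angle.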

**Extended Data Fig. 2. Register shift rule for para-β-X-β motifs.**
Register shifts for para-β-X-β motifs were defined in terms of the relation between the second strand (red) in the β-X-β motif and the adjacent parallel-aligned β-strands (gray): an inner register shift is when the gray β-strand is inside the para-β-X-β motif, and an outer register shift is when the gray β-strand is outside the para-β-X-β motif. Analysis of the residue offset for the inner and outer shifts for para-β-X-β motifs in naturally occurring protein structures revealed that the register shifts are mostly zero or positive; the origin of this preference is partly explained by energetic penalties of steric repulsion and buried polar atoms that emerge when unfavored register shifts occur.

**Extended Data Fig. 3. Origin of the connection ending rule.**
Inner and outer register shift arrangements for para-para-β-X-β motifs (red and blue) violating the connection ending rule are shown on the left and right, respectively (only the second strands of the para-β-X-β motifs are shown). In these arrangements, the second strands are adjacent to each other and the connections are on different β-sheet sides, which does not satisfy the register shift rule (Extended Data Fig. 2) or the αβ-rule. In the case of a non-zero register shift [(i), (iii), (iv), and (vi)], the β-X-β motifs violate the register shift rule (Extended Data Fig. 2). In (i) and (iv), the red strand is shifted towards the negative orientation against the blue strand; in (iii) and (vi), the blue strand is shifted towards the negative orientation against the red strand. In the case of a register shift of zero [(ii) and (v)], the αβ-rule is violated: the vector from the Cα to Cβ atoms of the first strand residue in either of the β-strands points towards the X region in β-X-β motifs.

**Extended Data Fig. 4. AAAB loop with the right twist angle for βα-units.**
a, The twist angle μ for βα-units of the anti-type (vector from the Cα to Cβ atoms of the last strand residue points away from the helix), defined as the dihedral angle between the plane defined by the β-strand vector and the CαCβ vector of the last strand residue, and the plane defined by the same CαCβ vector and the α-helix vector (the definitions of the β-strand and α-helix vectors have been described previously). b, Left: frequencies for ABEGO torsion patterns of loops in βα-units having a μ angle around 90° in naturally occurring protein structures. Right: distributions of the twist angle μ for each of the five most frequently observed loop types in the table on the left. The AAAB loop showing a clear peak at ~90° was used in the NF7 fold design. c, The backbone structure of the AAAB loop.

**Extended Data Fig. 5. Newly introduced loop patterns for αβ-units.**
Frequencies of ABEGO torsion patterns for the loops in αβ-units in naturally occurring proteins are shown for the para- (left) and anti-types (right) (para-type: the vector from the Cα to Cβ atoms of the first strand residue points away from the helix; anti-type: the same vector points towards the helix). The GB, GBA, and BAAB loops have been used in previous *de novo* designed proteins. The BA and GABA loops for the para-type and the GBB loop for the anti-type were newly introduced in this study.

**Extended Data Fig. 6. Two backbone blueprints used for the design of the target NF8.**
The torsion patterns immediately before the last strand are different.

**Extended Data Fig. 7. Structure search for naturally occurring proteins similar to the designs in terms of entire structures.**
For each designed structure, similar domain structures were searched against the ECOD domain dataset (99% sequence non-redundant set) using two different TM-score-based structure alignment methods, TM-align and MICAN (sequential mode). Unlike TM-align, MICAN superimposes structures using a secondary-structure-weighted TM-score. We collected all domains with a TM-score > 0.5 compared to each target structure and inspected them manually using the TOPS diagram. The domain with the largest TM-score for each target, except for NF6-02 (there is no domain with a TM-score > 0.5), and the domain similar to each of the NF2 and NF7 designs, found by the manual inspection, are shown in each panel together with the ECOD ID. No similar naturally occurring protein structures were found for the designs, except for the NF2 and NF4 designs.

**Extended Data Fig. 8. Comparison of core packing between design models and NMR structures.**
Hydrophobic residues in the core, mainly Leu, Ile, Phe, Tyr, and Trp, are shown as sticks. For the residues labeled with amino-acid type and residue number, detailed descriptions in terms of HSQC spectra are provided in the Supplementary text.

**Extended Data Fig. 9. Smallest knotted protein designed, NF8.**
The stacked histogram represents the number of naturally occurring knotted proteins in the PDB, depending on the chain length (original annotation data were obtained from the KnotProt database). Blue, red, and gray bars represent right-handed trefoil knot (R-Trefoil), left-handed trefoil knot (L-Trefoil), and other knot types (Other), respectively. The design NF8 with the R-Trefoil knot, indicated by an arrow, is characterized as the smallest knotted protein, with 79 residues. Note that this is an exceptional case for R-Trefoil knot structures; the minimal size observed in nature is approximately 140 residues (the smallest L-Trefoil structure has 82 residues).
