PLoS Comput Biol. 2021 Nov 22;17(11):e1009620.
doi: 10.1371/journal.pcbi.1009620. eCollection 2021 Nov.

De novo protein fold families expand the designable ligand binding site space


Xingjie Pan et al. PLoS Comput Biol. 2021.

Abstract

A major challenge in designing proteins de novo to bind user-defined ligands with high affinity is finding backbone structures into which a new binding site geometry can be engineered with high precision. Recent advances in methods to generate protein fold families de novo have expanded the space of accessible protein structures, but it is not clear to what extent de novo proteins with diverse geometries also expand the space of designable ligand binding functions. We constructed a library of 25,806 high-quality ligand binding sites and developed a fast protocol to place ("match") these binding sites into both naturally occurring and de novo protein families with two fold topologies: Rossmann and NTF2. Each matching step involves engineering new binding site residues into each protein "scaffold", which is distinct from the problem of comparing already existing binding pockets. Of these binding sites, 5,896 and 7,475 could be matched to the Rossmann and NTF2 fold families, respectively. De novo designed Rossmann and NTF2 protein families can support 1,791 and 678 binding sites, respectively, that cannot be matched to naturally existing structures with the same topologies. While the number of protein residues in a ligand binding site is the major determinant of matching success, ligand size and the primary sequence separation of binding site residues also play important roles. The number of matched binding sites is a power-law function of the number of members in a fold family. Our results suggest that de novo sampling of geometric variations on diverse fold topologies can significantly expand the space of designable ligand binding sites for a wealth of possible new protein functions.


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. The ligand-binding site library.
A. Binding site examples. The Mg2+ ion is shown as a sphere; small molecules and protein residues are shown as sticks; carbon atoms are colored green (small molecule) or grey (protein residues); oxygen atoms are colored red; nitrogen atoms are colored blue; polar interactions are shown as yellow dashed lines. B. Joint distribution of binding site sizes (numbers of binding site protein residues) and numbers of ligand heavy atoms. Binding site sizes are linearly correlated with the numbers of ligand heavy atoms. C, D. Amino acid (AA) frequencies in ligand-binding sites (red, right y-axis) and enrichment ratios of each amino acid in ligand-binding sites relative to all residues in a protein (blue, left y-axis). C. Distributions for all ligand-binding sites. D. Distributions for binding sites of single-heavy-atom ligands.
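Panels C and D compare how often each amino acid appears in binding sites with how often it appears overall. The sketch below assumes the enrichment ratio is the binding-site frequency of an amino acid divided by its frequency among all residues of the source proteins; the caption does not state the exact normalization, and the function name and toy sequences are hypothetical.

```python
from collections import Counter


def enrichment_ratios(site_residues, all_residues):
    """Return {amino_acid: (site_frequency, enrichment_ratio)}.

    Assumption (not from the paper): enrichment = frequency among
    binding-site residues / frequency among all protein residues.
    Inputs are iterables of one-letter amino acid codes.
    """
    site_counts = Counter(site_residues)
    bg_counts = Counter(all_residues)
    n_site = sum(site_counts.values())
    n_bg = sum(bg_counts.values())
    result = {}
    for aa, count in site_counts.items():
        site_freq = count / n_site
        bg_freq = bg_counts[aa] / n_bg
        # Skip residue types absent from the background to avoid division by zero.
        if bg_freq > 0:
            result[aa] = (site_freq, site_freq / bg_freq)
    return result


# Hypothetical toy data: a 4-residue site and a 10-residue "protein".
ratios = enrichment_ratios("HHDE", "HDEAAAAALL")
```

In this toy case histidine makes up half of the site residues but a tenth of the background, so its ratio is about 5; residues that never occur in the site (here A and L) get no entry.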
Fig 2. Matching ligand binding sites to scaffold libraries.
A. Schematic of the matching protocol. The ligand is represented as a yellow triangle. The ligand-binding site (green) is first treated as a rigid body and matched to the scaffold (grey) by anchoring it to a scaffold residue (black circle). The binding site residues are then aligned to the corresponding scaffold residues. Finally, the standard Rosetta matcher is applied to build the binding site side chains (magenta) onto the scaffold. B. The binding sites are matched to native and de novo designed scaffold families with Rossmann or NTF2 fold topologies. C. Examples of matches. The coloring scheme is the same as in A.
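The alignment step in panel A (superimposing binding site residues onto the corresponding scaffold residues) can be illustrated with a standard least-squares rigid-body superposition. The sketch below uses the Kabsch algorithm with NumPy on paired coordinates (for example, backbone atoms); it is an assumption about how such an alignment could be computed, not the authors' fast-match code, and the function names are hypothetical.

```python
import numpy as np


def kabsch_superpose(mobile, target):
    """Return (R, t) minimizing RMSD between mobile @ R.T + t and target.

    mobile, target: (N, 3) arrays of corresponding coordinates, e.g.
    backbone atoms of binding-site residues and of scaffold residues.
    """
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)
    mob_center = mobile.mean(axis=0)
    tgt_center = target.mean(axis=0)
    p = mobile - mob_center
    q = target - tgt_center
    # SVD of the 3x3 covariance matrix between the centered point sets.
    u, _, vt = np.linalg.svd(p.T @ q)
    # Flip the last axis if needed so R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(vt.T @ u.T))
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = tgt_center - mob_center @ r.T
    return r, t


def rmsd_after_superposition(mobile, target):
    """RMSD between target and mobile after optimal rigid-body superposition."""
    r, t = kabsch_superpose(mobile, target)
    moved = np.asarray(mobile, dtype=float) @ r.T + t
    diff = moved - np.asarray(target, dtype=float)
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```

A matcher built on this idea would keep a placement only if the superposition RMSD falls below some cutoff; any specific cutoff value is an assumption here.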
Fig 3. Matchability of ligand binding sites depends on the binding site size.
Histograms of numbers of matches vs binding site sizes (number of protein residues in the binding site). Binding sites that cannot be matched to any scaffold are shown in blue. Binding sites that can be matched to at least one scaffold by the fast match method but cannot be matched by the standard Rosetta matcher are shown in orange. Binding sites that can be matched to at least one scaffold by the standard Rosetta matcher are in green. A-D. Results for 4 scaffold libraries; scaffold sets are indicated in each panel title.
Fig 4. Ligand binding sites are matched to all layers of scaffolds.
A. An example of scaffold residue layers assigned to a scaffold (PDB:3FH1) from the native NTF2 fold family by the Rosetta Layer residue selector. The surface, boundary and core layers are colored purple, green and orange, respectively. B. Distributions of residue layers in different scaffold libraries. C. Distributions of residue layers of binding sites matched to different scaffold libraries. D. Distributions of depth scores of binding sites matched to different scaffold libraries.
Fig 5. Features affecting matching success rates of 3-residue ligand binding sites.
A. Venn diagrams of the number of Rosetta-matched 3-residue binding sites between pairs of scaffold sets. The number in the overlapping region is the observed number of binding sites that can be matched to both scaffold sets, with the expected number in parentheses. The number in the non-overlapping region within a circle denotes the binding sites that can only be matched to this scaffold set. The number outside the circles denotes the binding sites that cannot be matched to either of the two scaffold sets. B. The numbers of ligand heavy atoms are negatively correlated with the match success rates. C. The mean primary sequence distances between binding site residues are negatively correlated with match success rates.
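The caption does not say how the expected overlap in panel A is computed. A natural baseline, assumed here, treats matchability to the two scaffold sets as independent: if N binding sites are tested and n_A and n_B of them match sets A and B, the expected overlap is N · (n_A/N) · (n_B/N) = n_A · n_B / N. A minimal sketch with hypothetical function names and toy data:

```python
def expected_overlap(n_total, n_a, n_b):
    """Expected |A & B| if membership in A and in B were independent.

    Assumption (not stated in the paper): each site is in A with
    probability n_a / n_total and in B with probability n_b / n_total.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return n_a * n_b / n_total


def venn_counts(matched_a, matched_b, all_sites):
    """Counts for each Venn region plus the independence expectation."""
    a, b, universe = set(matched_a), set(matched_b), set(all_sites)
    return {
        "only_a": len(a - b),
        "only_b": len(b - a),
        "both": len(a & b),
        "neither": len(universe - a - b),
        "expected_both": expected_overlap(len(universe), len(a), len(b)),
    }


# Toy example: 10 binding sites, 4 match set A, 5 match set B.
counts = venn_counts({0, 1, 2, 3}, {2, 3, 4, 5, 6}, range(10))
```

An observed overlap well above the expected value would indicate that the same binding sites tend to be matchable to both scaffold sets.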
Fig 6. Numbers of matches scale as power-law functions of numbers of scaffolds in fold families.
A-D. Log-log plots of the number of 3-residue matches vs the number of scaffolds. E-F. Log-log plots of the number of 3-residue binding sites that can only be matched to de novo scaffolds of specific topologies vs the number of scaffolds. The black lines represent linear fits.
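A power law y = c · x^k becomes a straight line after taking logarithms, log y = log c + k · log x, which is why the fits in this figure are linear on log-log axes. The sketch below fits such a line with NumPy; the function name and the synthetic data are hypothetical, and the exact fitting procedure used by the authors is not given in the caption.

```python
import numpy as np


def fit_power_law(n_scaffolds, n_matches):
    """Fit n_matches ~ c * n_scaffolds**k by least squares in log-log space.

    Returns (k, c). Points with zero counts are dropped because log(0)
    is undefined.
    """
    x = np.asarray(n_scaffolds, dtype=float)
    y = np.asarray(n_matches, dtype=float)
    keep = (x > 0) & (y > 0)
    # np.polyfit with degree 1 returns [slope, intercept].
    slope, intercept = np.polyfit(np.log10(x[keep]), np.log10(y[keep]), 1)
    return float(slope), float(10 ** intercept)


# Synthetic data generated from y = 3 * x**0.5 (not results from the paper).
sizes = np.array([10, 100, 1000, 10000])
k, c = fit_power_law(sizes, 3 * sizes ** 0.5)
```

An exponent k below 1 would mean that each additional scaffold contributes progressively fewer new matches.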
