. 2017:1529:265-277.

doi: 10.1007/978-1-4939-6637-0_13.

Parallel Computational Protein Design

Yichao Zhou¹, Bruce R Donald^{2

3}, Jianyang Zeng⁴

Affiliations

¹ Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, P. R. China.
² Department of Computer Science, Duke University, Durham, NC, USA.
³ Department of Biochemistry, Duke University Medical Center, Durham, NC, USA.
⁴ Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, P. R. China. zengjy321@tsinghua.edu.cn.

PMID: 27914056
PMCID: PMC5192564
DOI: 10.1007/978-1-4939-6637-0_13

Parallel Computational Protein Design

Yichao Zhou et al. Methods Mol Biol. 2017.

. 2017:1529:265-277.

doi: 10.1007/978-1-4939-6637-0_13.

Authors

Yichao Zhou¹, Bruce R Donald^{2

3}, Jianyang Zeng⁴

Affiliations

¹ Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, P. R. China.
² Department of Computer Science, Duke University, Durham, NC, USA.
³ Department of Biochemistry, Duke University Medical Center, Durham, NC, USA.
⁴ Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, P. R. China. zengjy321@tsinghua.edu.cn.

PMID: 27914056
PMCID: PMC5192564
DOI: 10.1007/978-1-4939-6637-0_13

Abstract

Computational structure-based protein design (CSPD) is an important problem in computational biology, which aims to design or improve a prescribed protein function based on a protein structure template. It provides a practical tool for real-world protein engineering applications. A popular CSPD method that guarantees to find the global minimum energy solution (GMEC) is to combine both dead-end elimination (DEE) and A* tree search algorithms. However, in this framework, the A* search algorithm can run in exponential time in the worst case, which may become the computation bottleneck of large-scale computational protein design process. To address this issue, we extend and add a new module to the OSPREY program that was previously developed in the Donald lab (Gainza et al., Methods Enzymol 523:87, 2013) to implement a GPU-based massively parallel A* algorithm for improving protein design pipeline. By exploiting the modern GPU computational framework and optimizing the computation of the heuristic function for A* search, our new program, called gOSPREY, can provide up to four orders of magnitude speedups in large protein design cases with a small memory overhead comparing to the traditional A* search algorithm implementation, while still guaranteeing the optimality. In addition, gOSPREY can be configured to run in a bounded-memory mode to tackle the problems in which the conformation space is too large and the global optimal solution cannot be computed previously. Furthermore, the GPU-based A* algorithm implemented in the gOSPREY program can be combined with the state-of-the-art rotamer pruning algorithms such as iMinDEE (Gainza et al., PLoS Comput Biol 8:e1002335, 2012) and DEEPer (Hallen et al., Proteins 81:18-39, 2013) to also consider continuous backbone and side-chain flexibility.

Keywords: A*; CUDA; Dead-end elimination; GPGPU; Parallel computing; Protein design.

PubMed Disclaimer

Figures

**Figure 1**
An illustration of the traditional A* tree search algorithm. (a) An example of the A* search tree at a certain time. Nodes in dark shade represent the nodes that have already been expanded and visited. Nodes in light shade represent the frontier nodes which are stored in the priority queue and waiting to be expanded. The node labeled with a is the node with minimum heuristic function value in the priority queue and thus is currently being expended. (b) The workflow of the node expansion operation for current node a.

**Figure 2**
The illustration of the parallel node expansion operations in our parallel A* search algorithm. (a) An example of the A* search tree at a certain time. The meaning of Figure 2a is as same as that of Figure 1a except that now the algorithm is expanding nodes a, b, and c in parallel. The nodes labeled with a, b, and c are the nodes with the minimum heuristic function values in their respective priority queues. (b) The workflow on how multiple nodes are expanded simultaneously.

**Figure 3**
Semi-log plots about the ratios of speedups and memory overhead of our GPU-based A* search algorithm. The x axes represent the magnitude of the design problem, while the y axes of Figure 3a and 3b represent the ratios of speedups and memory overhead of our GPU-based algorithm comparing to the traditional sequential A* search algorithm, respectively. The circle and square marks represent the results of our GPU-based A* search algorithm with 768 and 4992 parallel priority queues, respectively.

See this image and copyright information in PMC

Cited by

Protein design accelerates the development and application of optogenetic tools.
Zhu L, Wang Y, Wu X, Wu G, Zhang G, Liu C, Zhang S. Zhu L, et al. Comput Struct Biotechnol J. 2025 Feb 21;27:717-732. doi: 10.1016/j.csbj.2025.02.014. eCollection 2025. Comput Struct Biotechnol J. 2025. PMID: 40092664 Free PMC article. Review.
The RESP AI model accelerates the identification of tight-binding antibodies.
Parkinson J, Hard R, Wang W. Parkinson J, et al. Nat Commun. 2023 Jan 28;14(1):454. doi: 10.1038/s41467-023-36028-8. Nat Commun. 2023. PMID: 36709319 Free PMC article.

References

1. Gainza P, Roberts KE, Georgiev I, Lilien RH, Keedy DA, Chen CY, Reza F, Anderson AC, Richardson DC, Richardson JS, et al. OSPREY: protein design with ensembles, flexibility, and provable algorithms. Methods in enzymology. 2013;523:87. - PMC - PubMed
1. Gainza P, Roberts KE, Donald BR. Protein design using continuous rotamers. PLoS computational biology. 2012;8(1):e1002335. - PMC - PubMed
1. Hallen MA, Keedy DA, Donald BR. Dead-end elimination with perturbations (DEEPer): A provable protein design algorithm with continuous sidechain and backbone flexibility. Proteins: Structure, Function, and Bioinformatics. 2013;81(1):18–39. - PMC - PubMed
1. Roberts KE, Cushing PR, Boisguerin P, Madden DR, Donald BR. Computational design of a PDZ domain peptide inhibitor that rescues CFTR activity. PLoS computational biology. 2012;8(4):e1002477. - PMC - PubMed
1. Gorczynski MJ, Grembecka J, Zhou Y, Kong Y, Roudaia L, Douvas MG, Newman M, Bielnicka I, Baber G, Corpora T, et al. Allosteric inhibition of the protein-protein interaction between the leukemia-associated proteins Runx1 and CBF$\beta$ Chemistry & biology. 2007;14(10):1186–1197. - PubMed

Publication types

Actions
Actions

MeSH terms

Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions

Substances

Actions

Grants and funding

R01 GM078031/GM/NIGMS NIH HHS/United States

LinkOut - more resources

Full Text Sources
Other Literature Sources
- scite Smart Citations
Miscellaneous
- NCI CPTAC Assay Portal

Save citation to file

Email citation

Add to Collections

Add to My Bibliography

Your saved search

Create a file for external citation management software

Your RSS Feed

Parallel Computational Protein Design

Affiliations

Parallel Computational Protein Design

Authors

Affiliations

Abstract

Figures

Similar articles

Cited by

References

Publication types

MeSH terms

Substances

Grants and funding

LinkOut - more resources

Full Text Sources

Other Literature Sources

Miscellaneous

Abstract

Figures

Similar articles

Cited by

References

Publication types

MeSH terms

Substances

Related information

Grants and funding

LinkOut - more resources

Full Text Sources

Other Literature Sources

Miscellaneous