Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2017:1529:265-277.
doi: 10.1007/978-1-4939-6637-0_13.

Parallel Computational Protein Design

Affiliations

Parallel Computational Protein Design

Yichao Zhou et al. Methods Mol Biol. 2017.

Abstract

Computational structure-based protein design (CSPD) is an important problem in computational biology, which aims to design or improve a prescribed protein function based on a protein structure template. It provides a practical tool for real-world protein engineering applications. A popular CSPD method that guarantees to find the global minimum energy solution (GMEC) is to combine both dead-end elimination (DEE) and A* tree search algorithms. However, in this framework, the A* search algorithm can run in exponential time in the worst case, which may become the computation bottleneck of large-scale computational protein design process. To address this issue, we extend and add a new module to the OSPREY program that was previously developed in the Donald lab (Gainza et al., Methods Enzymol 523:87, 2013) to implement a GPU-based massively parallel A* algorithm for improving protein design pipeline. By exploiting the modern GPU computational framework and optimizing the computation of the heuristic function for A* search, our new program, called gOSPREY, can provide up to four orders of magnitude speedups in large protein design cases with a small memory overhead comparing to the traditional A* search algorithm implementation, while still guaranteeing the optimality. In addition, gOSPREY can be configured to run in a bounded-memory mode to tackle the problems in which the conformation space is too large and the global optimal solution cannot be computed previously. Furthermore, the GPU-based A* algorithm implemented in the gOSPREY program can be combined with the state-of-the-art rotamer pruning algorithms such as iMinDEE (Gainza et al., PLoS Comput Biol 8:e1002335, 2012) and DEEPer (Hallen et al., Proteins 81:18-39, 2013) to also consider continuous backbone and side-chain flexibility.

Keywords: A*; CUDA; Dead-end elimination; GPGPU; Parallel computing; Protein design.

PubMed Disclaimer

Figures

Figure 1
Figure 1
An illustration of the traditional A* tree search algorithm. (a) An example of the A* search tree at a certain time. Nodes in dark shade represent the nodes that have already been expanded and visited. Nodes in light shade represent the frontier nodes which are stored in the priority queue and waiting to be expanded. The node labeled with a is the node with minimum heuristic function value in the priority queue and thus is currently being expended. (b) The workflow of the node expansion operation for current node a.
Figure 1
Figure 1
An illustration of the traditional A* tree search algorithm. (a) An example of the A* search tree at a certain time. Nodes in dark shade represent the nodes that have already been expanded and visited. Nodes in light shade represent the frontier nodes which are stored in the priority queue and waiting to be expanded. The node labeled with a is the node with minimum heuristic function value in the priority queue and thus is currently being expended. (b) The workflow of the node expansion operation for current node a.
Figure 2
Figure 2
The illustration of the parallel node expansion operations in our parallel A* search algorithm. (a) An example of the A* search tree at a certain time. The meaning of Figure 2a is as same as that of Figure 1a except that now the algorithm is expanding nodes a, b, and c in parallel. The nodes labeled with a, b, and c are the nodes with the minimum heuristic function values in their respective priority queues. (b) The workflow on how multiple nodes are expanded simultaneously.
Figure 2
Figure 2
The illustration of the parallel node expansion operations in our parallel A* search algorithm. (a) An example of the A* search tree at a certain time. The meaning of Figure 2a is as same as that of Figure 1a except that now the algorithm is expanding nodes a, b, and c in parallel. The nodes labeled with a, b, and c are the nodes with the minimum heuristic function values in their respective priority queues. (b) The workflow on how multiple nodes are expanded simultaneously.
Figure 3
Figure 3
Semi-log plots about the ratios of speedups and memory overhead of our GPU-based A* search algorithm. The x axes represent the magnitude of the design problem, while the y axes of Figure 3a and 3b represent the ratios of speedups and memory overhead of our GPU-based algorithm comparing to the traditional sequential A* search algorithm, respectively. The circle and square marks represent the results of our GPU-based A* search algorithm with 768 and 4992 parallel priority queues, respectively.
Figure 3
Figure 3
Semi-log plots about the ratios of speedups and memory overhead of our GPU-based A* search algorithm. The x axes represent the magnitude of the design problem, while the y axes of Figure 3a and 3b represent the ratios of speedups and memory overhead of our GPU-based algorithm comparing to the traditional sequential A* search algorithm, respectively. The circle and square marks represent the results of our GPU-based A* search algorithm with 768 and 4992 parallel priority queues, respectively.

Similar articles

Cited by

References

    1. Gainza P, Roberts KE, Georgiev I, Lilien RH, Keedy DA, Chen CY, Reza F, Anderson AC, Richardson DC, Richardson JS, et al. OSPREY: protein design with ensembles, flexibility, and provable algorithms. Methods in enzymology. 2013;523:87. - PMC - PubMed
    1. Gainza P, Roberts KE, Donald BR. Protein design using continuous rotamers. PLoS computational biology. 2012;8(1):e1002335. - PMC - PubMed
    1. Hallen MA, Keedy DA, Donald BR. Dead-end elimination with perturbations (DEEPer): A provable protein design algorithm with continuous sidechain and backbone flexibility. Proteins: Structure, Function, and Bioinformatics. 2013;81(1):18–39. - PMC - PubMed
    1. Roberts KE, Cushing PR, Boisguerin P, Madden DR, Donald BR. Computational design of a PDZ domain peptide inhibitor that rescues CFTR activity. PLoS computational biology. 2012;8(4):e1002477. - PMC - PubMed
    1. Gorczynski MJ, Grembecka J, Zhou Y, Kong Y, Roudaia L, Douvas MG, Newman M, Bielnicka I, Baber G, Corpora T, et al. Allosteric inhibition of the protein-protein interaction between the leukemia-associated proteins Runx1 and CBF$\beta$ Chemistry & biology. 2007;14(10):1186–1197. - PubMed

Publication types

LinkOut - more resources