Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2016 Jun;23(6):413-24.
doi: 10.1089/cmb.2015.0194. Epub 2016 Jan 8.

BWM*: A Novel, Provable, Ensemble-based Dynamic Programming Algorithm for Sparse Approximations of Computational Protein Design

Affiliations

BWM*: A Novel, Provable, Ensemble-based Dynamic Programming Algorithm for Sparse Approximations of Computational Protein Design

Jonathan D Jou et al. J Comput Biol. 2016 Jun.

Abstract

Sparse energy functions that ignore long range interactions between residue pairs are frequently used by protein design algorithms to reduce computational cost. Current dynamic programming algorithms that fully exploit the optimal substructure produced by these energy functions only compute the GMEC. This disproportionately favors the sequence of a single, static conformation and overlooks better binding sequences with multiple low-energy conformations. Provable, ensemble-based algorithms such as A* avoid this problem, but A* cannot guarantee better performance than exhaustive enumeration. We propose a novel, provable, dynamic programming algorithm called Branch-Width Minimization* (BWM*) to enumerate a gap-free ensemble of conformations in order of increasing energy. Given a branch-decomposition of branch-width w for an n-residue protein design with at most q discrete side-chain conformations per residue, BWM* returns the sparse GMEC in O([Formula: see text]) time and enumerates each additional conformation in merely O([Formula: see text]) time. We define a new measure, Total Effective Search Space (TESS), which can be computed efficiently a priori before BWM* or A* is run. We ran BWM* on 67 protein design problems and found that TESS discriminated between BWM*-efficient and A*-efficient cases with 100% accuracy. As predicted by TESS and validated experimentally, BWM* outperforms A* in 73% of the cases and computes the full ensemble or a close approximation faster than A*, enumerating each additional conformation in milliseconds. Unlike A*, the performance of BWM* can be predicted in polynomial time before running the algorithm, which gives protein designers the power to choose the most efficient algorithm for their particular design problem.

Keywords: OSPREY; branch-decomposition; dynamic programming; ensemble-based algorithms; protein design; provable algorithms; sparse residue interaction graphs.

PubMed Disclaimer

Figures

<b>FIG. 1.</b>
FIG. 1.
(A) A sample protein design problem represented as a residue interaction graph (B), with residues as vertices and pairwise interactions as edges. (C) The sparse residue interaction graph generated by deleting (a, d), (a, e), (b, d), and (b, e), shown as red crosses in (A) and (B).
<b>FIG. 2.</b>
FIG. 2.
An example branch-decomposition. (A) The edges of the sparse graph correspond to a node in the branch-decomposition tree. Along the highlighted edge of the branch-decomposition, the mutable residues are separated into three sets: the L-set, which exists only in leaves to the left of the edge; the R-set, which exists only in leaves to the right of that edge; and the M-set, which can be found on both sides of the edge. (B) This tree is arbitrarily rooted for use by BWM*.
<b>FIG. 3.</b>
FIG. 3.
Total Effective Search Space predicts BWM* performance. x-axis shows TESS for each design problem using their respective cutoff and energy window. y-axis shows unpruned conformations for each design problem after DEE pruning with their respective energy window. (A) Dark green points: full ensemble with energy cutoff 0.1 kcal/mol, energy window 1 kcal/mol. Light green points: full ensemble with energy cutoff 0.2 kcal/mol, energy window 0.5 kcal/mol. The yellow triangles, orange diamonds, and red dot have x values of TESS using unpruned conformations with energy cutoff 0.1 kcal/mol, energy window 1 kcal/mol for cases where larger energy cutoffs or smaller energy windows were necessary to finish preprocessing. (B) Blue triangles: x values are TESS for calculating only the sparse ensemble (energy cutoff 0.3 kcal/mol or higher, energy window 0.5 kcal/mol). (C) Blue diamonds: x values are TESS for calculating only the sparse GMEC (energy cutoff between 0.1 and 0.5 kcal/mol, energy window 0 kcal/mol). Red cross: x value is TESS for computing the sparse GMEC of 2qcp (see text).
<b>FIG. 4.</b>
FIG. 4.
Both actual and worst-case bounds for BWM* are better than A*. (A) Comparison of the number of mutable residues considered. The maximum observed subproblem size is |Mλ| and worst-case problem size is formula image, which bounds |Mλ| from above. (B) Times to enumerate full ensemble for all design problems with energy window of 1 kcal/mol and energy cutoff of 0.1 kcal/mol. (C) Actual search space size vs. worst-case bounds. x-Axis is the 36 design problems in (A). Problems are ranked by A* worst-case problem size (qn); q and n measure the size of the design problem and are shown below the line; n increases from 4 to 16, shown in orange, and q is between 3 and 41, shown in red. Solid green line: Actual number (formula image) of unpruned conformations after DEE, where qr is the number of unpruned rotamers for mutable residue r and formula image. Solid blue: TESS. Dotted blue: BWM worst-case problem size bounds q|Mλ|. Dotted green: A* worst-case problem size bounds qn.
<b>FIG. 5.</b>
FIG. 5.
The sparse ensemble closely approximates the full ensemble. Figures shown for YciI protein from H. influenzae (PDB id: 1mwq) with energy cutoff of 0.1 kcal/mol and energy window of 1 kcal/mol. For each sequence, the x-axis plots the full energy of each conformation of the ensemble, and the y-axis plots the percent contribution of each conformation to the partition function Z (called q in Georgiev et al., 2008b). Conformations in both the full and sparse ensembles are colored green. Conformations in the full ensemble that are not in the sparse ensemble are in red. Conformations in the sparse ensemble that are not in the full ensemble are in blue. GMEC is circled in green.
<b>FIG. 6.</b>
FIG. 6.
BWM* enumeration is orders of magnitude faster than A*. Figures shown for aortic preferentially expressed protein-1 (PDB id:1u2h) with energy cutoff 0.1 kcal/mol, and third KH domain of heterogeneous nuclear ribonucleoprotein K (PDB id: 1zzk) with distance cutoff of 7 Å. Energy window was 1 kcal/mol for both. Blue: BWM* time. Green: A* time. (A) and (C) show cumulative time against total conformations enumerated. (B) and (D) show enumeration time per conformation.

Similar articles

Cited by

References

    1. Chen C.-Y., Georgiev I., Anderson A.C., and Donald B.R. 2009. Computational structure-based redesign of enzyme activity. Proc. Natl. Acad. Sci. U.S.A. 106, 3764–3769 - PMC - PubMed
    1. Dechter R., and Mateescu R. 2007. AND/OR search spaces for graphical models. Artif. Intell. 171, 73–106
    1. Desjarlais J.R., and Handel T.M. 1995. De novo design of the hydrophobic cores of proteins. Prot. Sci. 4, 2006–2018 - PMC - PubMed
    1. Desmet J., Spriet J., and Lasters I. 2002. Fast and accurate side-chain topology and energy refinement (FASTER) as a new method for protein structure optimization. Proteins 48, 31–43 - PubMed
    1. Donald B.R. 2011. Algorithms in Structural Molecular Biology. MIT Press, New York

Publication types

LinkOut - more resources