[Preprint]. 2023 Sep 21:2023.09.20.558720.
doi: 10.1101/2023.09.20.558720.

Computational design of sequence-specific DNA-binding proteins


Cameron J Glasscock et al. bioRxiv. 2023 Sep 21.

Update in

  • Computational design of sequence-specific DNA-binding proteins.
    Glasscock CJ, Pecoraro RJ, McHugh R, Doyle LA, Chen W, Boivin O, Lonnquist B, Na E, Politanska Y, Haddox HK, Cox D, Norn C, Coventry B, Goreshnik I, Vafeados D, Lee GR, Gordân R, Stoddard BL, DiMaio F, Baker D. Nat Struct Mol Biol. 2025 Nov;32(11):2252-2261. doi: 10.1038/s41594-025-01669-4. Epub 2025 Sep 12. PMID: 40940539.

Abstract

Sequence-specific DNA-binding proteins (DBPs) play critical roles in biology and biotechnology, and there has been considerable interest in engineering DBPs with new or altered specificities for genome editing and other applications. While there has been some success in reprogramming naturally occurring DBPs using selection methods, the computational design of new DBPs that recognize arbitrary target sites remains an outstanding challenge. We describe a computational method for designing small DBPs that recognize specific target sequences through interactions with bases in the major groove, and we employ this method in conjunction with experimental screening to generate binders for 5 distinct DNA targets. These binders exhibit specificity closely matching the computational models for the target DNA sequences at as many as 6 base positions, with dissociation constants as low as 30-100 nM. The crystal structure of a designed DBP-target site complex is in close agreement with the design model, highlighting the accuracy of the design method. The designed DBPs function in both Escherichia coli and mammalian cells to repress and activate transcription of neighboring genes. Our method is a substantial step towards a general route to small, and hence readily deliverable, sequence-specific DBPs for gene regulation and editing.

Conflict of interest statement

Competing interests. C.G., R.P., R.M., C.N., F.D., and D.B. are co-inventors on a provisional patent application that incorporates discoveries described in this manuscript.

Figures

Fig. 1. Overview of the DNA binder design pipeline.
(A) Design principles for sequence-specific DBPs. (B) Helix-turn-helix (HTH) backbone scaffold library generated from metagenomic sequences. (C) DNA target, starting from either a specific nucleotide sequence modeled as B-DNA or a DNA crystal structure. (D) Generation of a rotamer interaction field (RIF; gray) to form base-specific hydrogen bonds and hydrophobic packing interactions. Example rotamers (pink) are generated for each nucleotide base (orange; clockwise from upper left: adenine, thymine, guanine, cytosine). (E) Docking of scaffolds onto the RIF to identify seed interactions and placements with base-specific contacts, followed by sequence optimization of the DNA-scaffold interface using Rosetta or LigandMPNN-based sequence design and Rosetta modeling. (F) Recognition helices making multiple favorable interactions with the target are extracted from first-round designs and grafted onto the scaffold library, followed by further rounds of interface sequence design and filtering for favorable interactions. (G) Inpainting of the protein loops (red) generates new connecting loops (teal) between the helical segments of the design, followed by further rounds of interface sequence design and filtering.
Fig. 2. Designed DBPs bind with high affinity and specificity to their intended target sites.
(A to E) Characterization of DBPs 6, 35, 48, 56, and 62, respectively. Left, computational design models of the characterized designs at the DNA-binding interface. DNA bases and protein residues involved in hydrogen bonding interactions are shown in orange and pink, respectively; hydrogen bonds are shown as dashed yellow lines. Middle, relative binding activity (PE/FITC, normalized to the no-competitor condition) from flow cytometry analysis of yeast display competition assays, using competitor oligos carrying all possible base substitutions at each position. Blue indicates competitor mutations that competed more strongly than the wild-type competitor; red indicates mutations that competed more weakly. Arrows indicate base positions contacted through hydrogen bonds or hydrophobic contacts to base atoms in the design model. Additional characterized designs are shown in fig. S6. Right, binding of purified miniprotein designs to the DNA target measured by biolayer interferometry (BLI). Each line corresponds to one concentration in a threefold serial dilution of the biotinylated dsDNA target; the highest DNA target concentration is indicated in each plot. Additional characterized designs are shown in fig. S8. (F) DBPs 6 and 48 (colored) differ in both structure and docking mode from native co-complex structures with matching DNA binding sites (gray). (G) DBP35 has a structure and docking mode similar to its closest match in the PDB but binds a distinct DNA target site, while DBPs 56 and 62 have structures similar to their closest matches but different docking modes and DNA target sites. DBP48 was analyzed with sequence C because of its improved binding signal and nearly identical modeled binding site (fig. S5); all other designs were analyzed with their designed target sequence.
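The middle and right panels of Fig. 2 rest on two simple calculations: a display-normalized binding ratio relative to the no-competitor control, and a threefold dilution series. The sketch below only illustrates that arithmetic and is not the authors' analysis code. It assumes that PE reports bound biotinylated DNA and FITC reports surface display level (the caption does not say which label is which), and the function names and numbers are hypothetical.

```python
def relative_binding(pe: float, fitc: float,
                     pe_no_comp: float, fitc_no_comp: float) -> float:
    """Display-normalized binding (PE/FITC) relative to the no-competitor control.

    Assumes PE reports labeled target DNA bound to the cells and FITC reports
    display level. Values below 1 mean the competitor oligo reduced binding.
    """
    return (pe / fitc) / (pe_no_comp / fitc_no_comp)


def threefold_dilution_series(top: float, n: int) -> list[float]:
    """n concentrations, each one-third of the previous, starting at `top`."""
    return [top / 3**i for i in range(n)]


# Hypothetical numbers: competition drops PE/FITC from 0.8 to 0.2.
print(relative_binding(200.0, 1000.0, 800.0, 1000.0))  # 0.25
print(threefold_dilution_series(1000.0, 4))
```

Comparing each mutant competitor's value against the wild-type competitor's value then gives the blue (stronger competition, lower value) versus red (weaker competition, higher value) coloring described in the caption.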
Fig. 3. Structural validation of DNA binder designs.
(A) The co-crystal structure of DBP48 (colored) is in close agreement with the design model (gray). (B) Zoom-in showing the close agreement of the critical interface residues R38 and S39 between the crystal structure and the design model. (C) Close-up of water-mediated hydrogen bonds formed by S42 and D43. (D) Left, designed DNA-binding proteins colored by positional Shannon entropy from site saturation mutagenesis (SSM), with blue indicating low-entropy (conserved) positions and red indicating high-entropy (not conserved) positions. Right, zoomed-in views of the central regions of the design interfaces. (E) Heat maps of SC-50 values for single mutations in the design model core (left) and the designed interface (right). Heavily depleted substitutions are shown in blue, and beneficial mutations are shown in red. Full SSM maps over all positions and close-up views of DBP1 are provided in figs. S12 and S13.
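Positional Shannon entropy, used to color panel D of Fig. 3, summarizes how many amino acids are tolerated at a position: H = -Σ p(a) log2 p(a), summed over amino acids a. The following is a minimal sketch that assumes per-position counts of each amino acid among binding-competent variants and a base-2 logarithm; the caption specifies neither the input counts nor the log base, and the example counts are hypothetical.

```python
import math


def positional_entropy(counts: dict[str, float]) -> float:
    """Shannon entropy (bits) of one position's amino-acid distribution.

    0 bits means a single residue is tolerated (fully conserved);
    log2(20), about 4.32 bits, means all 20 amino acids are equally tolerated.
    """
    total = sum(counts.values())
    entropy = 0.0
    for count in counts.values():
        if count > 0:  # by convention 0 * log(0) contributes nothing
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


# Hypothetical counts for two positions after sorting for DNA binding.
conserved = {"R": 98.0, "K": 2.0}
tolerant = {aa: 5.0 for aa in "ACDEFGHIKLMNPQRSTVWY"}
print(round(positional_entropy(conserved), 3))
print(round(positional_entropy(tolerant), 3))
```

Low values correspond to the blue (conserved) positions in panel D and high values to the red (not conserved) ones.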
Fig. 4. Designed DBPs are highly specific.
(A) Histograms of E-score values for DBP6 (top) and DBP48 (bottom) from universal protein-binding microarray (uPBM) experiments, showing high specificity for the designed target sequence. The E-score distribution of all 7-mers is shown in gray, the distribution of 7-mers containing the designed binding-site motif in orange, and the distribution for a mutated binding-site motif in blue. (B) All-by-all orthogonality matrix for 5 designed DNA binders screened by yeast display, normalized by row, at a DNA concentration of 1 µM (with avidity). The full orthogonality matrix with all tested DNA targets is shown in fig. S15I.
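Panel B of Fig. 4 normalizes the yeast display signal by row. The caption does not state the normalization rule, so this sketch assumes each binder's row is divided by its own maximum (its strongest target then scores 1). It also treats the set as orthogonal when every binder's maximum falls on its own designed target, that is, on the diagonal. Both helper functions and the signal values are hypothetical.

```python
def normalize_rows(matrix: list[list[float]]) -> list[list[float]]:
    """Divide each row (one binder) by its largest value (assumed rule)."""
    normalized = []
    for row in matrix:
        peak = max(row)
        normalized.append([v / peak if peak > 0 else 0.0 for v in row])
    return normalized


def is_orthogonal(matrix: list[list[float]]) -> bool:
    """True if each binder's strongest signal is on its own target (row i, column i)."""
    return all(row.index(max(row)) == i for i, row in enumerate(matrix))


# Hypothetical signals: rows are binders, columns are DNA targets in the same order.
signals = [[4.0, 1.0, 0.5],
           [2.0, 8.0, 1.0],
           [0.3, 0.6, 3.0]]
norm = normalize_rows(signals)
print(norm[0])  # [1.0, 0.25, 0.125]
print(is_orthogonal(norm))  # True
```

Row normalization makes binders with very different absolute signal levels comparable, so off-diagonal cells read directly as the fraction of on-target signal.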
Fig. 5. Designed DBPs function in living cells to direct transcriptional repression and activation.
(A) Illustration of the RFdiffusion method for building out DBP domains into homo- or heterodimer arrangements, along with the repressor designs selected for all-by-all repression assays. DBP48_A1/DBP48_A1 and DBP57_A1/DBP57_A1 are homodimer constructs transferred into the TetR backbone; the remainder are de novo homo- or heterodimer constructs. (B) Transcriptional repression in Escherichia coli. Functional IPTG-inducible repressors block transcription of YFP from a synthetic promoter containing the designed DBP binding sites (red text) around the −10 and −35 elements (blue text). Arrows indicate the orientation of each binding site. (C) All-by-all orthogonality matrix showing fold repression of YFP fluorescence, from flow cytometry analysis of cells containing the successful NOT gate circuits. Blue outlines indicate on-target repressor-promoter pairs. (D) Transcriptional activation in HEK293T cells measured by ENGRAM. Synthetic transcription factors (synTFs) were created by fusing the GCN4 dimerization domain and the VP64 activation domain to the C-termini of the DBPs. The synTF-specific cis-regulatory elements (CREs) were created by evenly distributing palindromic binding motifs across a 130 bp transcriptionally inactive DNA sequence; each CRE drives a uniquely barcoded pegRNA for recording into DNA TAPE. (E) Fold activation of synTFs, measured as normalized barcode abundance.
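Several quantities in Fig. 5 can be checked with a few lines of code. These are whether a binding motif is palindromic (identical to its own reverse complement), how copies of a motif might be spaced evenly across the 130 bp CRE, and fold repression. This is a minimal sketch with hypothetical helpers. It assumes fold repression is YFP fluorescence without induced repressor divided by YFP fluorescence with IPTG-induced repressor, and that "evenly distributing" means equally spaced start positions; the caption states neither. The 10 bp motif length and the four copies are purely illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic(site: str) -> bool:
    """A DNA palindrome reads the same 5' to 3' on both strands."""
    return site == reverse_complement(site)


def even_starts(region_len: int, site_len: int, n_sites: int) -> list[int]:
    """Start offsets spreading n_sites equally across a region (hypothetical layout rule)."""
    if n_sites == 1:
        return [(region_len - site_len) // 2]
    span = region_len - site_len
    return [i * span // (n_sites - 1) for i in range(n_sites)]


def fold_repression(yfp_uninduced: float, yfp_induced: float) -> float:
    """YFP without repressor induction divided by YFP with IPTG-induced repressor (assumed)."""
    return yfp_uninduced / yfp_induced


print(is_palindromic("GAATTC"))  # True (EcoRI site)
print(even_starts(130, 10, 4))  # [0, 40, 80, 120]
print(fold_repression(900.0, 30.0))  # 30.0
```

With these spacing assumptions the last motif ends exactly at position 130, so the copies span the full CRE.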
