[Preprint]. 2025 Jul 4:2025.06.30.662289.
doi: 10.1101/2025.06.30.662289.

Systematic DNA nicking reveals the structural logic of protein recognition

Yumi Minyi Yao et al. bioRxiv.

Abstract

Transcription factors (TFs) bind to specific genomic sites to regulate gene expression [1,2]. These interactions almost universally require DNA deformation and the accumulation of local mechanical strain within the double helix. As a result, TF-DNA recognition is determined not only by the linear base sequence but also by the spatial alignment of bases and phosphates, and by their ability to adopt and retain structural deformations [3]. However, because existing studies are sequence-centric, it is difficult to probe DNA structural determinants directly or to decouple their impact from changes in base sequence. This limits our ability to identify the factors that influence binding beyond sequence identity and leaves significant gaps in our understanding of the principles governing TF-DNA recognition. Here, we introduce a high-throughput strategy that perturbs TF binding sites without altering their base sequence, enabling systematic investigation of the structural features of DNA that govern TF binding. Our method, PIC-NIC, introduces single-strand breaks (SSBs) at every position within the binding site, selectively disrupting backbone continuity while preserving nucleotide identity, and quantitatively measures the resulting effects on TF binding. Applied to 15 human TFs spanning eight structural classes, and supported by seven high-resolution TF-DNA crystal structures and molecular dynamics simulations, PIC-NIC uncovers discrete backbone positions that serve as structural anchor points, where nicks can abolish binding, rewire sequence preferences, or even enhance affinity. By decoupling structural and chemical contributions, we demonstrate that DNA mechanics, encoded in backbone geometry and continuity, can independently shape binding specificity beyond the linear code of base identity.
These findings shift the paradigm of TF-DNA recognition, establishing the backbone not as a passive scaffold, but as a functional determinant capable of directing regulatory mechanisms through its physical architecture.
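To make the design space concrete, the sketch below enumerates the nicked variants of a short binding site. It rests on several assumptions that are not the paper's stated protocol. A nick at position i on a strand is taken to break the backbone between nucleotides i and i+1, counted 5′→3′ on that strand. Every internal backbone position on both strands is nicked, and each nick is made both with and without the 5′ phosphate on the downstream fragment. `NickedDuplex` and `enumerate_nicks` are hypothetical names, and the sketch ignores flanking sequence and microarray attachment.

```python
from dataclasses import dataclass
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass(frozen=True)
class NickedDuplex:
    strand: str                  # "top" or "bottom": the strand that carries the nick
    position: int                # nick between nucleotides position and position+1 (1-based, 5'->3')
    phosphate_5p: bool           # True if the downstream fragment keeps its 5' phosphate
    fragments: Tuple[str, str]   # the two oligos forming the nicked strand, each 5'->3'
    partner: str                 # the intact complementary strand, 5'->3'


def enumerate_nicks(top: str) -> List[NickedDuplex]:
    """Hypothetical design enumerator: one nick per internal backbone position
    on each strand, each made with and without the 5' phosphate at the break."""
    bottom = reverse_complement(top)
    designs = []
    for strand, seq, partner in (("top", top, bottom), ("bottom", bottom, top)):
        for pos in range(1, len(seq)):
            left, right = seq[:pos], seq[pos:]
            for phos in (True, False):
                designs.append(NickedDuplex(strand, pos, phos, (left, right), partner))
    return designs


designs = enumerate_nicks("GCGGAAA")
print(len(designs))  # 2 strands x 6 internal positions x 2 phosphate states
```

Under these assumptions, a site of length n yields 4(n − 1) nicked duplexes. In each one the base sequence is unchanged, because the two fragments still concatenate to the original strand.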

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig 1. PIC-NIC enables high-throughput dissection of structural contributions to TF–DNA recognition.
(a) Visualization of key molecular features underlying TF–DNA recognition, using representative complex structures (PDB IDs: 5KKQ and 1QNE). (b) Conceptual framework underlying PIC-NIC. Traditional base mutagenesis (left) alters nucleotide identity to probe base-specific contributions. PIC-NIC systematically extends this framework to additional molecular features: phosphate removal (middle), which eliminates backbone phosphates while preserving base identity; and site-specific nicks (right), which preserve both base and phosphate composition while selectively disrupting backbone continuity and geometry. (c) Overview of the PIC-NIC platform. Custom-designed nicked DNA libraries are immobilized on microarrays, incubated with target TFs and fluorescently labeled antibodies, and quantitatively scanned to assess binding at each nicked position. (d) Schematic depiction of a representative PIC-NIC binding profile (top), illustrating how relative binding levels (y-axis) are measured across all nicked positions (x-axis) compared to the intact control for a given TF. Aggregate analysis across all 15 TFs (bottom; see Methods) classifies nicked positions as retaining strong binding (white), minor disruption (grey), major disruption (gold), or abolishment of binding (brown), for nicks with 5′ phosphate retained (top bar) or removed (bottom bar). (e) Disruption at non-contacted positions suggests a role for DNA deformations. As exemplified by CTCF, a nick at a non-contact position (orange) markedly reduces binding, while base mutations at the same position have only minor effects. Structural analysis (PDB ID: 5KKQ) shows that this region exhibits elevated deviations from ideal B-form DNA, with the y-axis representing standard deviations from canonical structural parameters.
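The relative-binding profile in panel (d) can be sketched as follows. The four category labels come from the caption. The numerical cutoffs, the use of replicate medians, and the names `CUTOFFS`, `classify` and `nick_profile` are hypothetical; the paper describes its actual aggregation across the 15 TFs in its Methods. The profile value is the natural-log difference between nicked and intact signal, so 0 means no change and positive values mean enhanced binding.

```python
import math
from statistics import median
from typing import Dict, List, Tuple

# Hypothetical cutoffs on relative binding (nicked median / intact median), highest first.
CUTOFFS = [
    (0.75, "strong binding"),
    (0.50, "minor disruption"),
    (0.20, "major disruption"),
]


def classify(relative: float) -> str:
    """Map a relative binding level onto the four caption categories."""
    for cutoff, label in CUTOFFS:
        if relative >= cutoff:
            return label
    return "abolished"


def nick_profile(nicked: Dict[int, List[float]],
                 intact: List[float]) -> Dict[int, Tuple[float, str]]:
    """For each nicked position, return (ln(nicked) - ln(intact), category),
    using the median of the replicate signals at each position."""
    ref = median(intact)
    profile = {}
    for pos in sorted(nicked):
        rel = median(nicked[pos]) / ref
        profile[pos] = (math.log(rel), classify(rel))
    return profile


# Toy replicate signals (invented values, arbitrary fluorescence units).
intact = [1000.0, 1100.0, 900.0]
signals = {1: [950.0, 1000.0, 1050.0], 2: [600.0, 620.0, 580.0],
           3: [300.0, 310.0, 290.0], 4: [50.0, 40.0, 60.0]}
for pos, (log_diff, label) in nick_profile(signals, intact).items():
    print(pos, round(log_diff, 2), label)
```

A nick that enhances binding, such as the SOX2 case in Fig 4, gives a positive log difference and falls into the top category under these cutoffs.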
Fig 2. Disruption of phosphate contacts reveals anchoring roles in ETS1–DNA recognition.
(a) ETS1 forms base-specific and phosphate-specific interactions. In the ETS1–DNA crystal structure (left, PDB ID: 1K79), residues contacting DNA bases are highlighted in cyan and those contacting backbone phosphates in green. Insets show close-up views of representative phosphate (top) and base (bottom) contacts. Schematic contact maps (right), using the same color coding, highlight key protein–DNA contacts on both strands, with positions annotated along the binding site sequence. (b) PIC-NIC phosphate removal profiles (left) show the log difference in binding signal relative to the intact site (y-axis) across positions in the top and bottom strands of the ETS1 binding site (x-axis). Removing phosphates at contact positions markedly reduces ETS1 binding (dashed green circles). Each box represents 10 replicate measurements. At position 2 (right panels), PIC-NIC and BLI measurements compare nicks with and without the 5′ phosphate to the intact sequence. They show that restoring the phosphate at the nick does not rescue binding, yielding affinities comparable to a non-binding control. (c) Representative snapshots from MD simulations (Methods) reveal reduced complex stability and greater structural disruption at the protein–DNA interface when a nick lacking the 5′ phosphate is introduced at position 2. This is evidenced by elevated backbone phosphate RMSD values (5′-GCGGAAA-3′) compared with both the control nicked sequence and the wild-type duplex. Box plots show the RMSD relative to the initial frame, averaged over 1000 ns of simulation, for five replicates per protein–DNA construct. (d) Mutation of a phosphate-contacting residue alters the ETS1 binding profile. PWM logos of wild-type ETS1 (top left) and the K379A mutant (bottom left), which disrupts the phosphate contact at position 7, reveal altered sequence preferences.
Red arrows highlight positions with the most pronounced changes in base selectivity, as reflected by shifts in information content (y-axis). Structural insets (right) show the wild-type interaction between K379 and the phosphate at position 7, which is lost in the alanine mutant, eliminating the original hydrogen bond.
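Panel (d) reads base selectivity from the information content of each PWM column. The following is a minimal sketch. It assumes a uniform background over the four bases and applies no small-sample correction; both are assumptions, and the paper may compute its logos differently. The function names are hypothetical, and the columns are toy values rather than ETS1 data.

```python
import math
from typing import Dict, List

Column = Dict[str, float]  # base -> probability at one motif position


def information_content(column: Column) -> float:
    """Bits at one position: log2(4) minus the Shannon entropy of the column.
    Assumes a uniform background and no small-sample correction."""
    entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
    return 2.0 - entropy


def ic_shift(wild_type: List[Column], mutant: List[Column]) -> List[float]:
    """Per-position change in information content (mutant minus wild type)."""
    return [information_content(m) - information_content(w)
            for w, m in zip(wild_type, mutant)]


def letter_heights(column: Column) -> Column:
    """Logo letter heights: each base's probability scaled by the column's information content."""
    ic = information_content(column)
    return {base: p * ic for base, p in column.items()}


# Toy columns (invented): position 1 loses selectivity in the mutant, position 2 is unselective in both.
wt = [{"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0}, {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}]
mut = [{"A": 0.5, "C": 0.0, "G": 0.5, "T": 0.0}, {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}]
print(ic_shift(wt, mut))
```

A negative shift marks a position where the mutant has become less selective, which is the kind of change the red arrows point to.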
Fig 3. DNA nicks at Hoogsteen-prone positions reshape TBP sequence specificity.
(a) The crystal structure of TBP bound to bent TATA-box DNA (PDB ID: 1QNE) highlights positions 2 and 8 (orange), where phenylalanine residues intercalate. (b) Difference in ln binding signal for TBP between two similar TATA-box variants: TATAAAAG (blue) and TATAAATG (gold). On the intact duplex (top), TBP favors the AG variant. Upon nicking at position 8 on the bottom strand and removing the 5′ phosphate (bottom), its preference switches to the TG variant. Overlays of the corresponding high-resolution crystal structures show high structural similarity for intact DNA (r.m.s.d. = 0.190 Å). In the presence of the nick, increased bending is observed in the nicked TG complex (gold, r.m.s.d. = 0.465 Å; Extended Data Fig. 7). (c) Close-up of base-pair geometry at position 8. In the intact TATAAATG complex (top), the terminal G–C base pair of the binding site adopts Hoogsteen geometry, with an 8.4 Å C1′–C1′ distance. In the nicked complex (bottom), the same base pair adopts canonical Watson–Crick geometry, with a wider 10.0 Å distance.
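The r.m.s.d. values quoted for the structural overlays measure the root-mean-square distance between matched atoms after optimal superposition. The frame-by-frame RMSD in the MD analysis of Fig 2c is the same quantity. Below is a minimal NumPy sketch using the Kabsch algorithm. It assumes that the two coordinate arrays list the same atoms in the same order, and it does not reproduce the paper's atom selection. The C1′–C1′ distance in panel (c) is an ordinary Euclidean distance between two atom positions. All coordinates are random toy data.

```python
import numpy as np


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between matched N x 3 coordinate sets after optimal rigid-body superposition.
    Assumes row i of P and row i of Q are the same atom."""
    P = P - P.mean(axis=0)                   # center both sets on their centroids
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                              # 3 x 3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0  # exclude improper rotations (reflections)
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T  # rotation that best maps P onto Q
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def distance(a, b) -> float:
    """Euclidean distance between two atoms, e.g. the C1' atoms of a base pair."""
    return float(np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)))


rng = np.random.default_rng(0)
P = rng.normal(size=(20, 3))                 # toy coordinates, not from any PDB entry
theta = np.deg2rad(30.0)
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
Q = P @ Rz.T + np.array([5.0, -2.0, 1.0])    # rotated and translated copy of P
print(kabsch_rmsd(P, Q))                     # essentially zero: a rigid motion is fully removed
```

Only genuine shape differences survive the superposition, such as the extra bending in the nicked TG complex. Pure rotations and translations contribute nothing to the r.m.s.d.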
Fig 4. DNA nicks enhance SOX2 binding.
(a) The crystal structure of SOX2 bound to DNA (PDB ID: 6HT5) shows that position 3 (orange) is highly kinked. (b) Top: the PIC-NIC profile across the SOX2 motif reveals that a nick at position 3 significantly enhances SOX2 binding. The dashed line marks the binding signal of SOX2 to intact DNA. Bottom: the SOX2 structure (PDB ID: 6HT5) reveals an extremely wide minor groove, exceeding 13 Å, and a significantly decreased twist angle at this position. (c) Bio-Layer Interferometry (BLI) confirms enhanced binding upon nicking at position 3 (green), driven by faster association (k_on).
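Panel (c) attributes the gain in affinity to faster association. One common way to extract k_on and k_off from BLI data assumes a simple 1:1 (Langmuir) binding model. This is an assumption here; the paper describes its own fitting in its Methods. Each association trace is fitted to a single exponential with observed rate k_obs = k_on·C + k_off, and k_obs is then regressed on analyte concentration C. The sketch below shows the model curve and the regression step; the per-trace exponential fits are not shown, and all rate constants are invented for illustration.

```python
import numpy as np


def association_curve(t, k_on, k_off, conc, r_max):
    """Association phase under an assumed 1:1 model: R(t) = R_eq * (1 - exp(-k_obs * t))."""
    k_obs = k_on * conc + k_off                  # observed rate constant (s^-1)
    r_eq = r_max * k_on * conc / k_obs           # equilibrium response, R_max * C / (C + K_D)
    return r_eq * (1.0 - np.exp(-k_obs * np.asarray(t, dtype=float)))


def fit_kon_koff(concs, k_obs):
    """Linear fit of k_obs = k_on * C + k_off; returns (k_on, k_off, K_D = k_off / k_on)."""
    slope, intercept = np.polyfit(np.asarray(concs, dtype=float),
                                  np.asarray(k_obs, dtype=float), 1)
    return float(slope), float(intercept), float(intercept / slope)


# Toy constants (invented): k_on in M^-1 s^-1, k_off in s^-1, concentrations in M.
true_kon, true_koff = 1.0e5, 1.0e-3
concs = np.array([10e-9, 25e-9, 50e-9, 100e-9])
k_obs = true_kon * concs + true_koff             # stand-in for per-trace fitted rates
k_on, k_off, kd = fit_kon_koff(concs, k_obs)
print(f"k_on={k_on:.3g} M^-1 s^-1, k_off={k_off:.3g} s^-1, K_D={kd:.3g} M")
```

Under this model, a larger k_on with an unchanged k_off lowers K_D = k_off / k_on. That is the sense in which faster association alone can account for enhanced binding.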
Fig 5. The effects of DNA nicks are context dependent for EGR1.
(a) EGR1–DNA backbone phosphate contact map for both strands along the binding motif. Orange marks positions where nicks have the most detrimental effect on binding; green marks positions where nicks are tolerated. (b) PIC-NIC binding profiles for EGR1 with site-specific nicks at positions 3 and 6 on the left strand and at position 7 on both strands. Loss of the phosphate at positions 3 and 6 strongly reduces EGR1 binding, whereas left- and right-strand nicks at position 7 have little effect, indicating position-specific tolerance to backbone disruption. (c) Close-up views of the two zinc coordination sites, highlighting the dual roles of H125 and H153 in DNA phosphate interaction and zinc-finger coordination, which render these positions highly sensitive to nicking. (d) High-resolution crystal structures of EGR1 bound to DNA nicked at position 7 on either strand (blue: left-strand nick; pink: right-strand nick), showing that the overall protein–DNA conformation is preserved despite the missing phosphate at the break site. (e) Superposition of high-resolution crystal structures of EGR1 bound to intact DNA (gray) and to DNA nicked at phosphate position 7 on either strand (blue: left-strand nick; pink: right-strand nick). Global structural deviation is minimal (r.m.s.d. < 0.3 Å, all-atom overlay), indicating high structural similarity between intact and nicked complexes.
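Contact maps such as panel (a) are commonly derived from crystal structures by a distance criterion. The following is a minimal sketch. It assumes a 3.5 Å cutoff between protein donor atoms and phosphate oxygens, a common hydrogen-bond heuristic that is not necessarily the paper's criterion. The coordinates and atom labels are hypothetical toy values, not taken from the EGR1 structures.

```python
from typing import List, Sequence, Tuple

import numpy as np


def phosphate_contacts(protein_xyz: Sequence[Sequence[float]], protein_labels: Sequence[str],
                       phosphate_xyz: Sequence[Sequence[float]], phosphate_labels: Sequence[str],
                       cutoff: float = 3.5) -> List[Tuple[str, str, float]]:
    """Return (protein atom, phosphate atom, distance in angstroms) for every pair closer
    than cutoff. The 3.5 A default is an assumed heuristic, not the paper's criterion."""
    prot = np.asarray(protein_xyz, dtype=float)
    phos = np.asarray(phosphate_xyz, dtype=float)
    # Pairwise distance matrix: rows are protein atoms, columns are phosphate oxygens.
    dists = np.linalg.norm(prot[:, None, :] - phos[None, :, :], axis=-1)
    return [(protein_labels[i], phosphate_labels[j], round(float(dists[i, j]), 2))
            for i, j in np.argwhere(dists < cutoff)]


protein = [[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]]
protein_names = ["residue_A NE2", "residue_B NZ"]   # hypothetical atom labels
phosphates = [[3.0, 0.0, 0.0], [20.0, 0.0, 0.0]]
phosphate_names = ["pos3 OP1", "pos7 OP1"]          # hypothetical phosphate positions
print(phosphate_contacts(protein, protein_names, phosphates, phosphate_names))
```

Grouping the hits by phosphate label gives a per-position contact map. That map can then be compared with the nick-sensitivity profile in panel (b).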

References

    1. Garvie C. W. & Wolberger C. Recognition of specific DNA sequences. Mol Cell 8, 937–946 (2001). doi:10.1016/s1097-2765(01)00392-6
    2. Kim S. & Shendure J. Mechanisms of Interplay between Transcription Factors and the 3D Genome. Mol Cell 76, 306–319 (2019). doi:10.1016/j.molcel.2019.08.010
    3. Sielemann J., Wulf D., Schmidt R. & Brautigam A. Local DNA shape is a general principle of transcription factor binding specificity in Arabidopsis thaliana. Nat Commun 12, 6549 (2021). doi:10.1038/s41467-021-26819-2
    4. Basu A. et al. Deciphering the mechanical code of the genome and epigenome. Nat Struct Mol Biol 29, 1178–1187 (2022). doi:10.1038/s41594-022-00877-6
    5. Rohs R. et al. Origins of specificity in protein-DNA recognition. Annu Rev Biochem 79, 233–269 (2010). doi:10.1146/annurev-biochem-060408-091030
