PLoS Comput Biol. 2025 Mar 20;21(3):e1012818.

doi: 10.1371/journal.pcbi.1012818. eCollection 2025.

Gauge fixing for sequence-function relationships

Anna Posfai¹, Juannan Zhou¹ ², David M McCandlish¹, Justin B Kinney¹

Affiliations

¹ Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America.
² Department of Biology, University of Florida, Gainesville, Florida, United States of America.

PMID: 40111986
PMCID: PMC11957564
DOI: 10.1371/journal.pcbi.1012818

Anna Posfai et al. PLoS Comput Biol. 2025.


Abstract

Quantitative models of sequence-function relationships are ubiquitous in computational biology, e.g., for modeling the DNA binding of transcription factors or the fitness landscapes of proteins. Interpreting these models, however, is complicated by the fact that the values of model parameters can often be changed without affecting model predictions. Before the values of model parameters can be meaningfully interpreted, one must remove these degrees of freedom (called "gauge freedoms" in physics) by imposing additional constraints (a process called "fixing the gauge"). However, strategies for fixing the gauge of sequence-function relationships have received little attention. Here we derive an analytically tractable family of gauges for a large class of sequence-function relationships. These gauges are derived in the context of models with all-order interactions, but an important subset of these gauges can be applied to diverse types of models, including additive models, pairwise-interaction models, and models with higher-order interactions. Many commonly used gauges are special cases of gauges within this family. We demonstrate the utility of this family of gauges by showing how different choices of gauge can be used both to explore complex activity landscapes and to reveal simplified models that are approximately correct within localized regions of sequence space. The results provide practical gauge-fixing strategies and demonstrate the utility of gauge-fixing for model exploration and interpretation.
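To make the gauge freedom described above concrete, here is a short worked derivation for the simplest case: an additive one-hot model like the CRP model in Fig 1. The parameters $\theta_0$ and $\theta_l^c$ follow the notation of Fig 1. The one-hot feature $x_l^c(s)$, equal to 1 if $s_l = c$ and 0 otherwise, and the shift $a_l$ are notation introduced here for illustration.

```latex
\begin{aligned}
f(s) &= \theta_0 + \sum_{l=1}^{L}\sum_{c} \theta_l^c\, x_l^c(s),
  \qquad \textstyle\sum_{c} x_l^c(s) = 1 \ \text{for all } s, l, \\
\theta_l^c &\mapsto \theta_l^c + a_l, \qquad
  \theta_0 \mapsto \theta_0 - \textstyle\sum_{l} a_l, \\
f(s) &\mapsto \theta_0 - \sum_{l} a_l + \sum_{l}\sum_{c} (\theta_l^c + a_l)\, x_l^c(s)
  = f(s) - \sum_{l} a_l + \sum_{l} a_l \underbrace{\textstyle\sum_{c} x_l^c(s)}_{=1}
  = f(s).
\end{aligned}
```

Each position contributes one free real number $a_l$, so an additive model on $L$ positions has $L$ gauge freedoms. A gauge such as the zero-sum gauge of Fig 1A removes them with one constraint per position.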

Copyright: © 2025 Posfai et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

**Fig 1. Choice of gauge impacts model parameters.**
(A–C) Parameters, expressed in three different gauges, for an additive model describing the (negative) binding energy of the *E. coli* transcription factor CRP to DNA. Model parameters are from [37]. In each panel, additive parameters $\theta_l^c$ are shown using both (top) a heat map and (bottom) a sequence logo [39]. The value of the constant parameter $\theta_0$ is also shown. (A) The zero-sum gauge, in which the additive parameters at each position sum to zero. (B) The wild-type gauge, in which the additive parameters at each position quantify activity differences with respect to a wild-type sequence, $s^{\mathrm{wt}}$. The wild-type sequence used here (indicated by dots on the heat map) is the CRP binding site present at the *E. coli* *lac* promoter. (C) The maximum gauge, in which the additive parameters at each position quantify differences with respect to the optimal character at that position. Note that, while the value of each additive parameter $\theta_l^c$ varies between panels A–C, differences of the form $\theta_l^c - \theta_l^{c'}$ are preserved.
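The three gauges in Fig 1 differ only in how much is subtracted from the parameters at each position and added back to the constant. The following is a minimal sketch of that bookkeeping. It assumes parameters are stored as a constant `theta0` plus one dict per position mapping character to $\theta_l^c$. The helper names and all parameter values are hypothetical and are not the CRP parameters of [37].

```python
# Sketch: re-express an additive one-hot model in the three gauges of Fig 1.
# Assumption: parameters are a constant theta0 plus one dict per position
# mapping character -> theta_l^c. All values below are hypothetical.


def predict(theta0, theta, seq):
    """Additive model: constant plus one parameter per position."""
    return theta0 + sum(theta[l][c] for l, c in enumerate(seq))


def shift_gauge(theta0, theta, offsets):
    """Subtract offsets[l] from every parameter at position l and add the
    total to the constant, which leaves every prediction unchanged."""
    new_theta = [{c: v - offsets[l] for c, v in pos.items()}
                 for l, pos in enumerate(theta)]
    return theta0 + sum(offsets), new_theta


def zero_sum_gauge(theta0, theta):
    # Additive parameters at each position sum to zero.
    return shift_gauge(theta0, theta, [sum(p.values()) / len(p) for p in theta])


def wild_type_gauge(theta0, theta, wt):
    # Wild-type character gets 0, so theta0 equals the wild-type prediction.
    return shift_gauge(theta0, theta, [theta[l][c] for l, c in enumerate(wt)])


def maximum_gauge(theta0, theta):
    # Optimal (largest) character gets 0; all other parameters are <= 0.
    return shift_gauge(theta0, theta, [max(p.values()) for p in theta])


# Hypothetical 3-position model (not the CRP parameters of [37]).
theta0 = 0.5
theta = [
    {"A": 1.0, "C": -0.5, "G": 0.2, "T": 0.3},
    {"A": -1.2, "C": 0.4, "G": 0.9, "T": -0.1},
    {"A": 0.0, "C": 0.7, "G": -0.3, "T": 0.6},
]

for gauge in (zero_sum_gauge(theta0, theta),
              wild_type_gauge(theta0, theta, "CGA"),
              maximum_gauge(theta0, theta)):
    t0, th = gauge
    # theta0 differs between gauges; the prediction for "TTT" does not.
    print(round(t0, 3), round(predict(t0, th, "TTT"), 3))
```

Only the constant and the per-position offsets change. Within each position, the differences $\theta_l^c - \theta_l^{c'}$ are untouched, which is the invariance noted in the caption.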

**Fig 2. Geometry of gauge spaces for additive one-hot models.**
(A–C) Geometric representation of the gauge space $\Theta$ to which the additive parameters at each position $l$ are restricted in the corresponding panel of Fig 1. The parameters for the four sequence features ($\theta_l^A$, $\theta_l^C$, $\theta_l^G$, and $\theta_l^T$) each correspond to a different axis. Note that the axes for $\theta_l^G$ and $\theta_l^T$ are combined into one axis to enable 3D visualization. Black and gray arrows denote unit vectors pointing in the positive and negative directions, respectively, along each axis. $G$ indicates the space of gauge transformations.
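The geometry in Fig 2 can be made explicit for a single position $l$. Write $\boldsymbol{\theta}_l = (\theta_l^A, \theta_l^C, \theta_l^G, \theta_l^T)$ and $\mathbf{1} = (1,1,1,1)$; these vector symbols are introduced here for illustration. Gauge transformations at position $l$ then move $\boldsymbol{\theta}_l$ along $\mathbf{1}$, with a compensating change in $\theta_0$. The short calculation that follows is offered as a sketch rather than as the paper's derivation. It shows that the zero-sum gauge picks the point on that line closest to the origin:

```latex
\begin{aligned}
&\min_{a \in \mathbb{R}} \lVert \boldsymbol{\theta}_l + a\mathbf{1} \rVert^2:
\qquad \frac{d}{da}\sum_{c}(\theta_l^c + a)^2 = 2\sum_{c}(\theta_l^c + a) = 0
\;\Rightarrow\; a^{*} = -\tfrac{1}{4}\sum_{c}\theta_l^c, \\
&\boldsymbol{\theta}_l^{\text{zero-sum}}
= \boldsymbol{\theta}_l - \tfrac{1}{4}\bigl(\mathbf{1}^{\top}\boldsymbol{\theta}_l\bigr)\mathbf{1}
= \Bigl(I - \tfrac{1}{4}\mathbf{1}\mathbf{1}^{\top}\Bigr)\boldsymbol{\theta}_l .
\end{aligned}
```

So the zero-sum gauge is the orthogonal projection onto the subspace perpendicular to $\mathbf{1}$. The wild-type gauge also moves along $\mathbf{1}$, but it stops where the wild-type coordinate is zero. Its map $\boldsymbol{\theta}_l \mapsto (I - \mathbf{1}\mathbf{e}_{\mathrm{wt}}^{\top})\boldsymbol{\theta}_l$ is idempotent but not symmetric, that is, a projection along $G$ that is not orthogonal.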

**Fig 3. Binary landscape expressed in various parametric family gauges.**
(A) Simulated random activity landscape for binary sequences of length $L = 3$. (B) Parameters of the all-order interaction model for the binary landscape as functions of $\eta = \lambda/(1 + \lambda)$. Values of $\eta$ corresponding to different named gauges are indicated. Because all of these gauges assume the uniform distribution, the hierarchical gauge coincides with the zero-sum gauge.
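Fig 3B tracks all-order parameters across a family of gauges. The sketch here computes one point on that curve: the uniform hierarchical gauge, which the caption identifies with the zero-sum gauge. It assumes the parameters can be obtained by Möbius inversion of conditional means under the uniform distribution, the standard functional-ANOVA construction for product distributions. This is an illustrative route and not necessarily the paper's. The landscape values are random placeholders, not the simulated landscape of Fig 3A.

```python
# Sketch: all-order parameters of a binary L = 3 landscape in the uniform
# hierarchical (zero-sum) gauge. Assumption: parameters are computed by
# Moebius inversion of uniform conditional means (a functional-ANOVA
# construction); the landscape values are random placeholders.
import itertools
import random

L = 3
ALPHABET = (0, 1)
SEQS = list(itertools.product(ALPHABET, repeat=L))

random.seed(0)
f = {s: random.gauss(0.0, 1.0) for s in SEQS}  # hypothetical landscape


def subsets(items):
    """All subsets of items, as tuples, from smallest to largest."""
    for k in range(len(items) + 1):
        yield from itertools.combinations(items, k)


def cond_mean(positions, chars):
    """Uniform mean of f over sequences with s[p] == c for each (p, c)."""
    vals = [f[s] for s in SEQS
            if all(s[p] == c for p, c in zip(positions, chars))]
    return sum(vals) / len(vals)


# theta[(S, c_S)]: S is a tuple of positions, c_S the characters there.
# The empty subset holds the constant theta_0 (the mean of f).
theta = {}
for S in subsets(tuple(range(L))):
    for cS in itertools.product(ALPHABET, repeat=len(S)):
        theta[(S, cS)] = sum(
            (-1) ** (len(S) - len(T))
            * cond_mean(T, tuple(cS[S.index(p)] for p in T))
            for T in subsets(S))


def predict(s):
    """Sum every parameter whose positions and characters match s."""
    return sum(theta[(S, tuple(s[p] for p in S))]
               for S in subsets(tuple(range(L))))


print(len(theta), "parameters; theta_0 =", round(theta[((), ())], 3))
```

The model has $3^3 = 27$ parameters for only $2^3 = 8$ sequences, which is why gauge constraints are needed to pin them down. Here the constraints are that each parameter sums to zero over the two characters at any one of its positions.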

**Fig 4. Landscape exploration using hierarchical gauges.**
(A) NMR structure of GB1, with residues V39, D40, G41, and V54 shown (PDB: 3GB1, from [66]). (B) Distribution of log₂ enrichment relative to wild-type measured by [60] for nearly all 160,000 GB1 variants having mutations at positions 39, 40, 41, and 54. (C) Pairwise-interaction model parameters inferred from the data of [60], expressed in the uniform hierarchical gauge (i.e., the zero-sum gauge). Boxes indicate parameters contributing to the wild-type sequence, VDGV. (D) Performance of the pairwise-interaction model. Axes show log₂ enrichment values relative to wild-type. Each dot represents a randomly chosen GB1 variant assayed by [60]; for clarity, only 5,000 of the ∼160,000 assayed variants are shown. (E) Probability logos [39] for the uniform, region 1, region 2, and region 3 sequence distributions. Distributions of pairwise-interaction model predictions for each region are also shown. (F) Model parameters expressed in the region 1, region 2, and region 3 hierarchical gauges. Dots and tick marks indicate region-specific constraints. Probability densities (panels B and D) were estimated using DEFT [45]. Pairwise-interaction model parameters were inferred by least-squares regression using MAVE-NN [39]. Regions 1, 2, and 3 were defined based on [64]. NMR: nuclear magnetic resonance. GB1: domain B1 of protein G.
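The additive special case already shows what changing the distribution does in a hierarchical gauge. The minimal sketch here rests on two assumptions: $\pi$ is a product distribution (independent positions), and the hierarchical constraint on additive parameters is $\sum_c \pi_l(c)\,\theta_l^c = 0$ at every position. Under these assumptions the constant becomes the mean predicted activity under $\pi$. The function name, the reduced alphabet, and the "region" probabilities are all hypothetical and are not taken from the GB1 analysis.

```python
# Sketch: hierarchical gauge of an additive model with respect to a
# per-position distribution pi. Assumptions: pi is a product distribution,
# and the constraint is sum_c pi[l][c] * theta[l][c] == 0 at each position.


def hierarchical_gauge(theta0, theta, pi):
    """Shift each position by its pi-weighted mean parameter."""
    offsets = [sum(pi[l][c] * v for c, v in pos.items())
               for l, pos in enumerate(theta)]
    new_theta = [{c: v - offsets[l] for c, v in pos.items()}
                 for l, pos in enumerate(theta)]
    return theta0 + sum(offsets), new_theta


def predict(theta0, theta, seq):
    """Additive model: constant plus one parameter per position."""
    return theta0 + sum(theta[l][c] for l, c in enumerate(seq))


# Hypothetical two-position model over a reduced alphabet.
theta0 = 0.0
theta = [{"V": 1.0, "A": 0.0, "G": -2.0},
         {"D": 0.5, "E": 0.5, "K": -1.0}]

uniform = [{c: 1 / 3 for c in pos} for pos in theta]
region = [{"V": 0.8, "A": 0.2, "G": 0.0},   # made-up "region" distribution
          {"D": 0.5, "E": 0.5, "K": 0.0}]

# Uniform pi gives the zero-sum gauge; under `region`, theta_0 becomes the
# mean predicted activity of sequences drawn from that distribution.
t0_uni, th_uni = hierarchical_gauge(theta0, theta, uniform)
t0_reg, th_reg = hierarchical_gauge(theta0, theta, region)
print(round(t0_uni, 3), round(t0_reg, 3))
```

Under the constraint assumed here, putting all probability on one character per position reduces the hierarchical gauge to the wild-type gauge of Fig 1B for that sequence.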

**Fig 5. Model coarse-graining using hierarchical gauges.**
Shown are data for 500 random 4-aa sequences generated using each of the four distributions listed in Fig 4E (i.e., uniform, region 1, region 2, and region 3). Vertical axes show log₂ enrichment (relative to wild-type) as predicted by additive models of GB1 derived by model truncation using the corresponding distribution-specific hierarchical gauges (from Fig 4C and 4F). Horizontal axes show predictions of the full pairwise-interaction model. Diagonals indicate equality. GB1: domain B1 of protein G.
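Fig 5 builds additive models by fixing a distribution-specific hierarchical gauge on a pairwise-interaction model and then dropping the pairwise terms. The minimal sketch here assumes a product distribution $\pi$. Under that assumption, the hierarchical constraints on a pairwise model imply $\theta_0 = E_\pi[f]$ and $\theta_l^c = E_\pi[f \mid s_l = c] - E_\pi[f]$, and the code computes these by brute-force enumeration. That is only practical for short sequences. The model parameters and the "region" distribution are random or made-up placeholders. The least-squares property checked in the test is a standard property of this decomposition, not a result quoted from the figure.

```python
# Sketch: coarse-grain a pairwise model into an additive model by fixing a
# pi-hierarchical gauge and truncating. Assumptions: pi is a product
# distribution; conditional means are computed by enumeration; all
# parameter values are random placeholders.
import itertools
import random

random.seed(1)
ALPHABET = "ABC"
L = 3
SEQS = ["".join(s) for s in itertools.product(ALPHABET, repeat=L)]

# Hypothetical pairwise model in an arbitrary (unfixed) gauge.
const = random.gauss(0, 1)
add = [{c: random.gauss(0, 1) for c in ALPHABET} for _ in range(L)]
pair = {(i, j): {(a, b): random.gauss(0, 0.5) for a in ALPHABET for b in ALPHABET}
        for i, j in itertools.combinations(range(L), 2)}


def f_pair(s):
    """Full pairwise-interaction model."""
    return (const + sum(add[l][s[l]] for l in range(L))
            + sum(p[(s[i], s[j])] for (i, j), p in pair.items()))


# Hypothetical "region" distribution with independent positions.
pi = [{"A": 0.7, "B": 0.2, "C": 0.1},
      {"A": 0.1, "B": 0.1, "C": 0.8},
      {"A": 1 / 3, "B": 1 / 3, "C": 1 / 3}]


def weight(s):
    """Probability of sequence s under the product distribution pi."""
    w = 1.0
    for l, c in enumerate(s):
        w *= pi[l][c]
    return w


def cond_mean(l=None, c=None):
    """E_pi[f], or E_pi[f | s_l = c] when l and c are given."""
    pool = [s for s in SEQS if l is None or s[l] == c]
    total_w = sum(weight(s) for s in pool)
    return sum(weight(s) * f_pair(s) for s in pool) / total_w


# Constant and additive parameters in the pi-hierarchical gauge.
theta0 = cond_mean()
theta = [{c: cond_mean(l, c) - theta0 for c in ALPHABET} for l in range(L)]


def f_add(s):
    """Truncated model: keep only the constant and additive terms."""
    return theta0 + sum(theta[l][s[l]] for l in range(L))


def weighted_mse(model):
    """pi-weighted mean squared difference from the full model."""
    return sum(weight(s) * (model(s) - f_pair(s)) ** 2 for s in SEQS)


# Compare truncated and full predictions on a few sequences, as in Fig 5.
for s in SEQS[:3]:
    print(s, round(f_pair(s), 3), round(f_add(s), 3))
```

Changing `pi` changes which additive model the truncation produces. This is why Fig 5 shows a separate additive model for each distribution.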

Update of

Gauge fixing for sequence-function relationships.
Posfai A, Zhou J, McCandlish DM, Kinney JB. bioRxiv [Preprint]. 2024 Jun 24:2024.05.12.593772. doi: 10.1101/2024.05.12.593772. Update in: PLoS Comput Biol. 2025 Mar 20;21(3):e1012818. doi: 10.1371/journal.pcbi.1012818. PMID: 38798671.

References

1. Kinney JB, McCandlish DM. Massively parallel assays and quantitative sequence-function relationships. Annu Rev Genomics Hum Genet. 2019;20:99–127. doi: 10.1146/annurev-genom-083118-014845
2. Weinberger ED. Fourier and Taylor series on fitness landscapes. Biol Cybern. 1991;65:321–30.
3. Stadler PF. Landscapes and their correlation functions. J Math Chem. 1996;20:1–45.
4. Weinreich DM, Lan Y, Wylie CS, Heckendorn RB. Should evolutionary geneticists worry about higher-order epistasis? Curr Opin Genet Dev. 2013;23(6):700–7. doi: 10.1016/j.gde.2013.10.007
5. Poelwijk FJ, Krishna V, Ranganathan R. The context-dependence of mutations: a linkage of formalisms. PLoS Comput Biol. 2016;12(6):e1004771. doi: 10.1371/journal.pcbi.1004771
