[Preprint]. 2025 Aug 1:arXiv:2504.19034v3.

On learning functions over biological sequence space: relating Gaussian process priors, regularization, and gauge fixing

Samantha Petti et al. ArXiv.

Abstract

Mappings from biological sequences (DNA, RNA, protein) to quantitative measures of sequence functionality play an important role in contemporary biology. We are interested in the related tasks of (i) inferring predictive sequence-to-function maps and (ii) decomposing sequence-function maps to elucidate the contributions of individual subsequences. Because each sequence-function map can be written as a weighted sum over subsequences in multiple ways, meaningfully interpreting these weights requires "gauge-fixing," i.e., defining a unique representation for each map. Recent work has established that most existing gauge-fixed representations arise as the unique solutions to L2-regularized regression in an overparameterized "weight space" where the choice of regularizer defines the gauge. Here, we establish the relationship between regularized regression in overparameterized weight space and Gaussian process approaches that operate in "function space," i.e., the space of all real-valued functions on a finite set of sequences. We disentangle how weight space regularizers both impose an implicit prior on the learned function and restrict the optimal weights to a particular gauge. We show how to construct regularizers that correspond to arbitrary explicit Gaussian process priors combined with a wide variety of gauges and characterize the implicit function space priors associated with the most common weight space regularizers. Finally, we derive the posterior distribution of a broad class of sequence-to-function statistics, including gauge-fixed weights and multiple systems for expressing higher-order epistatic coefficients. We show that such distributions can be efficiently computed for product-kernel priors using a kernel trick.
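
The gauge freedom described in the abstract can be illustrated with a small numerical sketch. This is not the authors' code. It assumes a toy additive model (a constant plus one weight per position and character) over a four-letter alphabet and sequences of length 2, and it uses a plain Euclidean L2 penalty as one illustrative choice of regularizer. All names (`one_hot_features`, `theta`, `lam`) are hypothetical.

```python
import itertools
import numpy as np

ALPHABET = "ACGT"   # assumed toy alphabet
LENGTH = 2          # assumed toy sequence length

def one_hot_features(seq):
    """Constant feature followed by one indicator per (position, character)."""
    x = [1.0]
    for ch in seq:
        x.extend(1.0 if ch == a else 0.0 for a in ALPHABET)
    return x

# Design matrix over all 16 sequences: 1 constant + 2 * 4 indicator columns.
sequences = ["".join(s) for s in itertools.product(ALPHABET, repeat=LENGTH)]
X = np.array([one_hot_features(s) for s in sequences])

# The model is overparameterized: 9 weights, but only 7 independent directions.
rank = np.linalg.matrix_rank(X)

# Arbitrary weights, and a second weight vector obtained by a gauge
# transformation: add c to every weight at position 0, subtract c from the constant.
rng = np.random.default_rng(0)
theta = rng.normal(size=X.shape[1])
c = 3.0
theta_shifted = theta.copy()
theta_shifted[0] -= c
theta_shifted[1:1 + len(ALPHABET)] += c

# Both weight vectors define the same function on sequence space.
f = X @ theta
same_function = np.allclose(f, X @ theta_shifted)

# Among all weights that reproduce f, the minimum-Euclidean-norm vector is unique.
theta_min_norm = np.linalg.pinv(X) @ f

# Ridge regression with a small Euclidean penalty approaches the same vector.
lam = 1e-6
theta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ f)

print("rank of design matrix:", rank, "of", X.shape[1], "columns")
print("shifted weights give same function:", same_function)
print("ridge close to minimum-norm weights:",
      np.allclose(theta_ridge, theta_min_norm, atol=1e-4))
```

In this sketch the penalty selects one representative from each family of equivalent weight vectors. Other regularizers would select other representatives, which is the sense in which the choice of regularizer defines the gauge.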

Figures

Figure 1: An illustration of the equivalences established. Let A and B be matrices such that nullspace(A)=G and nullspace(B)=Θ.
