[Preprint]. 2025 Aug 13:2025.08.12.669990.
doi: 10.1101/2025.08.12.669990.

Toehold-VISTA: A machine learning approach to decipher programmable RNA sensor-target interactions

James M Robson et al. bioRxiv.

Abstract

RNA-based biosensors have emerged as essential tools in synthetic biology and diagnostics, enabling precise and programmable responses to diverse RNA inputs. However, the time needed to design, produce, and screen high-performance RNA sensors remains a critical challenge. The fundamental rules governing RNA-RNA interactions, specifically the structure-function relationships that determine sensor performance, remain poorly understood. Here, we present a method enabling versatile in-silico RNA-targeting analysis (VISTA), a machine learning-guided framework for the rapid design of RNA sensors. VISTA integrates biophysical modeling of both sensor and target RNAs with a partial least squares discriminant analysis (PLS-DA) machine learning framework. By training predictive models on high-throughput experimental measurements combined with sequence-structure feature extraction, we capture the key determinants of RNA sensor performance. Using toehold switches as a model RNA sensor, we find that Toehold-VISTA successfully designs RNA sensors with improved function against SARS-CoV-2 RNA. These findings establish a broadly applicable, target-aware design strategy for accelerating RNA sensor engineering across biotechnology and diagnostic applications.

Conflict of interest statement

A.A.G. is a co-founder of En Carta Diagnostics, Inc. and Gardn Biosciences. J.M.R. declares no competing interests.

Figures

Figure 1. Screening of toehold switches across the mCherry RNA transcript.
(A) Schematic of the toehold switch library design, in which switches were tiled at 3-nucleotide intervals along the mCherry transcript using a conserved hairpin scaffold sequence and varying 36-nt binding domains. (B) Arc diagram of pairwise probabilities with most probable (top) and least probable (bottom) interactions plotted by increasing nucleotide index from left to right. Heat map of OFF-state, ON-truncated, and ON-full RNA GFP fluorescence and fold change data are aligned to each respective index. (C) Quantification of ON state GFP fluorescence for truncated and full-length RNA transcripts (p < 0.0001, two-tailed t test). (D) Fold-change distribution for truncated and full-length activation of toehold switches (p < 0.0001, two-tailed t test). For violin plots, the horizontal dashed line represents the median and dotted lines are at 25th and 75th percentiles.
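
The tiling scheme in panel (A) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the toy transcript, and the choice to report each window's reverse complement as the sensor-side binding sequence are all hypothetical. Only the 36-nt window and the 3-nt step come from the caption.

```python
# Watson-Crick complement table for RNA bases.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA string (A/C/G/U only)."""
    return rna.translate(COMPLEMENT)[::-1]


def tile_binding_sites(transcript: str, window: int = 36, step: int = 3):
    """Slide a fixed-width window along the transcript.

    Returns a list of (start_index, target_window, sensor_binding_sequence).
    Windows that would run past the 3' end are not emitted.
    """
    sites = []
    for start in range(0, len(transcript) - window + 1, step):
        target = transcript[start:start + window]
        sites.append((start, target, reverse_complement(target)))
    return sites


# Toy 48-nt sequence (hypothetical, NOT the mCherry transcript).
toy_transcript = "AUGC" * 12
for start, target, binding in tile_binding_sites(toy_transcript):
    print(start, target, binding)
```

For a 48-nt toy input this yields windows starting at indices 0, 3, 6, 9, and 12; a real library would also attach the conserved hairpin scaffold, which is omitted here.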
Figure 2. Rational feature determinants of toehold switch performance.
(A) The Pearson correlation (max(|r|) = 0.5) between 33 thermodynamic and structural features of toehold switches, RBS calculator v2.1 outputs, and experimentally determined values. (B) OFF-state GFP fluorescence stratified by lowest and highest 10% (n=19) NUPACK predicted OFF-state-GFP MFE (p < 0.0001, two-tailed t test). (C) Full RNA target fold change stratified by lowest and highest 10% (n=19) NUPACK predicted OFF-state-GFP MFE (p = 0.0007, two-tailed t test). (D) OFF-state GFP fluorescence stratified by lowest and highest 10% (n=19) NUPACK calculated ideal ensemble defect for toehold-linker structure. (E) Sequence logos for all 36-nt RNA target (top) and for the first 6 nt at the base of the 3’ end of the toehold switch hairpin stem discovered to be disproportionately represented in various functional groups. For all bar plots, horizontal line indicates the median and whiskers indicate the standard deviation (SD).
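
The "lowest and highest 10%" comparisons used in panels (B) to (D) amount to sorting switches by one computed feature and comparing the measured outcome in the two tails. A minimal sketch follows, assuming hypothetical names (`stratify_extremes`, `mfe`, `off_gfp`) and made-up toy numbers; it reports group medians only and does not reproduce the t test.

```python
from statistics import median


def stratify_extremes(feature, outcome, fraction=0.10):
    """Split outcomes into the groups with the lowest and highest feature values.

    feature, outcome: equal-length sequences, one entry per switch.
    fraction: share of switches placed in each tail (0.10 -> 10%).
    Returns (outcomes_for_lowest_feature, outcomes_for_highest_feature).
    """
    if len(feature) != len(outcome):
        raise ValueError("feature and outcome must have the same length")
    order = sorted(range(len(feature)), key=lambda i: feature[i])
    k = max(1, round(len(order) * fraction))  # group size, at least one switch
    low = [outcome[i] for i in order[:k]]
    high = [outcome[i] for i in order[-k:]]
    return low, high


# Toy data for 20 hypothetical switches (placeholder values, not from the paper).
mfe = [-float(i) for i in range(20)]           # predicted MFE, kcal/mol
off_gfp = [100 + 10 * i for i in range(20)]    # OFF-state fluorescence, a.u.

low_group, high_group = stratify_extremes(mfe, off_gfp)
print("median OFF, lowest-MFE group: ", median(low_group))
print("median OFF, highest-MFE group:", median(high_group))
```

With measured data, the two returned groups could then be passed to a two-sample test such as `scipy.stats.ttest_ind` to obtain a two-tailed p-value; whether the authors used equal or unequal variances is not stated in the caption.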
Figure 3. Influence of target RNA structure and context on toehold switch activation.
(A) Schematic of contextual modelling used to compute rational metrics for the 36-nt sensor binding site and surrounding sequences with variable flanking lengths. (B) Pearson correlation (max(|r|) = 0.43) across 22 thermodynamic and structural features of the target RNA and experimentally determined values. (C) ON-state GFP fluorescence stratified by the lowest and highest 10% (n=19) average pair probabilities for the 36-nt target window (p < 0.0001, two-tailed t test). (D) ON-state GFP fluorescence stratified by the lowest and highest 20% (n=38) minimum free energy (MFE) for ±0 or ±100 nt flanks (p = 0.0003 and p < 0.0001, respectively; two-tailed t test). (E) Percent change in measured ON-state GFP fluorescence between truncated target and each full-length RNA target flanking length, stratified by the lowest and highest 10% (n=19) MFE (two-tailed t tests). (F, G) Percent change in measured ON-state GFP fluorescence between truncated and full-length target with no flank (F) or 100-nt flanks (G), stratified by the lowest and highest 20% (n=38) ideal ensemble defect (p = 0.0355 and ns, respectively; two-tailed t test). Bars in (C, D) represent the median and whiskers represent standard deviation (SD). Lines in (E, F, G) represent the median.
Figure 4. Codon fraction modulates toehold switch activation.
(A) Pearson correlation (max(|r|) = 0.3) between 18 codon fraction calculations and experimentally determined values. Comparison between switches with highest and lowest 10% (n=19) codon usage in the variable two codons at the base of the hairpin stem for truncated target (B) and full-length RNA target (C) (p = 0.0213 and p = 0.0322, respectively; two-tailed t test). (D) Full RNA fold change stratified by the lowest and highest 10% (n=19) average codon fraction for the 7 codons following toehold switch target sites (p = 0.0162, two-tailed t test). Lines in (B, C) represent the median. Bars in (D) represent the median and whiskers represent standard deviation (SD).
Figure 5. Partial least squares discriminant analysis (PLS-DA) enables robust prediction of toehold switch function.
(A) Top 20 predictive features were selected from 66 rational parameter inputs via recursive feature elimination. (B) Cross-validation accuracy for selection of the optimal number of components for PLS-DA models across experimental outputs. (C) Receiver operating characteristic (ROC) curves for each model. (D) Comparison of ON/OFF fold change for the top 20% of switches (n=19) ranked by PLS-DA for full RNA and truncated targets, and tsgen2 rankings. Bars represent median and whiskers represent SD. (E) Experimental ON/OFF fold change values for full RNA target and PLS-DA algorithm rank. Linear regression is plotted with its 95% confidence interval. (F) Correlations of PLS-DA model predictions with experimentally determined values. Bars represent median and whiskers represent SD.
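
ROC curves like those in panel (C) are often summarized by the area under the curve (AUC), which equals the probability that a randomly chosen positive example scores higher than a randomly chosen negative one. The caption does not report AUC values, so the sketch below is only an illustration of that summary statistic; the function name, labels, and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for the positive ("high performance") class, 0 otherwise.
    scores: classifier outputs, higher meaning more likely positive.
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example with made-up labels and scores.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.6, 0.1]
print(roc_auc(toy_labels, toy_scores))
```

The pairwise form is quadratic in the number of examples, which is acceptable for libraries of a few hundred switches; a rank-based implementation would be preferable for much larger sets.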
Figure 6. Toehold-VISTA enables accurate functional prediction for a new RNA target.
(A) Schematic of the VISTA pipeline: all possible toehold switch candidates are designed with NUPACK and rational parameters are calculated. The top 20 thermodynamic and structural parameters are chosen and the PLS loadings are used to transform data into latent variables. Logistic regression coefficients are finally used to compute probabilities that each switch belongs in the “high performance” class, before returning a ranked output of all possible toehold switches. (B) ON/OFF fold-change data for experimental validation of the top 12 and bottom 12 VISTA-scored toehold switches and top 12 tsgen2-designed toehold switches (comparisons made by one-way ANOVA). (C) ON GFP fluorescence data for 12 VISTA-scored toehold switches for high ON-state fluorescence, all other VISTA-designed constructs (n=36), and top 12 tsgen2-designed switches. (D) OFF GFP fluorescence data for 12 VISTA-scored toehold switches for low OFF-state fluorescence, all other VISTA-designed constructs (n=36), and top 12 tsgen2-designed switches. Comparisons in (C, D) made with Dunnett’s T3 multiple comparisons test. Bars in (B, C, D) represent the median and whiskers indicate the SD.
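
The scoring steps described in panel (A) (feature vector, projection onto PLS latent variables, logistic regression, ranking) can be sketched as below. This is a hedged illustration rather than the published implementation: the function names, the standardization step, the three-feature two-component toy model, and every numeric weight are placeholders invented for the example. A trained model would use the 20 selected features and fitted coefficients.

```python
import math


def vista_score(features, means, stds, pls_weights, logit_coefs, intercept):
    """Probability that one candidate belongs to the 'high performance' class.

    features:    raw feature values for one candidate switch.
    means, stds: per-feature scaling constants (assumed preprocessing step).
    pls_weights: one projection vector per latent variable.
    logit_coefs: one logistic regression coefficient per latent variable.
    """
    z = [(x - m) / s for x, m, s in zip(features, means, stds)]
    latent = [sum(w * zi for w, zi in zip(component, z))
              for component in pls_weights]
    logit = intercept + sum(c * t for c, t in zip(logit_coefs, latent))
    return 1.0 / (1.0 + math.exp(-logit))


def rank_candidates(candidates, **model):
    """Score every candidate and return (name, probability), best first."""
    scored = [(name, vista_score(feats, **model))
              for name, feats in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Placeholder model: 3 features, 2 latent variables (all values hypothetical).
toy_model = {
    "means": [0.0, 0.0, 0.0],
    "stds": [1.0, 1.0, 1.0],
    "pls_weights": [[1.0, 0.0, 0.0], [0.0, 1.0, -1.0]],
    "logit_coefs": [1.0, 0.5],
    "intercept": 0.0,
}
toy_candidates = {
    "sw_A": [2.0, 0.0, 0.0],
    "sw_B": [0.0, 0.0, 0.0],
    "sw_C": [-1.0, 0.0, 2.0],
}
for name, prob in rank_candidates(toy_candidates, **toy_model):
    print(f"{name}: {prob:.3f}")
```

Keeping the projection and the logistic step as plain dot products makes the ranking easy to audit: each candidate's score can be traced back to its latent-variable values and, through the weights, to individual features.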
