Nat Biotechnol. 2025 Aug;43(8):1373-1383.
doi: 10.1038/s41587-024-02414-w. Epub 2024 Oct 11.

A community effort to optimize sequence-based deep learning models of gene regulation


Abdul Muntakim Rafi et al. Nat Biotechnol. 2025 Aug.

Abstract

A systematic evaluation of how model architectures and training strategies impact genomics model performance is needed. To address this gap, we held a DREAM Challenge where competitors trained models on a dataset of millions of random promoter DNA sequences and corresponding expression levels, experimentally determined in yeast. For a robust evaluation of the models, we designed a comprehensive suite of benchmarks encompassing various sequence types. All top-performing models used neural networks but diverged in architectures and training strategies. To dissect how architectural and training choices impact performance, we developed the Prix Fixe framework to divide models into modular building blocks. We tested all possible combinations for the top three models, further improving their performance. The DREAM Challenge models not only achieved state-of-the-art results on our comprehensive yeast dataset but also consistently surpassed existing benchmarks on Drosophila and human genomic datasets, demonstrating the progress that can be driven by gold-standard genomics datasets.


Conflict of interest statement

Competing interests: E.D.V. is the founder of Sequome, Inc. A.R. is an employee of Genentech and has equity in Roche. A.R. is a cofounder and equity holder of Celsius Therapeutics, an equity holder in Immunitas and, until July 31, 2020, was a scientific advisory board member of Thermo Fisher Scientific, Syros Pharmaceuticals, Neogene Therapeutics and Asimov. A.R. was an Investigator of the Howard Hughes Medical Institute when this work was initiated. The remaining authors declare no competing interests.

Figures

Fig. 1
Fig. 1. Overview of the challenge.
a, Left, competitors received a training dataset of random promoters and corresponding expression values. Middle, they continually refined their models and competed for dominance in a public leaderboard. Right, at the end of the challenge, they submitted a final model for evaluation using a test dataset consisting of eight sequence types: (i) high expression, (ii) low expression, (iii) native, (iv) random, (v) challenging, (vi) SNVs, (vii) motif perturbation and (viii) motif tiling. b,c, Bootstrapping provides a robust comparison of the model predictions. Distribution of ranks in n = 10,000 samples from the test dataset (y axes) for the top-performing teams (x axes) for Pearson score (b) and Spearman score (c). d,e, Performance of the top-performing teams in each test data subset. Model performance (color and numerical values) of each team (y axes) in each test subset (x axes) for Pearson’s r2 (d) and Spearman’s ρ (e). Heat map color palettes are min–max-normalized by column. f,g, Performance disparities observed between the best and worst models (x axes) in different test subsets (y axes) for Pearson’s r2 (f) and Spearman’s ρ (g). The percentage difference is calculated relative to the best model performance for each test subset. Source data
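The bootstrap comparison in b,c can be sketched as follows. This is a minimal illustration with synthetic data, not the challenge evaluation code; the challenge used n = 10,000 resamples of the real test set and ranked models on both Pearson and Spearman scores.

```python
import numpy as np
from scipy.stats import pearsonr, rankdata

def bootstrap_ranks(y_true, preds, n_boot=1000, seed=0):
    """Rank models by Pearson r^2 on bootstrap resamples of the test set.

    preds: dict mapping model name -> predictions (same length as y_true).
    Returns dict mapping model name -> array of ranks (1 = best) per resample.
    """
    rng = np.random.default_rng(seed)
    names = list(preds)
    n = len(y_true)
    ranks = {name: np.empty(n_boot) for name in names}
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample test set with replacement
        scores = [pearsonr(y_true[idx], preds[name][idx])[0] ** 2
                  for name in names]
        # higher score -> better (lower) rank
        r = rankdata([-s for s in scores], method="ordinal")
        for name, rk in zip(names, r):
            ranks[name][b] = rk
    return ranks
```

Plotting the distribution of each model's ranks across resamples (as in b,c) then shows whether one model's lead is stable or within resampling noise.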
Fig. 2
Fig. 2. Dissecting the optimal model configurations through a Prix Fixe framework.
a, The framework deconstructs each team’s solution into modules, enabling modules from different solutions to be combined. b, Performance in Pearson score from the Prix Fixe runs for all combinations of modules from the top three DREAM Challenge solutions. Each cell represents the performance obtained from a unique combination of core layer block (major rows, left), data processor and trainer (major columns, top), first layer block (minor rows, right) and final layer block (minor columns, bottom) modules. Gray cells denote combinations that were either incompatible or did not converge during training. c, Performance (Pearson score, y axis) of the three data processor and trainer modules (x axis and colors) for each Prix Fixe model including the respective module (individual points). Original model combinations are indicated by white points, while all other combinations are in black. d, Number of parameters (x axis) for the top three DREAM Challenge models (Autosome.org, BHI and UnlockDNA) along with their best-performing counterparts (based on core layer block), DREAM-CNN, DREAM-RNN and DREAM-Attn, in the Prix Fixe runs (y axis). e, As in d, but showing each model’s Pearson score (x axis). Source data
Fig. 3
Fig. 3. DREAM Challenge models beat existing benchmarks on Drosophila and human datasets.
a, D. melanogaster STARR-seq prediction. Pearson’s correlation for predicted versus actual enhancer activity for held-out data (y axis) for two different transcriptional programs (x axis) for each model (colors). b, Human MPRA prediction. Pearson correlation for predicted versus actual expression for held-out data (y axis) for MPRA datasets from three distinct human cell types (x axis) for each model (colors). c,d, Human accessibility (bulk K562 ATAC-seq) prediction. For each model (x axis and colors), model performance (y axes) is shown in terms of both Pearson’s correlation for predicted versus actual read counts per element (c) and 1 − median Jensen–Shannon distance for predicted versus actual chromatin accessibility profiles across each element (d). In a–d, points represent folds of cross-validation, performance is evaluated on held-out test data and P values determined by t-tests (paired, two-sided) comparing the previous state-of-the-art model to the optimized models are shown above the model performance distributions. e, Comparison of the number of parameters (x axis) for the different models used in the chromatin accessibility prediction task. Source data
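The profile metric in d can be sketched as follows: each element's per-base read counts are normalized to a probability distribution, compared with the Jensen–Shannon distance (base 2, so distances lie in [0, 1]), and the score is one minus the median distance. A minimal sketch, not the paper's evaluation code:

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def profile_similarity(true_profiles, pred_profiles):
    """1 - median Jensen-Shannon distance between per-element profiles.

    Each row is a per-base read-count profile over one element; rows are
    normalized to probability distributions before comparison. With base=2
    the JS distance is in [0, 1], so the score is also in [0, 1]
    (1 = identical profiles).
    """
    dists = []
    for t, p in zip(true_profiles, pred_profiles):
        t = np.asarray(t, dtype=float)
        p = np.asarray(p, dtype=float)
        t = t / t.sum()  # counts -> probability distribution
        p = p / p.sum()
        dists.append(jensenshannon(t, p, base=2))
    return 1.0 - float(np.median(dists))
```

Unlike a per-element Pearson correlation on total counts (as in c), this metric rewards getting the shape of the accessibility profile right, independent of its overall magnitude.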
Extended Data Fig. 1
Extended Data Fig. 1. Progression of the top-performing teams’ performance in the DREAM Challenge public leaderboard.
(A,B) Performance (y-axes) in (A) Pearson Score and (B) Spearman Score achieved by the participating teams (colours) each week (x-axes), showcasing the effectiveness of such challenges in motivating the development of better machine learning models. Source data
Extended Data Fig. 2
Extended Data Fig. 2. Bootstrapping provides a robust comparison of the model predictions.
(A–C) Frequency (y-axes) of ranks (x-axes) in (A) Pearson Score, (B) Spearman Score and (C) combined rank (sum of Pearson Score rank and Spearman Score rank) for n=10,000 samples from the test dataset for the top-performing teams. Source data
Extended Data Fig. 3
Extended Data Fig. 3. Library coverage differs between sequence subsets and is lowest for native sequences.
Cumulative proportion (y-axis) of the number of reads per sequence (x-axis) for different sequence types (colours). Source data
Extended Data Fig. 4
Extended Data Fig. 4. Performance of the teams in each test data subset.
(A,B) Model performance (colour and numerical values) of each team (y-axes) in each test subset (x-axes), for (A) Pearson r2 and (B) Spearman ρ. Heatmap colour palettes are min-max normalized column-wise. Source data
Extended Data Fig. 5
Extended Data Fig. 5. Expression changes in response to SNVs, motif tiling, and motif perturbation.
Expression changes (y-axis) are largest for motif perturbation, smallest for SNVs, and intermediate for motif tiling. Source data
Extended Data Fig. 6
Extended Data Fig. 6. Performance in Spearman Score from the Prix Fixe runs for different possible combinations of the top three DREAM Challenge models.
Modules are indicated on the axes, with Data Processor and Trainer models on the top x-axis, Final Layer Block on the bottom x-axis, Core Layers Block on the left y-axis, and First Layers Block on the right y-axis. Incompatible combinations and combinations that did not converge during training have been greyed out. Source data
Extended Data Fig. 7
Extended Data Fig. 7. Performance comparison of top DREAM Challenge models and their best performing counterparts.
Performance (x-axes) of the top three DREAM Challenge models (y-axes), Autosome.org, BHI, and UnlockDNA, along with their best-performing counterparts (based on Core Layers Block) for different test subsets. Source data
Extended Data Fig. 8
Extended Data Fig. 8. The DREAM-optimized models learn a very similar view of yeast cis-regulatory logic.
(A-E) ISM scores (y-axes) for each nucleotide across each promoter region (x-axes and letters) for wild type (left) and SNV mutant (right) for yeast promoters (A) YOL101C, (B) YBL057C, (C) YGL075C, (D) YDR248C, and (E) YCL051W. Mutation locations are highlighted in grey. Probable transcription factor binding sites (TFBSs) altered by these mutations are marked with boxes, and the corresponding TF motifs are shown in the insets, identified using YeTFaSCo. Source data
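The ISM (in silico mutagenesis) scores shown here can be sketched as follows: every possible single-base substitution is applied to the sequence and scored relative to the wild type. The `score_fn` below is a stand-in for a trained model's predicted expression, not the paper's implementation.

```python
import numpy as np

BASES = "ACGT"

def ism_scores(seq, score_fn):
    """In silico mutagenesis: effect of every single-base substitution.

    score_fn maps a sequence string to a scalar model output (a stand-in
    for a trained model's predicted expression). Returns a (len(seq), 4)
    array of score changes relative to the wild-type sequence; the
    wild-type base at each position scores 0 by construction.
    """
    wt = score_fn(seq)
    out = np.zeros((len(seq), len(BASES)))
    for i in range(len(seq)):
        for j, b in enumerate(BASES):
            if seq[i] == b:
                continue  # wild-type base: no change, score stays 0
            out[i, j] = score_fn(seq[:i] + b + seq[i + 1:]) - wt
    return out
```

Plotting these scores per position (as in A–E) highlights positions where substitutions change predicted expression, e.g. within transcription factor binding sites.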
Extended Data Fig. 9
Extended Data Fig. 9. NN architecture diagrams of the DREAM-optimized models.
(A-C) High-level illustration of the (A) DREAM-RNN, (B) DREAM-CNN, and (C) DREAM-Attn models. (D-F) High-level illustration of different network blocks used within the core layers of (A-C). The Vanilla Conv Block, Grouped Conv Block, SE Block, Stem Block, FFN Block, SeparableConv Block, and Multi-head attention Block are described in detail elsewhere.
Extended Data Fig. 10
Extended Data Fig. 10. Comparative analysis of computational efficiency (per batch training time and throughput) and capacity (number of parameters) across different models.
(A, D, G) Per batch training time in seconds (y-axes), (B,E,H) throughput in predictions per second (y-axes), and (C, F, I) number of parameters (y-axes) for the models (x-axes, colours) applied to (A-C) human ATAC-seq, (D-F) Drosophila STARR-seq, and (G-I) human MPRA dataset. Boxplots represent the distribution of measurements for training time per batch (A, D, G) and throughput (B, E, H), which were repeated 50 times to ensure reliability. Source data
