Nat Biotechnol. 2025 Aug;43(8):1373-1383.
doi: 10.1038/s41587-024-02414-w. Epub 2024 Oct 11.

A community effort to optimize sequence-based deep learning models of gene regulation


Abdul Muntakim Rafi et al. Nat Biotechnol. 2025 Aug.

Abstract

A systematic evaluation of how model architectures and training strategies impact genomics model performance is needed. To address this gap, we held a DREAM Challenge where competitors trained models on a dataset of millions of random promoter DNA sequences and corresponding expression levels, experimentally determined in yeast. For a robust evaluation of the models, we designed a comprehensive suite of benchmarks encompassing various sequence types. All top-performing models used neural networks but diverged in architectures and training strategies. To dissect how architectural and training choices impact performance, we developed the Prix Fixe framework to divide models into modular building blocks. We tested all possible combinations for the top three models, further improving their performance. The DREAM Challenge models not only achieved state-of-the-art results on our comprehensive yeast dataset but also consistently surpassed existing benchmarks on Drosophila and human genomic datasets, demonstrating the progress that can be driven by gold-standard genomics datasets.


Conflict of interest statement

Competing interests: E.D.V. is the founder of Sequome, Inc. A.R. is an employee of Genentech and has equity in Roche. A.R. is a cofounder and equity holder of Celsius Therapeutics, an equity holder in Immunitas and, until July 31, 2020, was a scientific advisory board member of Thermo Fisher Scientific, Syros Pharmaceuticals, Neogene Therapeutics and Asimov. A.R. was an Investigator of the Howard Hughes Medical Institute when this work was initiated. The remaining authors declare no competing interests.

Figures

Fig. 1
Fig. 1. Overview of the challenge.
a, Left, competitors received a training dataset of random promoters and corresponding expression values. Middle, they continually refined their models and competed for dominance in a public leaderboard. Right, at the end of the challenge, they submitted a final model for evaluation using a test dataset consisting of eight sequence types: (i) high expression, (ii) low expression, (iii) native, (iv) random, (v) challenging, (vi) SNVs, (vii) motif perturbation and (viii) motif tiling. b,c, Bootstrapping provides a robust comparison of the model predictions. Distribution of ranks in n = 10,000 samples from the test dataset (y axes) for the top-performing teams (x axes) for Pearson score (b) and Spearman score (c). d,e, Performance of the top-performing teams in each test data subset. Model performance (color and numerical values) of each team (y axes) in each test subset (x axes) for Pearson’s r2 (d) and Spearman’s ρ (e). Heat map color palettes are min–max-normalized by column. f,g, Performance disparities observed between the best and worst models (x axes) in different test subsets (y axes) for Pearson’s r2 (f) and Spearman’s ρ (g). The percentage difference is calculated relative to the best model performance for each test subset. Source data
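The bootstrap comparison in b,c can be sketched as follows. This is a minimal illustration with synthetic data, not the challenge evaluation code; the challenge used n = 10,000 resamples of the real test set and ranked models on both Pearson and Spearman scores.

```python
import numpy as np
from scipy.stats import pearsonr, rankdata

def bootstrap_ranks(y_true, preds, n_boot=1000, seed=0):
    """Rank models by Pearson r^2 on bootstrap resamples of the test set.

    preds: dict mapping model name -> predictions (same length as y_true).
    Returns dict mapping model name -> array of ranks (1 = best) per resample.
    """
    rng = np.random.default_rng(seed)
    names = list(preds)
    n = len(y_true)
    ranks = {name: np.empty(n_boot) for name in names}
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample test set with replacement
        scores = [pearsonr(y_true[idx], preds[name][idx])[0] ** 2
                  for name in names]
        # higher score -> better (lower) rank
        r = rankdata([-s for s in scores], method="ordinal")
        for name, rk in zip(names, r):
            ranks[name][b] = rk
    return ranks
```

Plotting the distribution of each model's ranks across resamples (as in b,c) then shows whether one model's lead is stable or within resampling noise.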
Fig. 2
Fig. 2. Dissecting the optimal model configurations through a Prix Fixe framework.
a, The framework deconstructs each team’s solution into modules, enabling modules from different solutions to be combined. b, Performance in Pearson score from the Prix Fixe runs for all combinations of modules from the top three DREAM Challenge solutions. Each cell represents the performance obtained from a unique combination of core layer block (major rows, left), data processor and trainer (major columns, top), first layer block (minor rows, right) and final layer block (minor columns, bottom) modules. Gray cells denote combinations that were either incompatible or did not converge during training. c, Performance (Pearson score, y axis) of the three data processor and trainer modules (x axis and colors) for each Prix Fixe model including the respective module (individual points). Original model combinations are indicated by white points, while all other combinations are in black. d, Number of parameters (x axis) for the top three DREAM Challenge models (Autosome.org, BHI and UnlockDNA) along with their best-performing counterparts (based on core layer block), DREAM-CNN, DREAM-RNN and DREAM-Attn, in the Prix Fixe runs (y axis). e, As in d, but showing each model’s Pearson score (x axis). Source data
Fig. 3
Fig. 3. DREAM Challenge models beat existing benchmarks on Drosophila and human datasets.
a, D. melanogaster STARR-seq prediction. Pearson’s correlation for predicted versus actual enhancer activity for held-out data (y axis) for two different transcriptional programs (x axis) for each model (colors). b, Human MPRA prediction. Pearson correlation for predicted versus actual expression for held-out data (y axis) for MPRA datasets from three distinct human cell types (x axis) for each model (colors). c,d, Human accessibility (bulk K562 ATAC-seq) prediction. For each model (x axis and colors), model performance (y axes) is shown in terms of both Pearson’s correlation for predicted versus actual read counts per element (c) and 1 − median Jensen–Shannon distance for predicted versus actual chromatin accessibility profiles across each element (d). In a–d, points represent folds of cross-validation, performance is evaluated on held-out test data and P values determined by t-tests (paired, two-sided) comparing the previous state-of-the-art model to the optimized models are shown above the model performance distributions. e, Comparison of the number of parameters (x axis) for the different models used in the chromatin accessibility prediction task. Source data
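The profile metric in d can be sketched as follows: each element's per-base read counts are normalized to a probability distribution, compared with the Jensen–Shannon distance (base 2, so distances lie in [0, 1]), and the score is one minus the median distance. A minimal sketch, not the paper's evaluation code:

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def profile_similarity(true_profiles, pred_profiles):
    """1 - median Jensen-Shannon distance between per-element profiles.

    Each row is a per-base read-count profile over one element; rows are
    normalized to probability distributions before comparison. With base=2
    the JS distance is in [0, 1], so the score is also in [0, 1]
    (1 = identical profiles).
    """
    dists = []
    for t, p in zip(true_profiles, pred_profiles):
        t = np.asarray(t, dtype=float)
        p = np.asarray(p, dtype=float)
        t = t / t.sum()  # counts -> probability distribution
        p = p / p.sum()
        dists.append(jensenshannon(t, p, base=2))
    return 1.0 - float(np.median(dists))
```

Unlike a per-element Pearson correlation on total counts (as in c), this metric rewards getting the shape of the accessibility profile right, independent of its overall magnitude.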
Extended Data Fig. 1
Extended Data Fig. 1. Progression of the top-performing teams’ performance in the DREAM Challenge public leaderboard.
(A,B) Performance (y-axes) in (A) Pearson Score and (B) Spearman Score achieved by the participating teams (colours) each week (x-axes), showcasing the effectiveness of such challenges in motivating the development of better machine learning models. Source data
Extended Data Fig. 2
Extended Data Fig. 2. Bootstrapping provides a robust comparison of the model predictions.
(A–C) Frequency (y-axes) of ranks (x-axes) in (A) Pearson Score, (B) Spearman Score and (C) combined rank (sum of Pearson Score rank and Spearman Score rank) for n=10,000 samples from the test dataset for the top-performing teams. Source data
Extended Data Fig. 3
Extended Data Fig. 3. Library coverage differs between sequence subsets and is lowest for native sequences.
Cumulative proportion (y-axis) of the number of reads per sequence (x-axis) for different sequence types (colours). Source data
Extended Data Fig. 4
Extended Data Fig. 4. Performance of the teams in each test data subset.
(A,B) Model performance (colour and numerical values) of each team (y-axes) in each test subset (x-axes), for (A) Pearson r2 and (B) Spearman ρ. Heatmap colour palettes are min-max normalized column-wise. Source data
Extended Data Fig. 5
Extended Data Fig. 5. Expression changes in response to SNVs, motif tiling, and motif perturbation.
Expression changes (y-axis) are largest for motif perturbation, smallest for SNVs, and intermediate for motif tiling. Source data
Extended Data Fig. 6
Extended Data Fig. 6. Performance in Spearman Score from the Prix Fixe runs for different possible combinations of the top three DREAM Challenge models.
Modules are indicated on the axes, with Data Processor and Trainer models on the top x-axis, Final Layer Block on the bottom x-axis, Core Layers Block on the left y-axis, and First Layers Block on the right y-axis. Incompatible combinations and combinations that did not converge during training have been greyed out. Source data
Extended Data Fig. 7
Extended Data Fig. 7. Performance comparison of top DREAM Challenge models and their best performing counterparts.
Performance (x-axes) of the top three DREAM Challenge models (y-axes), Autosome.org, BHI, and UnlockDNA, along with their best-performing counterparts (based on Core Layers Block) for different test subsets. Source data
Extended Data Fig. 8
Extended Data Fig. 8. The DREAM-optimized models learn a very similar view of yeast cis-regulatory logic.
(A-E) ISM scores (y-axes) for each nucleotide across each promoter region (x-axes and letters) for wild type (left) and SNV mutant (right) for yeast promoters (A) YOL101C, (B) YBL057C, (C) YGL075C, (D) YDR248C, and (E) YCL051W. Mutation locations are highlighted in grey. Probable transcription factor binding sites (TFBSs) altered by these mutations are marked with boxes, and the corresponding TF motifs are shown in the insets, identified using YeTFaSCo. Source data
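The ISM (in silico mutagenesis) scores shown here can be sketched as follows: every possible single-base substitution is applied to the sequence and scored relative to the wild type. The `score_fn` below is a stand-in for a trained model's predicted expression, not the paper's implementation.

```python
import numpy as np

BASES = "ACGT"

def ism_scores(seq, score_fn):
    """In silico mutagenesis: effect of every single-base substitution.

    score_fn maps a sequence string to a scalar model output (a stand-in
    for a trained model's predicted expression). Returns a (len(seq), 4)
    array of score changes relative to the wild-type sequence; the
    wild-type base at each position scores 0 by construction.
    """
    wt = score_fn(seq)
    out = np.zeros((len(seq), len(BASES)))
    for i in range(len(seq)):
        for j, b in enumerate(BASES):
            if seq[i] == b:
                continue  # wild-type base: no change, score stays 0
            out[i, j] = score_fn(seq[:i] + b + seq[i + 1:]) - wt
    return out
```

Plotting these scores per position (as in A–E) highlights positions where substitutions change predicted expression, e.g. within transcription factor binding sites.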
Extended Data Fig. 9
Extended Data Fig. 9. NN architecture diagrams of the DREAM-optimized models.
(A-C) High-level illustration of the (A) DREAM-RNN, (B) DREAM-CNN, and (C) DREAM-Attn models. (D-F) High-level illustration of different network blocks used within the core layers of (A-C). The Vanilla Conv Block, Grouped Conv Block, SE Block, Stem Block, FFN Block, SeparableConv Block, and Multi-head attention Block are described in detail elsewhere.
Extended Data Fig. 10
Extended Data Fig. 10. Comparative analysis of computational efficiency (per batch training time and throughput) and capacity (number of parameters) across different models.
(A, D, G) Per batch training time in seconds (y-axes), (B,E,H) throughput in predictions per second (y-axes), and (C, F, I) number of parameters (y-axes) for the models (x-axes, colours) applied to (A-C) human ATAC-seq, (D-F) Drosophila STARR-seq, and (G-I) human MPRA dataset. Boxplots represent the distribution of measurements for training time per batch (A, D, G) and throughput (B, E, H), which were repeated 50 times to ensure reliability. Source data
