This is a preprint.
GET: a foundation model of transcription across human cell types
- PMID: 39005360
- PMCID: PMC11244937
- DOI: 10.1101/2023.09.24.559168
Update in
- A foundation model of transcription across human cell types. Nature. 2025 Jan;637(8047):965-973. doi: 10.1038/s41586-024-08391-z. Epub 2025 Jan 8. PMID: 39779852. Free PMC article.
Abstract
Transcriptional regulation, involving the complex interplay between regulatory sequences and proteins, directs all biological processes. Computational models of transcription lack the generalizability to extrapolate accurately to unseen cell types and conditions. Here, we introduce GET, an interpretable foundation model designed to uncover regulatory grammars across 213 human fetal and adult cell types. Relying exclusively on chromatin accessibility data and sequence information, GET achieves experimental-level accuracy in predicting gene expression even in previously unseen cell types. GET showcases remarkable adaptability across new sequencing platforms and assays, enabling regulatory inference across a broad range of cell types and conditions, and uncovering universal and cell-type-specific transcription factor interaction networks. We evaluated its performance on prediction of regulatory activity, inference of regulatory elements and regulators, and identification of physical interactions between transcription factors. Specifically, we show that GET outperforms current models in predicting lentivirus-based massively parallel reporter assay readout with reduced input data. In fetal erythroblasts, we identified distal (>1 Mbp) regulatory regions that were missed by previous models. In B cells, we identified a lymphocyte-specific transcription factor-transcription factor interaction that explains the functional significance of a leukemia-risk predisposing germline mutation. In sum, we provide a generalizable and accurate model for transcription together with catalogs of gene regulation and transcription factor interactions, all with cell type specificity.
Conflict of interest statement
Disclosure of potential conflicts of interest: A US provisional patent with application number 63/486,855 has been filed by Columbia University on using the method developed in the manuscript to identify gene regulatory elements and altering gene regulation and expression, on which X.F. and R.R. are inventors. R.R. is a founder of Genotwin and a member of the SAB of DiaTech and Flahy. None of these activities are related to the work described in this manuscript.