Predicting mRNA Abundance Directly from Genomic Sequence Using Deep Convolutional Neural Networks
- PMID: 32433972
- DOI: 10.1016/j.celrep.2020.107663
Abstract
Algorithms that accurately predict gene structure from primary sequence alone were transformative for annotating the human genome. Can we also predict the expression levels of genes based solely on genome sequence? Here, we sought to apply deep convolutional neural networks toward that goal. Surprisingly, a model that includes only promoter sequences and features associated with mRNA stability explains 59% and 71% of variation in steady-state mRNA levels in human and mouse, respectively. This model, termed Xpresso, more than doubles the accuracy of alternative sequence-based models and isolates rules as predictive as models relying on chromatin immunoprecipitation sequencing (ChIP-seq) data. Xpresso recapitulates genome-wide patterns of transcriptional activity, and its residuals can be used to quantify the influence of enhancers, heterochromatic domains, and microRNAs. Model interpretation reveals that promoter-proximal CpG dinucleotides strongly predict transcriptional activity. Looking forward, we propose cell-type-specific gene-expression predictions based solely on primary sequences as a grand challenge for the field.
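To make the quantities in the abstract concrete, the following is a minimal illustrative sketch and not the Xpresso implementation. It shows three generic building blocks: one-hot encoding a promoter sequence (a common input format for convolutional networks), counting CpG dinucleotides, and computing a coefficient of determination as one way to express "variation explained". The function names, the toy sequence, and the choice of this particular r² definition are assumptions made for illustration; the abstract does not specify them.

```python
# Illustrative helpers only; names and definitions are assumptions,
# not taken from the Xpresso code base.

def one_hot(seq):
    """Encode a DNA string as rows of [A, C, G, T]; unknown bases become all zeros."""
    index = {"A": 0, "C": 1, "G": 2, "T": 3}
    rows = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows


def cpg_count(seq):
    """Count occurrences of the dinucleotide CG (CpG) in a sequence."""
    s = seq.upper()
    return sum(1 for i in range(len(s) - 1) if s[i:i + 2] == "CG")


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    promoter = "ACGCGT"  # hypothetical toy sequence
    print(one_hot(promoter)[:2])
    print(cpg_count(promoter))
    print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
```

Under this definition, a model whose predictions explain 59% of the variation in observed mRNA levels would yield `r_squared(...)` of about 0.59.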
Keywords: deep learning; gene regulation; predicting gene expression.
Copyright © 2020 The Authors. Published by Elsevier Inc. All rights reserved.
Conflict of interest statement
Declaration of Interests: The authors declare no competing interests.
Comment in
- Predicting mRNA levels from genome sequence. Nat Rev Genet. 2020 Aug;21(8):446-447. doi: 10.1038/s41576-020-0253-9. PMID: 32467606. No abstract available.
