Predicting cell-type-specific gene expression from regions of open chromatin
- PMID: 22955983
- PMCID: PMC3431488
- DOI: 10.1101/gr.135129.111
Abstract
Complex patterns of cell-type-specific gene expression are thought to be achieved by combinatorial binding of transcription factors (TFs) to sequence elements in regulatory regions. Predicting cell-type-specific expression in mammals has been hindered by the often unknown location of distal regulatory regions. To alleviate this bottleneck, we used DNase-seq data from 19 diverse human cell types to identify proximal and distal regulatory elements at genome-wide scale. Matched expression data allowed us to separate genes into classes of cell-type-specific up-regulated, down-regulated, and constitutively expressed genes. CG dinucleotide content and DNA accessibility in the promoters of these three classes of genes displayed substantial differences, highlighting the importance of including these aspects in modeling gene expression. We associated DNase I hypersensitive sites (DHSs) with genes, and trained classifiers for different expression patterns. TF sequence motif matches in DHSs provided a strong performance improvement in predicting gene expression over the typical baseline approach of using proximal promoter sequences. In particular, we achieved competitive performance when discriminating up-regulated genes from different cell types or genes up- and down-regulated under the same conditions. We identified previously known and new candidate cell-type-specific regulators. The models generated testable predictions of activating or repressive functions of regulators. DNase I footprints for these regulators were indicative of their direct binding to DNA. In summary, we successfully used information on open chromatin obtained by a single assay, DNase-seq, to address the problem of predicting cell-type-specific gene expression in mammalian organisms directly from regulatory sequence.
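The abstract describes turning open chromatin into per-gene classifier features: associate DHSs with genes, summarize promoter CG dinucleotide content, and count TF motif matches in the associated DHSs. The following is a minimal sketch of that feature construction under stated assumptions, not the authors' implementation. It assumes a hypothetical nearest-TSS rule with a distance cutoff for linking DHSs to genes, summarizes CG content as the observed/expected CpG ratio, and matches motifs as exact consensus strings on both strands instead of position weight matrices. The names `Region`, `assign_dhs_to_genes`, and `gene_features`, the TF names, motif strings, and coordinates are all hypothetical.

```python
from dataclasses import dataclass


# Hypothetical data model: 0-based, half-open coordinates plus the region's
# uppercase DNA sequence (A/C/G/T only).
@dataclass
class Region:
    chrom: str
    start: int
    end: int
    seq: str


_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: N_CG * L / (N_C * N_G); 0.0 if C or G is absent."""
    seq = seq.upper()
    n_c, n_g = seq.count("C"), seq.count("G")
    if n_c == 0 or n_g == 0:
        return 0.0
    # str.count is non-overlapping, which is exact here: two CG sites cannot overlap.
    return seq.count("CG") * len(seq) / (n_c * n_g)


def count_matches(seq: str, motif: str) -> int:
    """Count exact consensus matches on both strands (palindromes counted once per site)."""
    seq, motif = seq.upper(), motif.upper()
    rc, k = reverse_complement(motif), len(motif)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window == motif:
            total += 1
        elif window == rc:
            total += 1
    return total


def assign_dhs_to_genes(dhss, tss, max_dist):
    """Hypothetical rule: link each DHS to the gene whose TSS is nearest
    the DHS midpoint, provided it lies within max_dist bp."""
    assigned = {gene: [] for gene in tss}
    for dhs in dhss:
        mid = (dhs.start + dhs.end) // 2
        candidates = [
            (abs(mid - pos), gene)
            for gene, (chrom, pos) in tss.items()
            if chrom == dhs.chrom and abs(mid - pos) <= max_dist
        ]
        if candidates:
            assigned[min(candidates)[1]].append(dhs)
    return assigned


def gene_features(gene, promoter_seq, assigned, motifs):
    """[promoter CpG o/e, number of assigned DHSs, one motif count per TF]."""
    dhss = assigned.get(gene, [])
    counts = [sum(count_matches(d.seq, m) for d in dhss) for m in motifs.values()]
    return [cpg_obs_exp(promoter_seq), float(len(dhss))] + [float(c) for c in counts]


# Toy inputs; TF names and consensus strings are illustrative placeholders.
motifs = {"TF_A": "GATA", "TF_B": "CACGTG"}
tss = {"geneA": ("chr1", 1000), "geneB": ("chr1", 50000)}
dhss = [
    Region("chr1", 900, 920, "AAGATAACACGTGTTATCAA"),
    Region("chr1", 49000, 49020, "TTATC" + "T" * 15),
    Region("chr2", 1000, 1020, "GATA" * 5),  # no TSS on chr2, so unassigned
]
assigned = assign_dhs_to_genes(dhss, tss, max_dist=5000)
print(gene_features("geneA", "CGCGAATT", assigned, motifs))  # [4.0, 1.0, 2.0, 1.0]
```

Stacking one such vector per gene yields a design matrix on which a classifier could be trained to separate expression classes, for example cell-type-specific up-regulated versus constitutively expressed genes. Counting matches on both strands reflects that a TF can bind either orientation of a site; a PWM score threshold would be the more usual choice in practice.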
Similar articles
- Genomic Footprinting Analyses from DNase-seq Data to Construct Gene Regulatory Networks. Methods Mol Biol. 2021;2328:25-46. doi: 10.1007/978-1-0716-1534-8_3. PMID: 34251618
- The accessible chromatin landscape of the human genome. Nature. 2012 Sep 6;489(7414):75-82. doi: 10.1038/nature11232. PMID: 22955617
- BinDNase: a discriminatory approach for transcription factor binding prediction using DNase I hypersensitivity data. Bioinformatics. 2015 Sep 1;31(17):2852-9. doi: 10.1093/bioinformatics/btv294. Epub 2015 May 7. PMID: 25957350
- Advances of DNase-seq for mapping active gene regulatory elements across the genome in animals. Gene. 2018 Aug 15;667:83-94. doi: 10.1016/j.gene.2018.05.033. Epub 2018 May 14. PMID: 29772251. Review.
- Regulation of transcription factors via natural decoys in genomic DNA. Transcription. 2016 Aug 7;7(4):115-20. doi: 10.1080/21541264.2016.1188873. Epub 2016 Jul 6. PMID: 27384377. Review.
Cited by
- Seq-ing answers: Current data integration approaches to uncover mechanisms of transcriptional regulation. Comput Struct Biotechnol J. 2020 May 31;18:1330-1341. doi: 10.1016/j.csbj.2020.05.018. eCollection 2020. PMID: 32612756. Review.
- Decoding the human genome. Genome Res. 2012 Sep;22(9):1599-601. doi: 10.1101/gr.146175.112. PMID: 22955971. Review.
- Human Transcriptome and Chromatin Modifications: An ENCODE Perspective. Genomics Inform. 2013 Jun;11(2):60-7. doi: 10.5808/GI.2013.11.2.60. Epub 2013 Jun 30. PMID: 23843771
- Nonparametric single-cell multiomic characterization of trio relationships between transcription factors, target genes, and cis-regulatory regions. Cell Syst. 2022 Sep 21;13(9):737-751.e4. doi: 10.1016/j.cels.2022.08.004. Epub 2022 Sep 1. PMID: 36055233
- A Method for the Structure-Based, Genome-Wide Analysis of Bacterial Intergenic Sequences Identifies Shared Compositional and Functional Features. Genes (Basel). 2019 Oct 22;10(10):834. doi: 10.3390/genes10100834. PMID: 31652625