PLoS Comput Biol. 2017 Jan 24;13(1):e1005311.
doi: 10.1371/journal.pcbi.1005311. eCollection 2017 Jan.

Genome-Wide Association between Transcription Factor Expression and Chromatin Accessibility Reveals Regulators of Chromatin Accessibility

David Lamparter et al. PLoS Comput Biol.

Abstract

To better understand genome regulation, it is important to uncover the role of transcription factors in establishing and maintaining chromatin structure. Here we present a data-driven approach to systematically characterise transcription factors that are relevant for this process. Our method uses linear mixed modelling to combine two datasets measured across the same set of cell lines: transcription factor binding motif enrichments in open chromatin, and gene expression. Applying this approach to the ENCODE dataset, we confirm known transcription factors and implicate numerous novel ones that play a role in the establishment or maintenance of open chromatin. In particular, our approach rediscovers many factors that have been annotated as pioneer factors.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Mixed model approach for identification of chromatin accessibility regulators.
For a TF binding motif, we search for all its instances in the genome. For each cell line, we calculate the accessibility score by counting how many motif instances are found in the open chromatin fraction of the genome. After further normalization, these accessibility scores are compared to gene expression values for all genes via regression (Methods). To account for confounding, we use mixed model regression, where an additional random component is used with the same covariance structure as the gene expression matrix. To be considered a CAR candidate, motif accessibility of a TF must show strong association (low p-value) with the expression of the corresponding TF gene compared to other genes. The gene-level CAR rank of a TF is defined as the rank of its association p-value among the p-values for all genes.
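The counting and ranking steps in this caption can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the single-chromosome coordinate layout, and the tie handling in the rank are assumptions. The normalization and mixed model regression described in Methods are not reproduced, so the p-values are taken as given.

```python
from bisect import bisect_right


def count_accessible(motif_positions, open_intervals):
    """Count motif instances that fall inside open chromatin.

    motif_positions: genomic coordinates of motif instances (one chromosome).
    open_intervals: sorted, non-overlapping half-open (start, end) intervals.
    Both layouts are assumptions made for this sketch.
    """
    starts = [start for start, _ in open_intervals]
    count = 0
    for pos in motif_positions:
        # Index of the last interval starting at or before pos.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos < open_intervals[i][1]:
            count += 1
    return count


def gene_level_car_rank(pvalues, tf_gene):
    """Rank of the TF's own association p-value among all genes.

    pvalues maps gene name -> association p-value for one motif.
    Rank 1 means no gene has a smaller p-value (ties share the
    better rank; this tie rule is an assumption).
    """
    p_tf = pvalues[tf_gene]
    return 1 + sum(1 for p in pvalues.values() if p < p_tf)


score = count_accessible([5, 15, 25, 40], [(0, 10), (20, 30)])
rank = gene_level_car_rank({"EBF1": 1e-6, "GENE_A": 1e-3, "GENE_B": 1e-8}, "EBF1")
```

Here `score` counts the two motif instances at positions 5 and 25, and `rank` is 2 because one hypothetical gene (`GENE_B`) has a smaller p-value than the TF gene.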
Fig 2. Association between motif accessibility and mRNA expression for the putative chromatin accessibility regulator EBF1.
Three different regression models (a-c) were used to compute association p-values between the accessibility of a given TF motif (here EBF1) and mRNA expression for each of the assayed 15K protein-coding genes. Results are visualized as QQ-plots showing the -log10 transformed p-values. (a) Association p-values obtained using standard linear regression. Due to confounding, p-values are strongly inflated and EBF1 motif accessibility does not show strong association with EBF1 expression compared to other genes. (b) The linear mixed model (LMM) successfully corrects for confounding, with most p-values following the null distribution as expected. The association between EBF1 motif accessibility and EBF1 expression now ranks second among all genes and first among all TFs, although it does not pass the Bonferroni significance threshold. (c) Additionally controlling for the first principal component of the motif accessibility matrix corrects for a strong batch effect (Methods), which further improves the signal. Using this approach, EBF1 motif accessibility showed the strongest association precisely with EBF1 expression (i.e., the gene-level CAR rank equals one), suggesting that EBF1 may be a CAR, in agreement with the literature [22]. As a further illustration of the improvements achieved using the mixed model approach, S1 Fig shows the analogous plot for FOXA1, the first discovered pioneer factor [4,5].
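For readers unfamiliar with the diagnostics in this caption, the sketch below shows how QQ-plot coordinates and a Bonferroni threshold are commonly computed from a list of p-values. It is illustrative only: the function names are hypothetical, the plotting position (i + 0.5)/n is one common convention, and the family-wise level alpha = 0.05 is an assumption, since the caption does not state the level used.

```python
import math


def qq_points(pvalues):
    """Return (expected, observed) -log10 p-value pairs for a QQ-plot.

    Under the null, the sorted p-values behave like uniform order
    statistics, approximated here by (i + 0.5) / n.
    """
    n = len(pvalues)
    observed = sorted(-math.log10(p) for p in pvalues)
    expected = sorted(-math.log10((i + 0.5) / n) for i in range(n))
    # Both lists are ascending, so zip pairs matching quantiles.
    return list(zip(expected, observed))


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test p-value cutoff for family-wise level alpha (assumed 0.05)."""
    return alpha / n_tests


points = qq_points([0.5, 0.01, 0.2, 0.9])
cutoff = bonferroni_threshold(15000)
```

Points lying well above the diagonal across the whole range, as described for panel (a), indicate inflation. With roughly 15,000 genes tested, the assumed cutoff is about 3.3e-6.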
Fig 3. Enrichment of bound motifs for a given TF and its subfamily members.
All TF ChIP-seq experiments from the Myers lab released as part of the ENCODE project were downloaded. For each TF ChIP-seq experiment we also obtained the corresponding TF motif from the HOCOMOCO database [25]. For a given ChIP-seq experiment, we looked at the processed DHS peaks in the same cell line. We partitioned DHS peaks into two groups depending on whether they were bound by the TF (overlap with a ChIP-seq peak) or not. We then calculated the fraction of DHS peaks containing a given motif separately for the bound and unbound groups. The enrichment of bound motifs was defined as the ratio of these two fractions. Results are shown from left to right for: the motifs of the TFs that were assayed in the corresponding ChIP-seq experiments (Correct TF motifs), motifs of other TFs from the same subfamily (TF subfamily motifs), and randomly sampled motifs (Random motifs). During sampling, each motif was sampled as often as the number of ChIP-seq experiments available for that motif. We see strong enrichment of TF motifs in ChIP-seq peaks of the TF as well as its subfamily members.
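The enrichment statistic defined in this caption is a ratio of two fractions and can be written out directly. The sketch below assumes each DHS peak has already been annotated with two booleans (bound by the TF, contains the motif). That input layout and the function name are assumptions, and the peak overlap and motif scanning steps are not shown.

```python
def bound_motif_enrichment(peaks):
    """Ratio of motif-containing fractions in bound vs unbound DHS peaks.

    peaks: iterable of (is_bound, has_motif) booleans, one pair per DHS peak.
    """
    bound = [has_motif for is_bound, has_motif in peaks if is_bound]
    unbound = [has_motif for is_bound, has_motif in peaks if not is_bound]
    if not bound or not unbound:
        raise ValueError("need at least one bound and one unbound peak")
    frac_bound = sum(bound) / len(bound)
    frac_unbound = sum(unbound) / len(unbound)
    if frac_unbound == 0:
        raise ValueError("no unbound peak contains the motif; ratio undefined")
    return frac_bound / frac_unbound


# Toy data: 3 of 4 bound peaks and 1 of 4 unbound peaks contain the motif.
toy_peaks = [(True, True)] * 3 + [(True, False)] + [(False, True)] + [(False, False)] * 3
enrichment = bound_motif_enrichment(toy_peaks)
```

A value above 1 means the motif occurs more often in bound peaks than in unbound ones. In the toy data the fractions are 0.75 and 0.25, giving an enrichment of 3.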
Fig 4. Method comparison across all subfamilies.
Cumulative distribution of CAR ranks at the subfamily level for the 147 tested subfamilies using the three different modelling strategies: ‘standard linear regression’, ‘mixed model regression’ and ‘mixed model PC corrected’ (see legend of Fig 2 and Methods). We see strong enrichment of low ranks, implying deviation from the null hypothesis. Linear mixed modelling increases the enrichment of low CAR ranks.
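A cumulative distribution of this kind can be compared with its null expectation as follows. This is a sketch under stated assumptions: the subfamily-level CAR ranks are taken as given (their derivation is in Methods), the null is modelled as a rank drawn uniformly from 1 to the number of genes, and the function name and toy numbers are hypothetical.

```python
def rank_cdf(ranks, n_genes, thresholds):
    """Observed vs null cumulative fraction of CAR ranks at each threshold.

    Returns (threshold, observed_fraction, null_fraction) tuples, where the
    null fraction assumes ranks are uniform on 1..n_genes.
    """
    n = len(ranks)
    rows = []
    for t in thresholds:
        observed = sum(1 for r in ranks if r <= t) / n
        rows.append((t, observed, t / n_genes))
    return rows


# Toy ranks for four hypothetical subfamilies among 10,000 genes.
rows = rank_cdf([1, 5, 100, 9000], n_genes=10000, thresholds=[10, 1000])
```

In the toy example half of the ranks are at most 10, whereas the uniform null expects a fraction of 0.001. This is the kind of excess of low ranks the caption describes.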
Fig 5. Known pioneer TF subfamilies strongly enrich in predicted chromatin accessibility regulators.
Shown in grey is a scaled cumulative distribution plot for subfamily level CAR ranks of subfamilies not annotated as pioneers in Iwafuchi-Doi et al. [3]. In black, we see the cumulative number of pioneer subfamilies that reached at least a given CAR rank. Six out of eight subfamilies show a low CAR rank, which is more than three times as many as one would expect on average when sampling from non-pioneer subfamilies.
Fig 6. Strong associations between GR-like receptor motif and glucocorticoid response genes.
a) Association results for motif accessibility of the TF NR3C1, which belongs to the GR-like receptor subfamily, and mRNA expression across all genes. -Log10 transformed p-values are shown in a QQ-plot. NR3C1 motif accessibility shows strong association with mRNA expression of three glucocorticoid response genes (orange), but only weak association with expression of NR3C1 and other GR-like receptor TFs (green). In this example, motif accessibility is strongly associated with downstream gene expression, but only weakly with expression of the TF itself. b) The network shows functional relationships among the GR-like receptor TFs (green) and the three most strongly associated genes (orange), which are all glucocorticoid response genes. The strength of links shows confidence in functional relationship given in the STRING database. We see numerous links between the downstream glucocorticoid response genes and the GR-like receptor TFs in the STRING database, confirming their functional relatedness, where NR3C1 has the most links to associated genes.

References

    1. Zaret KS, Carroll JS. Pioneer transcription factors: establishing competence for gene expression. Genes Dev. 2011;25: 2227–2241. doi: 10.1101/gad.176826.111
    2. Morris SA, Baek S, Sung M-H, John S, Wiench M, Johnson TA, et al. Overlapping chromatin-remodeling systems collaborate genome wide at dynamic chromatin transitions. Nat Struct Mol Biol. 2013;21: 73–81. doi: 10.1038/nsmb.2718
    3. Iwafuchi-Doi M, Zaret KS. Pioneer transcription factors in cell reprogramming. Genes Dev. 2014. pp. 2679–2692. doi: 10.1101/gad.253443.114
    4. Soufi A, Garcia MF, Jaroszewicz A, Osman N, Pellegrini M, Zaret KS. Pioneer Transcription Factors Target Partial DNA Motifs on Nucleosomes to Initiate Reprogramming. Cell. 2014.
    5. Cirillo LA, Lin FR, Cuesta I, Friedman D, Jarnik M, Zaret KS. Opening of compacted chromatin by early developmental transcription factors HNF3 (FoxA) and GATA-4. Mol Cell. 2002;9: 279–289.
