Meta-Analysis

Nat Commun. 2022 Jun 14;13(1):3413.

doi: 10.1038/s41467-022-30770-1.

ChIP-Hub provides an integrative platform for exploring plant regulome

Liang-Yu Fu #¹ ², Tao Zhu #¹, Xinkai Zhou #¹, Ranran Yu #¹, Zhaohui He¹, Peijing Zhang³, Zhigui Wu¹, Ming Chen³, Kerstin Kaufmann⁴, Dijun Chen⁵

Affiliations

¹ State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing, 210023, China.
² Department for Plant Cell and Molecular Biology, Institute for Biology, Humboldt-Universität zu Berlin, 10115, Berlin, Germany.
³ Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, 310058, China.
⁴ Department for Plant Cell and Molecular Biology, Institute for Biology, Humboldt-Universität zu Berlin, 10115, Berlin, Germany. kerstin.kaufmann@hu-berlin.de.
⁵ State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing, 210023, China. dijunchen@nju.edu.cn.

# Contributed equally.

PMID: 35701419
PMCID: PMC9197862
DOI: 10.1038/s41467-022-30770-1

Liang-Yu Fu et al. Nat Commun. 2022.

Abstract

Plant genomes encode a complex and evolutionarily diverse regulatory grammar that forms the basis for most life on earth. A wealth of regulome and epigenome data have been generated in various plant species, but no common, standardized resource has so far been available to biologists. Here, we present ChIP-Hub, an integrative web-based platform built on ENCODE standards that bundles >10,000 publicly available datasets reanalyzed from >40 plant species, allowing visualization and meta-analysis. We manually curate the datasets by assessing ~540 original publications and comprehensively evaluate their data quality. As a proof of concept, we extensively survey the co-association of different regulators and construct a hierarchical regulatory network across a broad developmental context. Furthermore, we show how our annotation allows investigation of the dynamic activity of tissue-specific regulatory elements (promoters and enhancers) and their underlying sequence grammar. Finally, we analyze the function and conservation of tissue-specific promoters, enhancers and chromatin states using comparative genomics approaches. Taken together, the ChIP-Hub platform and the analysis results provide rich resources for deep exploration of plant ENCODE. ChIP-Hub is available at https://biobigdata.nju.edu.cn/ChIPHub/ .

Conflict of interest statement

The authors declare no competing interests.

Figures

**Fig. 1. The ChIP-Hub platform: data collection and the computational pipeline.**
a Explosive generation of regulome and epigenome data in plants. The scatter plot (top) shows the number of datasets over time, colored by the most represented plant species. Each data point represents one SRA BioProject. The cumulative number is also shown (in pink). b Timeline plots of the number of datasets, publications and BioProjects over time. c Pie chart showing the distribution of datasets by plant species. d Pie chart showing the distribution of datasets by sample category. e A standardized, semi-automatic analysis pipeline developed for regulome and epigenome experiments. We adapted the working standards provided by the ENCODE consortium to set up the computational pipeline, including read mapping, peak calling and subsequent statistical treatment of replicates. The resulting data are further integrated with ChromHMM for each plant species. All metadata and analyzed data are bundled in our Shiny application ChIP-Hub for visualization and meta-analysis.

**Fig. 2. Evaluation of plant regulome and epigenome data.**
a The annotated experiments by plant species (top) or experimental category (bottom). b Treemap showing the classification of experiments in *Arabidopsis thaliana* according to transcription factor (TF) families, the types of histone modifications or open chromatin experiments. c Donut charts showing different aspects of evaluation of the annotated experiments. d Bar chart showing the quality of experiments based on various quality metrics proposed by the ENCODE consortium. e Comparison of SPOT scores among different experimental categories. Experiments of input DNA are used as controls. The number of datasets in each category is indicated below the boxplot. Statistical significance of the difference in SPOT score between each experiment group and the control was calculated by the two-sided Mann–Whitney U test. Boxplots show the median (horizontal line), second to third quartiles (box), and Tukey-style whiskers (beyond the box). f Distribution of peak summits around the transcription start site (TSS). g Annotated genomic regions versus genome size. Pie charts show the percentage of each genome annotated by ChIP-seq data. The fitted line and standard errors with 95% confidence intervals are shown. Only genomes with >20 experiments are shown. Full names of genomes can be found in Supplementary Data 11. SPOT, signal portion of tags; FRiP, fraction of reads in peaks; NSC, normalized strand cross-correlation coefficient; RSC, relative strand cross-correlation coefficient; NRF, non-redundant fraction; PBC1/2, PCR bottlenecking coefficients 1/2. Source data are provided as a Source Data file.
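Several of the ENCODE metrics abbreviated above have simple count-based definitions:

* NRF is the number of distinct read locations divided by the total number of reads.
* PBC1 is the number of locations with exactly one read divided by the number of distinct locations.
* PBC2 is the number of locations with exactly one read divided by the number with exactly two reads.
* FRiP is the fraction of reads that fall inside called peaks.

The Python sketch below computes these four values from toy inputs. It is an illustration only, not ChIP-Hub's pipeline code. The input format is an assumption: one `(chrom, position, strand)` tuple per mapped read, and peaks given as merged, half-open intervals. The function names `library_complexity` and `frip` are hypothetical. SPOT, NSC and RSC require hotspot calling or strand cross-correlation, so they are not covered.

```python
import bisect
from collections import Counter

def library_complexity(reads):
    """Return (NRF, PBC1, PBC2) for a list of (chrom, pos, strand) read tuples.

    Assumed input: one tuple per uniquely mapped read, pos = 5' end.
    """
    counts = Counter(reads)                            # reads per distinct location
    total = len(reads)
    distinct = len(counts)
    m1 = sum(1 for c in counts.values() if c == 1)     # locations hit exactly once
    m2 = sum(1 for c in counts.values() if c == 2)     # locations hit exactly twice
    nrf = distinct / total
    pbc1 = m1 / distinct
    pbc2 = m1 / m2 if m2 else float("inf")             # undefined without 2-read sites
    return nrf, pbc1, pbc2

def frip(reads, peaks):
    """Fraction of reads whose position lies inside a peak.

    peaks: dict chrom -> list of half-open (start, end) intervals,
    assumed already merged so that they do not overlap.
    """
    index = {}
    for chrom, ivs in peaks.items():
        ivs = sorted(ivs)
        index[chrom] = ([s for s, _ in ivs], [e for _, e in ivs])
    in_peaks = 0
    for chrom, pos, _strand in reads:
        if chrom not in index:
            continue
        starts, ends = index[chrom]
        i = bisect.bisect_right(starts, pos) - 1       # last peak starting at or before pos
        if i >= 0 and pos < ends[i]:
            in_peaks += 1
    return in_peaks / len(reads)
```

Here is a worked example. Take six reads at four distinct locations, with two of those locations each seen twice. They give NRF = 4/6, PBC1 = 0.5 and PBC2 = 1.0. If a single peak spans positions 150–350 and three of the six reads fall inside it, FRiP = 0.5.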

**Fig. 3. TF co-associations and hierarchical regulatory networks.**
a Co-binding relationships of TFs. Each row or column represents one TF (colored according to its TF family). Co-binding between any two TFs was assessed with the Jaccard statistic, which measures the ratio of the number of base pairs occupied by both TFs (their intersection) to the number of base pairs in their union. Three modules (M1–M3) contain highly co-associated regulators. A full co-association heatmap for all investigated TFs (n = 157) can be found in Supplementary Fig. 8. b Genome browser view of TF binding intensities at the *AP1* locus. Only ChIP-seq experiments for TFs in module M1 are shown. The order of TF ChIP-seq tracks is the same as M1 in a (red box). c Network showing significant co-associations between TFs. Significant TF co-associations are defined as those with co-association scores larger than 0.2, an optimal threshold determined by an elbow statistic (Supplementary Fig. 9a). The three highly co-associated modules in a are highlighted. Edge width represents the co-association score and node size represents node degree. d Alluvial diagram showing TF-miRNA-TF feed-forward loop (FFL) motifs. Splines are colored based on the family of miRNA genes (*MIR*). The names of TF or miRNA families are labeled. e Comparison of FFLs identified in this study and in our previous study based on floral data. The significance of the overlap ratio was assessed by the χ² test. f Known regulatory loops validated by our predicted FFLs (solid arrows). Regulators without supporting ChIP-seq data are colored grey, and their regulatory interactions are not confirmed (dashed lines). Source data are provided as a Source Data file.
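The legend defines the co-association score as a base-pair Jaccard statistic: the number of bases occupied by both TFs divided by the number of bases in their union. It also treats a pair as significant when the score exceeds 0.2.

The Python sketch below shows that computation and the 0.2 edge filter. It assumes peaks on a single chromosome given as half-open `(start, end)` intervals. The function names and the toy TF peak sets are hypothetical. `bedtools jaccard` computes the same intersection-over-union ratio on BED files, but whether ChIP-Hub used it is not stated here.

```python
from itertools import combinations

def merge(intervals):
    """Merge overlapping or touching half-open (start, end) intervals on one chromosome."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]

def total_bp(intervals):
    return sum(end - start for start, end in intervals)

def intersect_bp(a, b):
    """Base pairs covered by both merged interval lists (two-pointer sweep)."""
    i = j = bp = 0
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if lo < hi:
            bp += hi - lo
        if a[i][1] < b[j][1]:       # advance whichever interval ends first
            i += 1
        else:
            j += 1
    return bp

def jaccard(peaks_a, peaks_b):
    """|A ∩ B| / |A ∪ B| in base pairs."""
    a, b = merge(peaks_a), merge(peaks_b)
    inter = intersect_bp(a, b)
    union = total_bp(a) + total_bp(b) - inter
    return inter / union if union else 0.0

def co_association_edges(tf_peaks, threshold=0.2):
    """Keep TF pairs whose Jaccard score exceeds the threshold (0.2 in the legend)."""
    edges = []
    for tf1, tf2 in combinations(sorted(tf_peaks), 2):
        score = jaccard(tf_peaks[tf1], tf_peaks[tf2])
        if score > threshold:
            edges.append((tf1, tf2, score))
    return edges
```

For example, peaks at 0–100 and 50–150 share 50 bp out of a 150 bp union, which gives a score of 1/3. That score passes the 0.2 threshold, so the pair becomes a network edge.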

**Fig. 4. Prediction of tissue-specific regulatory elements: promoters and enhancers.**
a Sample similarity based on enhancer activity. Open chromatin samples (with IDs labeled in square brackets) were collected from nine different studies. Input DNA samples (in grey; n = 4) are used as controls. Note that samples from reproductive tissues are well separated from those of vegetative tissues. b Genome browser view of selected samples (colored as in a). Annotated promoters and enhancers are shown below the tracks. A genome browser view of all samples can be found in Supplementary Fig. 11. c Distribution of tissue-specificity scores (Jensen-Shannon diversity index) of promoters and enhancers. Highly specific regulatory elements are defined by a cutoff (0.26) indicated by the dashed line. d Enrichment of TF binding sites in accessible regions with low or high specificity. P, promoter; E, enhancer; ns, not significant. Statistical significance of the difference was calculated by the two-sided Mann–Whitney U test. Boxplots show the median (horizontal line), second to third quartiles (box), and Tukey-style whiskers (beyond the box). e Heatmap showing the normalized influence of motif-annotated filters on the classification of promoters in different tissues. Filters matched to known motifs are labeled. Source data are provided as a Source Data file.
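The legend does not spell out how the Jensen-Shannon-based specificity score is computed. One widely used formulation works in three steps:

1. Normalize an element's activity across tissues into a probability vector p.
2. Compute the base-2 Jensen-Shannon divergence (JSD) between p and each "ideal" one-hot tissue vector.
3. Report 1 − √JSD for the closest tissue, so that a score of 1 means activity in a single tissue.

The Python sketch below implements that formulation. This is an assumption: ChIP-Hub's exact score, and therefore the meaning of its 0.26 cutoff, may be defined differently. The function names are hypothetical.

```python
import math

def _entropy(p):
    """Shannon entropy in bits; zero entries contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2, range 0..1) between two distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return _entropy(m) - (_entropy(p) + _entropy(q)) / 2

def tissue_specificity(activity):
    """Return (index of best-matching tissue, specificity score in [0, 1])."""
    total = sum(activity)
    if total <= 0:
        raise ValueError("element shows no activity in any tissue")
    p = [a / total for a in activity]
    best_tissue, best_score = 0, -1.0
    for t in range(len(p)):
        ideal = [1.0 if i == t else 0.0 for i in range(len(p))]  # active only in tissue t
        jsd = max(js_divergence(p, ideal), 0.0)                  # clamp tiny negative rounding
        score = 1.0 - math.sqrt(jsd)
        if score > best_score:
            best_tissue, best_score = t, score
    return best_tissue, best_score
```

An element active in only one tissue scores 1.0. An element equally active in two tissues scores about 0.44.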

**Fig. 5. Dynamic activity of tissue-specific regulatory elements.**
a Heatmap showing the chromatin accessibility of highly specific regulatory elements (including promoters and enhancers). Regulatory elements are grouped into ten clusters (C1–C10, matching the number of tissues) based on their activity. TF target genes are labeled on the right. Clusters specific to flower-, root-, leaf- or seed-related tissues are highlighted. Representative enriched GO terms for the highlighted clusters are indicated. b Genome browser views of tissue-specific chromatin accessibility at four selected gene loci.

**Fig. 6. Evolutionary tracking of plant promoters and enhancers.**
a Phylogenetic tree showing the evolutionary relationships of the plant species used in the analysis, including five monocots and twelve dicots. b The number of predicted promoters and enhancers in each species. c Distance of peak summits to the nearest transcription start site (TSS). d Sankey plot showing conserved regulatory elements among seven representative species, using Arabidopsis as a reference. Each line represents an active regulatory element (promoter or enhancer) that is alignable to the Arabidopsis genome. e Dot plots showing the number of species in which each Arabidopsis promoter (top) or enhancer (bottom) is alignable. The most conserved promoters and enhancers are labeled, and four examples are highlighted in g. f Bar chart summarizing the degree of conservation of promoters and enhancers in each species. g Examples of regulatory regions active in different plant species. h Enrichment analysis of gene ontology (GO) biological pathways for promoters and enhancers with different degrees of conservation. Source data are provided as a Source Data file.
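Panel c, like Fig. 2f, rests on the distance from each peak summit to the nearest TSS. The Python sketch below finds that distance by binary search over sorted TSS positions. The input layout and the function names are hypothetical. The sign convention is also an assumption: negative values are upstream and positive values downstream, following the gene's strand.

```python
import bisect

def build_tss_index(tss):
    """tss: dict chrom -> list of (position, strand). Returns sorted parallel lists per chrom."""
    index = {}
    for chrom, entries in tss.items():
        entries = sorted(entries)
        index[chrom] = ([p for p, _ in entries], [s for _, s in entries])
    return index

def summit_to_nearest_tss(chrom, summit, index):
    """Strand-aware distance (bp) from a peak summit to its nearest TSS.

    Negative values lie upstream of the TSS, positive values downstream.
    Returns None if the chromosome has no annotated TSS.
    """
    positions, strands = index.get(chrom, ([], []))
    if not positions:
        return None
    i = bisect.bisect_left(positions, summit)
    # The nearest TSS is either the last one before the summit or the first one at/after it.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(positions)]
    j = min(candidates, key=lambda k: abs(summit - positions[k]))
    offset = summit - positions[j]
    return offset if strands[j] == "+" else -offset
```

Applying this to every summit in a dataset and plotting a histogram of the distances gives the kind of distribution summarized in the panel.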

**Fig. 7. Integrative analysis and comparison of chromatin states in plants.**
a–f Definition and enrichment for a 12-state ChromHMM model based on eleven histone modification marks in Arabidopsis vegetative-related tissues. Darker green in the heatmaps indicates a higher probability or enrichment. In the plots, each row corresponds to a different state (in different colors), and each column corresponds to a different mark, a genomic annotation (a), gene expression patterns (b), chromatin accessibility (c), TF binding of different TF families and leaf enhancers (e), or conservation information (f). Percentages and descriptions of states, summarized based on the overall enrichment of different categories of annotations, are shown in d. Gene expression data from ref. . Conserved noncoding sequences (CNSs) and phastCons conservation scores (based on a nine-way multiple alignment) between Arabidopsis and other crucifers from ref. . Boxplots show the median (horizontal line), second to third quartiles (box), and Tukey-style whiskers (beyond the box). g, h Chromatin state conservation between Arabidopsis and four other plant species with annotated states in vegetative (leaves/rosette) tissues. g Bar chart showing the percentage of conserved Arabidopsis chromatin states, colored by the number of plant species in which each state is conserved. Colors for states are explained in h. h Enrichment of chromatin state conservation between Arabidopsis (rows) and other species (columns). Pairwise enrichment scores were calculated based on the Jaccard statistic, which measures the ratio of the number of conserved base pairs to the number of base pairs in the union. Darker green in the heatmap indicates a higher enrichment. States with similar compositions of histone modification marks are colored the same way across plant species. Matched states between Arabidopsis and other species are labeled “X”. Chromatin states without matched states in Arabidopsis are indicated in black. Unmarked states are colored grey.
Annotation of chromatin states in barley, rice, wheat and maize can be found in Supplementary Figs. 15–19.
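The legend does not give the formula behind the state-by-annotation enrichments in panels a–f. ChromHMM's OverlapEnrichment command reports fold enrichment as (C/A)/(B/D), where:

* A is the number of bases in a state.
* B is the number of bases in the annotation.
* C is the number of bases in both.
* D is the genome size.

The Python sketch below computes that ratio for a single chromosome. The segment format `(start, end, state)` and the function names are hypothetical, and this is not ChromHMM's own code.

```python
from collections import defaultdict

def merge(intervals):
    """Merge overlapping half-open (start, end) intervals; returns sorted [start, end] lists."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged

def overlap_bp(a, b):
    """Base pairs shared by two sorted, merged interval lists."""
    i = j = bp = 0
    while i < len(a) and j < len(b):
        bp += max(0, min(a[i][1], b[j][1]) - max(a[i][0], b[j][0]))
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return bp

def state_fold_enrichment(segments, annotation, genome_bp):
    """Fold enrichment (C/A)/(B/D) of each chromatin state for one annotation."""
    by_state = defaultdict(list)
    for start, end, state in segments:
        by_state[state].append((start, end))
    ann = merge(annotation)
    b = sum(e - s for s, e in ann)                 # B: annotated bases
    result = {}
    for state, ivs in by_state.items():
        ivs = merge(ivs)
        a = sum(e - s for s, e in ivs)             # A: bases in this state
        c = overlap_bp(ivs, ann)                   # C: bases in state and annotation
        result[state] = (c / a) / (b / genome_bp) if a and b else 0.0
    return result
```

A value above 1 means the annotation covers a larger share of the state than of the genome as a whole.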

References

1. Johnson, D. S., Mortazavi, A., Myers, R. M. & Wold, B. Genome-wide mapping of in vivo protein-DNA interactions. Science (2007). doi:10.1126/science.1141319.
2. Barski, A. et al. High-resolution profiling of histone methylations in the human genome. Cell (2007). doi:10.1016/j.cell.2007.05.009.
3. Robertson, G. et al. Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing. Nat. Methods (2007). doi:10.1038/nmeth1068.
4. Mikkelsen, T. S. et al. Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature (2007). doi:10.1038/nature06008.
5. Kaufmann, K. et al. Orchestration of floral initiation by APETALA1. Science 328, 85–89 (2010). doi:10.1126/science.1185244.
