Front Plant Sci. 2022 Aug 18:13:942710.
doi: 10.3389/fpls.2022.942710. eCollection 2022.

CisCross: A gene list enrichment analysis to predict upstream regulators in Arabidopsis thaliana

Viktoriya V Lavrekha et al. Front Plant Sci. 2022.

Abstract

DNA-binding profiles for a sufficient number of genome-encoded transcription factors (TFs) make it possible to systematically evaluate the upstream regulators of gene lists. The Plant Cistrome database, a large collection of TF binding profiles detected with the DAP-seq method, made this possible for Arabidopsis. Here we re-processed the raw DAP-seq data with MACS2, the most popular peak caller, which outperforms other peak callers on quality metrics. In a benchmarking study, we confirmed that the improved collection of TF binding profiles supports a more precise gene list enrichment procedure and yields a more relevant ranking of potential upstream regulators. Moreover, we consistently recovered TF binding profiles that were missing from the previous collection of DAP-seq peak sets. We developed the CisCross web service (https://plamorph.sysbio.ru/ciscross/), which gives more flexibility in the analysis of potential upstream TF regulators of Arabidopsis thaliana genes.

Keywords: DAP-seq; RNA-seq; multi-omics data integration; proximal promoters; transcription factor binding profiles.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
CisCross algorithm scheme (see section “Materials and methods”). Green and pink mark the foreground and background data, respectively, and the parallel processes of their analysis. The foreground data comprise the promoter annotations of the input genes; the background data comprise the promoter annotations of all remaining genes. For each DAP-seq peak set, the first step maps the peaks to the promoters of the input genes and of the remaining genes. The second step uses this mapping to compile a 2 × 2 contingency table that counts, for the input genes and for the remaining genes, how many promoters overlap the peaks and how many do not. Finally, Fisher’s exact test is applied to estimate the enrichment of the peaks in promoters (p-value). The output is the list of enriched TF binding profiles in ascending order of FDR (the p-value adjusted for multiple testing).
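The steps in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the CisCross implementation: the function names, the interval representation, the one-sided (over-representation) form of Fisher’s exact test, and the use of the Benjamini-Hochberg procedure for the FDR are all assumptions made for the example.

```python
from math import comb

def overlaps(a, b):
    """True if two (chrom, start, end) intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def fisher_right_tail(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a/b: input genes whose promoter overlaps / does not overlap a peak
    c/d: remaining genes whose promoter overlaps / does not overlap a peak
    Returns P(X >= a) under the hypergeometric null (assumed direction).
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n, row1)

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def rank_tfs(promoters, input_genes, peak_sets):
    """Rank peak sets by enrichment in the promoters of the input genes.

    promoters:   dict gene -> (chrom, start, end)   (hypothetical format)
    input_genes: set of gene identifiers (the foreground)
    peak_sets:   dict tf -> list of (chrom, start, end) peaks
    Returns a list of (tf, p_value, fdr) sorted by ascending FDR.
    """
    names, pvals = [], []
    for tf, peaks in peak_sets.items():
        a = b = c = d = 0
        for gene, promoter in promoters.items():
            hit = any(overlaps(promoter, peak) for peak in peaks)
            if gene in input_genes:
                a, b = a + hit, b + (not hit)
            else:
                c, d = c + hit, d + (not hit)
        names.append(tf)
        pvals.append(fisher_right_tail(a, b, c, d))
    fdrs = benjamini_hochberg(pvals)
    return sorted(zip(names, pvals, fdrs), key=lambda row: row[2])

# Toy data: six genes on one chromosome, three of them in the input list.
promoters = {f"g{i}": ("chr1", i * 1000, i * 1000 + 500) for i in range(1, 7)}
input_genes = {"g1", "g2", "g3"}
peak_sets = {
    "TF_A": [("chr1", 1100, 1300), ("chr1", 2100, 2300), ("chr1", 3100, 3300)],
    "TF_B": [("chr1", 4100, 4300)],
}
result = rank_tfs(promoters, input_genes, peak_sets)
```

The brute-force overlap loop is quadratic and only suitable for a toy example; a genome-scale analysis would normally use an interval index or a dedicated tool for the mapping step.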
FIGURE 2
Summary statistics on the Plant Cistrome, CisCross-GEM, and CisCross-MACS2 versions of the DAP-seq peak set collection. (A) The total number of peak sets. “New” implies sets missing in the Plant Cistrome version. “Increase”/“Decrease” means that the number of peaks in sets increased/decreased at least twofold; “Small changes” indicates any smaller changes. X-axes in panels (A,B) denote the version of the DAP-seq collection. (B) The number of col and colamp peak sets in three versions of the DAP-seq collection. (C) Distribution of mean peak length (Y-axis) in individual peak sets (X-axis) for the CisCross-MACS2 version of the DAP-seq collection. Red line denotes the fixed peak length in the Plant Cistrome version (200 bp). Blue/orange colors mark the peak sets with shorter/longer mean peak length. A few example peak sets are named.
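The categories used in panel (A) follow a simple rule that can be written out explicitly. The sketch below is only an illustration of that rule as stated in the caption; the function name and the convention of passing `None` for a peak set that is absent from the Plant Cistrome version are assumptions.

```python
def classify_peak_set(old_count, new_count):
    """Classify a re-processed peak set relative to the Plant Cistrome version.

    old_count: number of peaks in the Plant Cistrome version,
               or None if the set is missing there (assumed convention)
    new_count: number of peaks in the re-processed version
    """
    if old_count is None:
        return "New"
    if new_count >= 2 * old_count:
        return "Increase"        # at least a twofold gain in peaks
    if 2 * new_count <= old_count:
        return "Decrease"        # at least a twofold loss in peaks
    return "Small changes"

examples = [
    classify_peak_set(None, 500),
    classify_peak_set(1000, 2500),
    classify_peak_set(1000, 400),
    classify_peak_set(1000, 1500),
]
```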
FIGURE 3
Comparison of the results for the gene list enrichment analysis in pairwise combinations of different versions of the DAP-seq collection for the benchmark compilation of RNA-seq data from the EBI Expression Atlas (see section “Materials and methods”). Panels (A–C) show the percentage of overlap of the output lists for potential TF regulators (FDR < 0.05). Panels (D–F) show the total number of potential TF regulators (FDR < 0.05).
FIGURE 4
Examples of the output data of the CisCross web service. (A) CisCross-Main mode for gene list enrichment analysis (for the list of auxin-upregulated genes from GSE149410). (B) CisCross-Light mode for the upstream region of the PIN7 (AT1G23080) gene.
