Nucleic Acids Res. 2019 Jan 8;47(D1):D729-D735.
doi: 10.1093/nar/gky1094.

Cistrome Data Browser: expanded datasets and new tools for gene regulatory analysis

Rongbin Zheng et al. Nucleic Acids Res.

Abstract

The Cistrome Data Browser (DB) is a resource of human and mouse cis-regulatory information derived from ChIP-seq, DNase-seq and ATAC-seq chromatin profiling assays, which map the genome-wide locations of transcription factor binding sites, histone post-translational modifications and regions of chromatin accessible to endonuclease activity. Currently, the Cistrome DB contains approximately 47,000 human and mouse samples, including about 24,000 datasets newly collected since the previous release two years ago. Furthermore, the Cistrome DB has a new Toolkit module with several features that allow users to better utilize the large-scale ChIP-seq, DNase-seq, and ATAC-seq data. First, users can query the factors that are likely to regulate a specific gene of interest. Second, the Cistrome DB Toolkit facilitates searches for factor binding, histone modifications, and chromatin accessibility in any given genomic interval shorter than 2 Mb. Third, the Toolkit can determine the most similar ChIP-seq, DNase-seq, and ATAC-seq samples in terms of genomic interval overlaps with user-provided genomic interval sets. The Cistrome DB is a user-friendly, up-to-date, and well-maintained resource, and the new tools will greatly benefit the biomedical research community. The database is freely available at http://cistrome.org/db, and the Toolkit is at http://dbtoolkit.cistrome.org.
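The idea behind the third Toolkit function, ranking samples by how much their peaks overlap a user-provided interval set, can be illustrated with a minimal Python sketch. The abstract does not state how the Toolkit scores similarity, so the simple overlap count used here is an assumption chosen for illustration, and the function names, sample names, and coordinates are all hypothetical.

```python
from typing import Dict, List, Tuple

# An interval is (chromosome, start, end), treated as half-open [start, end).
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals are on the same chromosome and share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def count_overlapping(query: List[Interval], peaks: List[Interval]) -> int:
    """Number of query intervals that overlap at least one peak of a sample."""
    return sum(any(overlaps(q, p) for p in peaks) for q in query)


def rank_samples(
    query: List[Interval], samples: Dict[str, List[Interval]]
) -> List[Tuple[str, int]]:
    """Rank samples by overlap count, highest first (illustrative score only)."""
    scores = {name: count_overlapping(query, peaks) for name, peaks in samples.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical user-provided interval set and hypothetical database samples.
query = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]
samples = {
    "sample_A": [("chr1", 150, 250), ("chr2", 60, 70)],
    "sample_B": [("chr1", 550, 560)],
    "sample_C": [("chr3", 1, 10)],
}
ranking = rank_samples(query, samples)
print(ranking)
```

The pairwise comparison above is quadratic and only suitable for small inputs; a search over tens of thousands of samples would need an indexed interval search instead.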

Figures

Figure 1.
Overall design of Cistrome Data Browser and Toolkit. Cistrome DB incorporates publicly available ChIP-seq, DNase-seq, and ATAC-seq data collected from Gene Expression Omnibus (GEO), Encyclopedia of DNA Elements (ENCODE), and RoadMap Epigenomics. Cistrome DB provides sample annotations and uniformly processed results that allow for comparisons of peaks, signal files, quality control metrics, motifs and imputed target genes. To access Cistrome DB data, users can visualize BigWig files in genome browsers and download peak BED files and putative target gene results. The new Toolkit module includes functionalities that answer three questions: ‘What factors regulate your gene of interest?’, ‘What regulators bind in your interval?’ and ‘What factors have a significant binding overlap with your peak set?’
Figure 2.
Quantity and quality of new Cistrome DB data. (A) Cumulative size of human and mouse data collection by year. Collection years before the last Cistrome DB release are shown in black, while new collection years are shown in red. (B, C) In the new collection, Cistrome DB increased not only the sample number of each data type, but also the variety of factors and of histone marks and variants. (D) The TFs (upper) and histone marks or variants (lower) with the most new samples. Blue labels on the x-axis indicate new factors for mouse, red labels indicate new factors for human, and black labels indicate factors that are novel in Cistrome DB for both human and mouse. (E) Violin plots showing an overview of data quality for old and new collections. Total peak numbers on the log10 scale and the percentage of peaks overlapping with a union of DNase hypersensitive sites (DHS) were calculated.
Figure 3.
Cistrome DB Toolkit. (A) An example of the first Cistrome DB Toolkit function, showing putative regulators of the human androgen receptor (AR) gene. A parameter of 100 kb regulatory potential decay rate was selected to include long-range enhancers of AR. Each dot in this figure represents a ChIP-seq sample. The x-axis includes the top 20 factors, ranked by the maximum regulatory potential score over all ChIP-seq samples representing each factor. (B) The second Toolkit function was used to discover the TFs binding to a known AR enhancer (chrX:66,897,958-66,908,958, hg38) in prostate cancer. For each sample, the number of peaks overlapping with the interval divided by the total number of peaks in the sample was calculated, and shown on the x-axis. The top 200 samples were plotted, categorized by factor on the x-axis. (C) WashU Epigenome Browser tracks of the five top-ranked samples from panel B show the peaks within the examined genomic region. (D, E) Cistromes in the Cistrome DB similar to an input BATF peak set, as determined by the third Toolkit function. The top 200 samples detected using two parameter choices (Cistrome DB top 1000 peaks or all peaks) are compared by Venn diagram in D and by scatter plot in E. The Venn diagram in D shows that 150 of the top 200 samples are common to both parameter choices. The scatter plot in E depicts the rank comparison of the 150 overlapping top samples, and the TFs represented by the top ten samples are labeled with the TF name.
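Two quantities in this caption are simple enough to sketch in Python: the panel B ratio (peaks overlapping the interval divided by the sample's total peaks) and the panel A ranking (factors ordered by their maximum score over samples). This is a sketch under stated assumptions, not the Toolkit's implementation: the function names are hypothetical, the peak coordinates and scores are made-up illustration values, and only the enhancer interval is taken from the caption. How the regulatory potential score itself is computed is not described here, so the scores are treated as given inputs.

```python
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open


def overlap_fraction(peaks: List[Interval], interval: Interval) -> float:
    """Fraction of a sample's peaks that overlap the query interval (panel B)."""
    if not peaks:
        return 0.0
    chrom, start, end = interval
    hits = sum(1 for c, s, e in peaks if c == chrom and s < end and start < e)
    return hits / len(peaks)


def rank_factors_by_max(
    sample_scores: Iterable[Tuple[str, float]], top_n: int = 20
) -> List[Tuple[str, float]]:
    """Rank factors by their best score over all samples (panel A)."""
    best: Dict[str, float] = {}
    for factor, score in sample_scores:
        best[factor] = max(score, best.get(factor, score))
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


# AR enhancer interval from the caption (hg38); the peaks are invented.
enhancer = ("chrX", 66_897_958, 66_908_958)
peaks = [
    ("chrX", 66_900_000, 66_900_400),  # inside the enhancer
    ("chrX", 1_000, 1_500),            # same chromosome, far away
    ("chr1", 66_900_000, 66_900_400),  # same coordinates, other chromosome
    ("chrX", 66_908_900, 66_909_100),  # straddles the right edge
]
fraction = overlap_fraction(peaks, enhancer)

# One (factor, score) pair per sample; names and values are placeholders.
scores = [("factor_A", 0.8), ("factor_B", 1.2), ("factor_A", 1.5), ("factor_B", 0.3)]
ranking = rank_factors_by_max(scores)
print(fraction, ranking)
```

Taking the maximum rather than the mean means a factor ranks highly if any one of its samples scores well, which matches the caption's wording for panel A.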
