Nucleic Acids Res. 2014 Aug;42(14):9158-70.
doi: 10.1093/nar/gku644. Epub 2014 Jul 24.

metaseq: a Python package for integrative genome-wide analysis reveals relationships between chromatin insulators and associated nuclear mRNA


Ryan K Dale et al. Nucleic Acids Res. 2014 Aug.

Abstract

Here we introduce metaseq, a software library written in Python, which enables loading multiple genomic data formats into standard Python data structures and allows flexible, customized manipulation and visualization of data from high-throughput sequencing studies. We demonstrate its practical use by analyzing multiple datasets related to chromatin insulators, which are DNA-protein complexes proposed to organize the genome into distinct transcriptional domains. Recent studies in Drosophila and mammals have implicated RNA in the regulation of chromatin insulator activities. Moreover, the Drosophila RNA-binding protein Shep has been shown to antagonize gypsy insulator activity in a tissue-specific manner, but the precise role of RNA in this process remains unclear. Better understanding of chromatin insulator regulation requires integration of multiple datasets, including those from chromatin-binding, RNA-binding, and gene expression experiments. We use metaseq to integrate RIP- and ChIP-seq data for Shep and the core gypsy insulator protein Su(Hw) in two different cell types, along with publicly available ChIP-chip and RNA-seq data. Based on the metaseq-enabled analysis presented here, we propose a model where Shep associates with chromatin cotranscriptionally, then is recruited to insulator complexes in trans where it plays a negative role in insulator activity.
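To make concrete what "loading genomic data formats into standard Python data structures" means, here is a minimal stdlib-only sketch that parses BED text into a list of dataclass records. It is not metaseq's API: `Interval` and `load_bed` are hypothetical names, only the first four BED columns are read, and coordinates are kept as BED's 0-based, half-open values.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Interval:
    """One BED record (hypothetical container): 0-based, half-open [start, end)."""
    chrom: str
    start: int
    end: int
    name: str = "."


def load_bed(lines: Iterable[str]) -> List[Interval]:
    """Parse BED-formatted lines into Interval records, skipping headers and blanks."""
    intervals = []
    for line in lines:
        line = line.strip()
        # Skip blank lines plus comment, browser, and track header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        name = fields[3] if len(fields) > 3 else "."
        intervals.append(Interval(fields[0], int(fields[1]), int(fields[2]), name))
    return intervals


bed_text = """track name=peaks
chr2L\t100\t600\tpeak1
chr2L\t5000\t5400\tpeak2
chr3R\t200\t450"""
peaks = load_bed(bed_text.splitlines())
print(len(peaks), peaks[0])
```

Once records are plain Python objects, ordinary tools (sorting, list comprehensions, `collections.Counter`) apply directly, which is the kind of flexibility the abstract describes.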


Figures

Figure 1.
Binary heatmaps of ChIP-seq peak centers ±500 bp showing overlap of a single factor between cell types (A, B) or overlap of both factors in a single cell type (C, D). Each row represents a unique genomic region, and a black mark in a column indicates that the corresponding factor is present in that region. The total number of peaks in each experiment is given in parentheses. See Supplementary Figure S2 for the combined 4-way comparison and an analogous Venn diagram of the same data.
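A binary overlap matrix like the one in Figure 1 can be built by widening each peak center to a fixed window and merging overlapping windows into shared rows. The caption does not say exactly how unique regions were defined, so this sketch assumes that overlapping ±500 bp windows from any input set are merged into one region; `binary_overlap_matrix` and the peak coordinates in the example are hypothetical.

```python
from typing import Dict, List, Tuple

Peak = Tuple[str, int]  # (chromosome, peak-center coordinate)


def binary_overlap_matrix(
    peak_sets: Dict[str, List[Peak]], flank: int = 500
) -> Tuple[List[Tuple[str, int, int]], List[List[int]]]:
    """Return merged regions and one 0/1 presence row per region (one column per peak set)."""
    # Expand every peak center to a +/- flank window, remembering which set it came from.
    windows = []
    for label, peaks in peak_sets.items():
        for chrom, center in peaks:
            windows.append((chrom, max(0, center - flank), center + flank, label))
    windows.sort()  # by chromosome, then start

    labels = list(peak_sets)
    regions: List[Tuple[str, int, int]] = []
    rows: List[List[int]] = []
    for chrom, start, end, label in windows:
        # Merge into the previous region if the windows overlap on the same chromosome.
        if regions and regions[-1][0] == chrom and start < regions[-1][2]:
            prev_chrom, prev_start, prev_end = regions[-1]
            regions[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            regions.append((chrom, start, end))
            rows.append([0] * len(labels))
        rows[-1][labels.index(label)] = 1  # mark this set as present in the region
    return regions, rows


sets = {
    "Su(Hw) Kc167": [("chr2L", 1000), ("chr2L", 8000)],
    "Su(Hw) BG3": [("chr2L", 1200), ("chr3R", 4000)],
}
regions, rows = binary_overlap_matrix(sets)
for region, row in zip(regions, rows):
    print(region, row)
```

Each printed row corresponds to one heatmap row, and a 1 in a column corresponds to a black mark.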
Figure 2.
ChIP-chip signal of factors profiled by the modENCODE consortium over Su(Hw) and Shep peaks in Kc167 cells (A, B) and BG3 cells (C, D). Each panel represents one set of called peaks identified in this study, and each row in the panel represents the average normalized ChIP-chip signal reported by modENCODE (lowess-smoothed log2(IP/input), or M-score) for a single factor over those peaks. Rows are sorted by the mean value over the center 200 bp.
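The row ordering in Figure 2 (by mean signal over the central 200 bp) can be sketched as follows. This assumes each factor's average M-score profile is stored as a list of equal-width bins centered on the peaks, and that rows are sorted from highest to lowest; the caption gives neither the bin width nor the sort direction. The function and factor names are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def sort_by_center_mean(
    profiles: Dict[str, List[float]], bin_size: int, center_bp: int = 200
) -> List[str]:
    """Order factor names by mean signal in the central `center_bp` of each profile, highest first."""
    def center_mean(profile: List[float]) -> float:
        n_center = max(1, center_bp // bin_size)  # number of bins covering the central window
        mid = len(profile) // 2
        lo = mid - n_center // 2
        return mean(profile[lo:lo + n_center])

    return sorted(profiles, key=lambda name: center_mean(profiles[name]), reverse=True)


# Hypothetical 1 kb profiles in 100 bp bins (10 bins per factor).
profiles = {
    "factor_A": [0, 0, 0, 1, 2, 2, 1, 0, 0, 0],
    "factor_B": [0, 0, 0, 3, 3, 3, 3, 0, 0, 0],
    "factor_C": [0] * 10,
}
order = sort_by_center_mean(profiles, bin_size=100)
print(order)
```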
Figure 3.
Meta-gene plot of ChIP-seq signal for Shep in BG3 cells. Each row in the matrix (A) represents the normalized ChIP-seq enrichment over one gene, with the gene body scaled to 500 bins and the ±5 kb flanking regions scaled to 100 bins. Enrichment is calculated by first scaling the IP and input libraries to reads per million mapped reads (RPMMR) and then subtracting the input signal from the IP signal (color bar, bottom right). Genes are ranked by expression (RPKM) in BG3 cells (right panel; data from modENCODE). The middle line plot (B) shows the column averages of the heatmap, with the wider band indicating the 95% confidence interval. The bottom line plot (C) shows the average ChIP-seq signal over expression quantiles (percentiles indicated in the legend). White rows in the heatmap correspond to repetitive genes (rRNA, histones), for which multi-mapping reads were removed in the ChIP-seq analysis.
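The normalization and binning described for Figure 3 can be sketched in plain Python. Several details are assumptions rather than statements from the caption: inputs are per-base read counts, each flank is binned separately into 100 bins, and the confidence band is a normal approximation (mean ± 1.96 × SEM). All function names are hypothetical; metaseq's own implementation is not reproduced here.

```python
from math import sqrt
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def rpmmr(counts: Sequence[float], total_mapped: int) -> List[float]:
    """Scale raw counts to reads per million mapped reads (RPMMR)."""
    factor = 1e6 / total_mapped
    return [c * factor for c in counts]


def enrichment(ip: Sequence[float], ip_total: int,
               inp: Sequence[float], inp_total: int) -> List[float]:
    """IP minus input, after scaling each library to RPMMR."""
    return [a - b for a, b in zip(rpmmr(ip, ip_total), rpmmr(inp, inp_total))]


def scale_to_bins(signal: Sequence[float], n_bins: int) -> List[float]:
    """Average a per-base signal into n_bins near-equal bins so regions of any length align."""
    length = len(signal)
    bins = []
    for i in range(n_bins):
        lo = i * length // n_bins
        hi = max(lo + 1, (i + 1) * length // n_bins)  # guarantee at least one base per bin
        chunk = signal[lo:hi]
        bins.append(sum(chunk) / len(chunk))
    return bins


def metagene_row(upstream, gene, downstream, gene_bins=500, flank_bins=100) -> List[float]:
    """One heatmap row: binned upstream flank, binned gene body, binned downstream flank."""
    return (scale_to_bins(upstream, flank_bins)
            + scale_to_bins(gene, gene_bins)
            + scale_to_bins(downstream, flank_bins))


def column_mean_ci(rows: Sequence[Sequence[float]], z: float = 1.96) -> List[Tuple[float, float, float]]:
    """(mean, lower, upper) per column using a normal approximation: mean +/- z * SEM."""
    out = []
    for column in zip(*rows):
        m = mean(column)
        half = z * stdev(column) / sqrt(len(column))
        out.append((m, m - half, m + half))
    return out


enr = enrichment([10, 20, 30, 40], 2_000_000, [4, 4, 4, 4], 1_000_000)
print(enr, scale_to_bins(enr, 2))
```

Scaling every gene to the same number of bins is what lets genes of very different lengths be stacked as rows of one matrix; ranking rows by RPKM is then just a sort on the expression values.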
Figure 4.
Scatterplots of enrichment versus expression in RIP-seq for Su(Hw) and Shep in BG3 and Kc167 cells. Green (Su(Hw)) or blue (Shep) dots show genes that encode transcripts enriched by RIP-seq with an adjusted P-value < 0.05. Red dots indicate the genes that encode transcripts pulled down by both Su(Hw) and Shep RIP in the same cell type, and gray dots show all other genes. Rug plots extending along the bottom represent genes that had zero reads in the RIP samples and therefore have undefined log2 fold change. Genes with zero reads in the input samples have both undefined log2(RPKM) and undefined log2 fold change, and so are not shown.
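The plotting rules in Figure 4 (color by adjusted P-value, rug for undefined fold change, omission when input is zero) can be expressed as small functions. This is a sketch under stated assumptions: the fold change is taken as a plain log2 ratio of normalized RIP and input values (the caption does not say how it was estimated), and a gene enriched only by the *other* factor is assumed to be drawn gray in a given panel. All names are hypothetical.

```python
import math
from typing import Optional


def log2_fold_change(rip: float, inp: float) -> Optional[float]:
    """log2(RIP / input); None when either value is zero, since the log is then undefined."""
    if rip <= 0 or inp <= 0:
        return None
    return math.log2(rip / inp)


def placement(rip_reads: int, input_reads: int) -> str:
    """Where a gene appears in the plot, following the caption's rules."""
    if input_reads == 0:
        return "omitted"  # log2(RPKM) and log2 fold change are both undefined
    if rip_reads == 0:
        return "rug"      # x-axis value exists, but log2 fold change is undefined
    return "scatter"


def point_color(panel: str, padj_this: Optional[float], padj_other: Optional[float],
                alpha: float = 0.05) -> str:
    """Color for one gene in the panel for `panel` ("Su(Hw)" or "Shep")."""
    this_hit = padj_this is not None and padj_this < alpha
    other_hit = padj_other is not None and padj_other < alpha
    if this_hit and other_hit:
        return "red"  # pulled down by both factors in the same cell type
    if this_hit:
        return "green" if panel == "Su(Hw)" else "blue"
    return "gray"


print(log2_fold_change(8, 2), placement(0, 5), point_color("Shep", 0.01, 0.2))
```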
Figure 5.
Scatterplots showing the relationship between locus length (kb) and expression for all transcripts, with marginal histograms. The top histograms show the distribution of x-axis values; the side histograms show the distribution of y-axis values. Green (Su(Hw)) or blue (Shep) dots show genes encoding transcripts enriched by RIP, and gray dots show all other genes.
Figure 6.
Chromatin context of genes encoding RIP-enriched transcripts. Heatmaps are centered on the 5′-most TSS of each gene and extend 1 kb upstream and downstream. Values represent the average M-score (lowess-smoothed log2(IP/input)) over the TSSs of enriched genes minus the average M-score over the TSSs of all other genes. For each cell type, rows are sorted by Su(Hw) signal in that cell type.
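The values in Figure 6 are a difference of two averages, which can be sketched as follows. This assumes each gene has a TSS-centered M-score profile of equal length (for example, fixed-width bins across the ±1 kb window); the gene IDs and numbers in the example are hypothetical, and the row-sorting step is not modeled.

```python
from statistics import mean
from typing import Dict, List, Sequence, Set


def mean_profile(profiles: Sequence[Sequence[float]]) -> List[float]:
    """Position-wise mean across TSS-centered profiles of equal length."""
    return [mean(column) for column in zip(*profiles)]


def differential_tss_profile(
    tss_profiles: Dict[str, Sequence[float]], enriched: Set[str]
) -> List[float]:
    """Average M-score over enriched genes' TSSs minus the average over all other genes' TSSs."""
    hits = [p for gene, p in tss_profiles.items() if gene in enriched]
    rest = [p for gene, p in tss_profiles.items() if gene not in enriched]
    return [a - b for a, b in zip(mean_profile(hits), mean_profile(rest))]


tss_profiles = {
    "gene1": [1.0, 2.0, 1.0],
    "gene2": [3.0, 4.0, 3.0],
    "gene3": [0.0, 0.0, 0.0],
    "gene4": [1.0, 1.0, 1.0],
}
diff = differential_tss_profile(tss_profiles, enriched={"gene1", "gene2"})
print(diff)
```

Subtracting the genome-wide background removes signal that is common to all TSSs, so positive values highlight chromatin features specific to the RIP-enriched genes.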

