Nature. 2024 Mar;627(8004):671-679.
doi: 10.1038/s41586-024-07141-5. Epub 2024 Mar 6.

Decoding chromatin states by proteomic profiling of nucleosome readers

Saulius Lukauskas et al. Nature. 2024 Mar.

Erratum in

  • Publisher Correction: Decoding chromatin states by proteomic profiling of nucleosome readers.
    Lukauskas S, Tvardovskiy A, Nguyen NV, Stadler M, Faull P, Ravnsborg T, Özdemir Aygenli B, Dornauer S, Flynn H, Lindeboom RGH, Barth TK, Brockers K, Hauck SM, Vermeulen M, Snijders AP, Müller CL, DiMaggio PA, Jensen ON, Schneider R, Bartke T. Nature. 2024 Apr;628(8009):E6. doi: 10.1038/s41586-024-07392-2. PMID: 38594341. No abstract available.

Abstract

DNA and histone modifications combine into characteristic patterns that demarcate functional regions of the genome [1,2]. While many 'readers' of individual modifications have been described [3-5], how chromatin states comprising composite modification signatures, histone variants and internucleosomal linker DNA are interpreted is a major open question. Here we use a multidimensional proteomics strategy to systematically examine the interaction of around 2,000 nuclear proteins with over 80 modified dinucleosomes representing promoter, enhancer and heterochromatin states. By deconvoluting complex nucleosome-binding profiles into networks of co-regulated proteins and distinct nucleosomal features driving protein recruitment or exclusion, we show comprehensively how chromatin states are decoded by chromatin readers. We find highly distinctive binding responses to different features, many factors that recognize multiple features, and that nucleosomal modifications and linker DNA operate largely independently in regulating protein binding to chromatin. Our online resource, the Modification Atlas of Regulation by Chromatin States (MARCS), provides in-depth analysis tools to engage with our results and advance the discovery of fundamental principles of genome regulation by chromatin states.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Large-scale identification of chromatin readers by SILAC dinucleosome affinity purifications.
a, Generation of modified dinucleosomes. Modified histones H3.1 and H4 were prepared by native chemical ligations of N-terminal tail peptides (H3, amino acids 1–31; H4, amino acids 1–28) to truncated histone cores (H3.1Δ1–31T32C or H4Δ1–28I29C, respectively). Note that this introduces H3T32C and H4I29C mutations that might affect protein binding to nearby modifications. Ligated histones were refolded into octamers and assembled into dinucleosomes using a biotinylated DNA containing two nucleosome-positioning sequences (di-601). For some experiments, CpG-methylated DNA (m5C) or H2A.Z were used. b, SNAP purifications. Modified nucleosomes were immobilized on streptavidin beads and incubated with nuclear extracts from HeLa S3 cells grown in isotopically light (R0K0) or heavy (R10K8) SILAC medium. c, Protein responses to modified nucleosomes. For each SNAP experiment, bound proteins were identified and quantified using MS, and the forward (x axis) and reverse (y axis) SILAC ratios (H/L ratio) were plotted on a logarithmic (log2) graph. d, A library of modified dinucleosomes. A header specifies the modification status of each nucleosome. Nucleosomes are arranged in columns, with the respective modifications displayed in rows. Modifications of specific lysine residues in histone H3 and H4 and the presence of DNA methylation (meCpG) or H2A.Z are colour coded as indicated. Nucleosomes are ordered to imitate clustering by increasingly active chromatin states. Monometh., monomethylation; PTMs, post-translational modifications. e, Visualization of protein binding responses to the 55 modified dinucleosomes profiled by SNAP. The log2[H/L] ratios for each protein in each SNAP experiment are shown as circles, with the right half representing the forward and the left half the reverse log2[H/L ratio]. Recruitment (red) and exclusion (blue) are indicated. The reverse H/L ratio was inverted to display both ratios on the same scale. 
Circle sizes denote the total MS1 peak intensities on a log10 scale. The asterisks indicate experiments that are shown in Extended Data Fig. 1b–d. The dagger symbols (†) indicate proteins that are highlighted in Extended Data Fig. 1b–e.
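The way panel e places forward and reverse ratios on one scale can be illustrated with a minimal sketch. It assumes that inverting the reverse H/L ratio amounts to a sign flip in log2 space, and that a protein counts as recruited when both values are positive; the function names are hypothetical and no significance thresholds are applied.

```python
import math

def snap_log2_ratios(forward_hl, reverse_hl):
    """Return (forward, inverted reverse) log2 H/L ratios for one protein."""
    fwd = math.log2(forward_hl)
    # Labels are swapped in the reverse experiment, so the ratio is inverted
    # (sign flip in log2 space) to share the forward scale.
    rev_inverted = -math.log2(reverse_hl)
    return fwd, rev_inverted

def response_direction(forward_hl, reverse_hl):
    """Illustrative call: agreement of both label orientations is required."""
    fwd, rev = snap_log2_ratios(forward_hl, reverse_hl)
    if fwd > 0 and rev > 0:
        return "recruited"
    if fwd < 0 and rev < 0:
        return "excluded"
    return "no consistent response"

# Hypothetical ratios: enriched 4-fold on the modified nucleosome in both runs.
print(snap_log2_ratios(4.0, 0.25))
print(response_direction(4.0, 0.25))
```

Requiring agreement between both orientations is one simple way to discard label-specific artefacts; the actual analysis in the paper relies on limma statistics rather than a sign check.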
Fig. 2. Feature effect estimates reveal binding responses of chromatin readers to different nucleosomal features.
a, Nucleosomes informative of protein responses to H3K4me3. The four pairs of dinucleosomes that differ only by H3K4me3, alongside the self-informative H3K4me3 dinucleosome (top), and the binding responses of four representative proteins in the corresponding SNAP experiments (bottom) are shown. b, Feature effect estimates of proteins showing H3K4me3-dependent nucleosome binding. The change in the log2[H/L ratio] attributable to H3K4me3 (x axis) is plotted against the P value (limma, two-sided, Benjamini–Hochberg adjusted) on a −log10 scale (y axis). The vertical lines mark an absolute effect (change in log2[H/L ratio]) of 1, and the horizontal line marks the FDR threshold of 0.01. Selected protein complexes are highlighted. Duplicate protein identifiers, for example, PHF8 (1), mark distinct UniProt IDs with the same gene name (TrEMBL versus SwissProt versions); for annotations, see Supplementary Table 1. c, The number of interactors responsive to different chromatin features. Owing to their frequent co-occurrence, blocks of acetylation, such as H3K9acK14ac, H3K9acK14acK18acK23acK27ac (H3ac) and H4K5acK8acK12acK16ac (H4ac), were treated as single features. Proteins with statistically significant (limma, FDR ≤ 0.01) effect estimates ≥ 1 are classified as strongly recruited, or as strongly excluded if their estimate is ≤ −1. Proteins with significant absolute changes in log2[H/L ratio] < 1 are considered weakly recruited or excluded. d, The number of chromatin features regulating protein binding responses. The grey bars tally the number of proteins with statistically significant feature effects (limma, FDR ≤ 0.01). The black bars additionally tally proteins with strong feature effects (absolute effect ≥ 1). e, Clustered heat map of feature effect estimates of proteins strongly responding to at least one feature as shown in c. Individual estimates are colour coded. Entries without an estimate due to insufficient data are marked in grey.
Prototype proteins representing the binding response of each cluster are shown on the right. Notable protein complexes are highlighted.
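The classification rule stated in panel c can be written down as a short sketch. The thresholds (FDR ≤ 0.01, absolute effect ≥ 1) come from the legend; the function name, argument names and the label for non-significant proteins are hypothetical.

```python
def classify_response(effect, fdr, fdr_cutoff=0.01, strong_cutoff=1.0):
    """Classify a feature effect estimate (change in log2 H/L ratio).

    `effect` and `fdr` stand in for a limma effect estimate and its
    Benjamini-Hochberg-adjusted P value; both inputs are assumed given.
    """
    if fdr > fdr_cutoff:
        return "not significant"
    if effect >= strong_cutoff:
        return "strongly recruited"
    if effect <= -strong_cutoff:
        return "strongly excluded"
    # Significant but below the strong cutoff in absolute value.
    # An effect of exactly 0 is not expected to be significant; it would
    # fall through to "weakly excluded" here.
    return "weakly recruited" if effect > 0 else "weakly excluded"

# Hypothetical estimates, not values from the paper.
for effect, fdr in [(1.8, 0.0005), (-1.2, 0.004), (0.4, 0.008), (2.0, 0.2)]:
    print(effect, fdr, classify_response(effect, fdr))
```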
Fig. 3. Differential binding of proteins to H3K4 methylation and H3/H4 acetylation states.
a, Comparison of H3K4me3- versus H3K4me1-responsive proteins. H3K4me3- or H3K4me1-dependent changes in the log2[H/L ratio] are plotted on the x and y axes, respectively. Proteins with statistically significant estimates (limma, two-sided, Benjamini–Hochberg-adjusted FDR ≤ 0.01) are circled with a grey border. The grey area marks ±0.2 radians away from the x = y line. Selected protein complexes are highlighted. Although H3K4me1 recruits BRPF3 and no other interactors, it still excludes, for example, the PRC2 complex, albeit not as strongly as H3K4me3. b, CLR-predicted network overlaid with chromatin feature effects. The heat maps reveal the degree and specificity of protein recruitment or exclusion by the different features. Protein complexes with statistically significant regulation (CAMERA, FDR ≤ 0.01, median effect ≥ 0.3; Supplementary Table 8) were annotated for each feature after manual curation. A zoomable version is provided in the MARCS resource. c, Comparison of proteins responding to H3 versus H4 acetylation. Changes in the log2[H/L ratio] attributable to H3ac or H4ac are plotted on the x and y axes, respectively. Data representation as in a. Proteins are coloured by the difference between their H3ac and H4ac responses. BAF and CHRAC complex subunits are highlighted with coloured borders and labels. d, The preference of protein complexes for H3 or H4 acetylation. Markers indicate the median effect of the H3ac versus the H4ac feature across all complex subunits with protein response measurements (the number of measurements per complex/feature is shown in Supplementary Fig. 1). The error bars represent the empirical 95% confidence interval (CI) of this median effect estimated from 100,000 random samples of subunit effects, accounting for their variance. The coloured bars highlight the difference between these median estimates for H3ac and H4ac. Complexes are ordered from H3ac to H4ac preference.
The asterisks denote estimates for exclusive complex subunits.
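The empirical confidence interval of the median effect in panel d can be approximated with a resampling sketch. The legend does not state how the subunit variance enters the sampling, so this example assumes each subunit effect is drawn from a normal distribution centred on its estimate with its standard error as the spread; the function name and the input values are hypothetical.

```python
import random
import statistics

def median_effect_ci(effects, std_errors, n_draws=100_000, seed=0):
    """Median effect across subunits with an empirical 95% interval.

    Assumption: every subunit effect is resampled as Normal(estimate, SE).
    """
    rng = random.Random(seed)
    medians = []
    for _ in range(n_draws):
        draw = [rng.gauss(mu, se) for mu, se in zip(effects, std_errors)]
        medians.append(statistics.median(draw))
    medians.sort()
    lower = medians[int(0.025 * (n_draws - 1))]  # 2.5th percentile
    upper = medians[int(0.975 * (n_draws - 1))]  # 97.5th percentile
    return statistics.median(effects), lower, upper

# Hypothetical subunit effects and standard errors for one complex.
median, lower, upper = median_effect_ci(
    [0.9, 1.1, 1.4, 0.7], [0.2, 0.15, 0.3, 0.25], n_draws=10_000
)
print(f"median effect {median:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

Sampling the median rather than the mean keeps the interval robust to a single divergent subunit, which matters for complexes whose shared subunits also sit in other assemblies.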
Fig. 4. Nucleosomal modifications and linker DNA constitute orthogonal routes of protein engagement with chromatin.
a, Schematic of dinucleosomes used in label-free MS-based pull-downs for evaluating the effect of linker DNA length and sequence on protein binding to active (right) and repressive (left) chromatin states. b, Clustered heat map depicting protein binding responses to dinucleosomes incorporating different combinations of 200 bp scrambled DNA or SV40 promoter sequence-based linkers and promoter PTMs (H3K4me3K9acK14acK18acK23acK27ac in combination with H4K5acK8acK12acK16acK20me2 and H2A.Z). Data are shown as the log2-transformed fold change (log2[FC]) in the normalized protein abundances compared with unmodified dinucleosomes with a 50 bp linker. c, Comparison of H3K9me3-binding responses on dinucleosomes with 35 bp and 50 bp linkers. Proteins responding to H3K9me3, linker length or both were determined using limma statistics and are highlighted in red, blue or purple, respectively. Only binding responses fulfilling the following two criteria are depicted: (1) log2[FC] > 1 or log2[FC] < −1 compared with unmodified dinucleosomes with 50 bp linker; (2) Benjamini–Hochberg-adjusted P ≤ 0.05. The x = y line indicates where binding responses to H3K9me3 dinucleosomes incorporating 35 bp and 50 bp linkers are identical. The grey area marks ±0.2 radians away from the x = y line. Core histones (normalization controls) are indicated in dark grey. The smaller datapoints indicate response estimates based on single data points. The triangles indicate points outside the data axes. d, Comparison of H3K27me3-binding responses on dinucleosomes with 35 bp and 50 bp linkers. Data representation in d–g is as described in c. e, Comparison of protein binding responses to promoter PTMs on dinucleosomes with 200 bp scrambled DNA and SV40-promoter-sequence-based linkers. f, Comparison of sequence-specific protein binding responses to the SV40 promoter linker in unmodified dinucleosomes (di-nucl.) and dinucleosomes decorated with promoter PTMs.
g, Comparison of protein binding responses to SV40 promoter linker and promoter PTMs.
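The grey band of ±0.2 radians around the x = y line used in panels c to g can be expressed as an angular test. This is a sketch under the assumption that the band is measured as an angle about the origin and extends in both directions along the diagonal; the function name is hypothetical.

```python
import math

def within_diagonal_band(x, y, half_width=0.2):
    """True if the point (x, y) lies within +/- half_width radians of x = y."""
    if x == 0 and y == 0:
        return True  # the origin lies on the line itself
    angle = math.atan2(y, x)
    # Angular offset from the diagonal, folded into [-pi/2, pi/2) because
    # a line has no direction (the band covers both quadrants I and III).
    offset = (angle - math.pi / 4 + math.pi / 2) % math.pi - math.pi / 2
    return abs(offset) <= half_width

# Hypothetical log2[FC] pairs (35 bp linker, 50 bp linker).
print(within_diagonal_band(1.0, 1.2))    # similar response with both linkers
print(within_diagonal_band(1.0, 2.0))    # stronger response with one linker
print(within_diagonal_band(-1.5, -1.4))  # similar exclusion with both linkers
```

An angular band, unlike a fixed-width stripe, widens with distance from the origin, so larger responses tolerate proportionally larger absolute differences between the two conditions.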
Fig. 5. The INO80 complex recognizes a multivalent nucleosome-modification signature.
a, CLR-predicted TBRG1–INO80 interaction. TBRG1–INO80 interactions were reported in several screens and deposited at BioGRID but never validated. b, TBRG1 interacts with INO80. Volcano plot of proteins that are significantly enriched (t-test, two-sided, Benjamini–Hochberg-adjusted FDR ≤ 0.05) in n = 3 biologically independent INO80B-V5 immunoprecipitations (Extended Data Fig. 5h) followed by label-free MS. c, Composition of the INO80 complex. The relative stoichiometries between TBRG1 and INO80 were calculated using quantitative MS data from the INO80B-V5 immunoprecipitation experiments shown in b. n = 3. Data are the mean ± s.d. of the stoichiometry values. d, Features driving the INO80 nucleosome-binding response. Individual effect estimates (change in log2[H/L ratio]) for INO80-exclusive subunits are shown as dots (estimate significantly non-zero, limma, two-sided, Benjamini–Hochberg-adjusted FDR ≤ 0.01) or crosses (estimate not statistically significant). The bars highlight the median effect across all complex subunits with protein response measurements (n = 11, except for DNA methylation, H3K27ac, H3K9me2 and H3K27me2, for which n = 1 and no estimate was derived). The error bars represent the empirical 95% CI of this median effect estimated from 100,000 random samples of subunit effects, accounting for their variance. The bold font indicates features with enrichments greater than expected by chance (CAMERA, Benjamini–Hochberg-adjusted FDR ≤ 0.01; Supplementary Table 8). e, Targeted dinucleosome pull-downs confirm INO80 binding to nucleosomes containing hyperacetylated H3 (H3ac), H4 (H4ac) and/or H2A.Z. Binding was detected by immunoblotting against INO80B and ACTR5. TBRG1 follows the INO80-binding pattern. The HeLa S3 cell nuclear extract used was a mixture of three independent preparations. Different amounts of the mixed extract were loaded as inputs for the different immunoblots. Experiments were independently repeated three times with similar results. 
Unmod., unmodified. f, Quantitative label-free LC–MS-based analysis of histone modifications and H2A.Z in mononucleosomes co-purified with ACTR5 from MNase-digested HeLa cell chromatin. The relative PTM or H2A.Z abundance over input chromatin is plotted as the log2[FC] for n = 2 independent biological experiments.
Extended Data Fig. 1. SNAP experiments reveal differential responses of chromatin readers to nucleosomal modification signatures.
a, SILAC Nucleosome Affinity Purifications (SNAP). For SNAP experiments modified nucleosomes were immobilized on streptavidin beads and incubated with nuclear extracts from HeLa S3 cells grown in isotopically light (R0K0) or heavy (R10K8) SILAC media. In ‘forward’ experiments heavy extracts were incubated with modified and light extracts with unmodified nucleosomes, while in ‘reverse’ experiments the extracts were exchanged. Bound proteins were eluted using an on-bead digestion protocol and identified and quantified by mass spectrometry. For each SNAP experiment the SILAC ratios Heavy/Light (Ratio H/L) of the forward and reverse experiment of the identified proteins were measured and plotted in a logarithmic (log2) graph (see Fig. 1c and panels b-d). The H/L ratios indicate binding preferences to the modified or the unmodified nucleosomes and allow the unbiased identification of proteins that are either recruited or excluded by the modifications, in addition to proteins that bind nucleosomes but do not show a strong response to the modifications. b, Exemplary SNAP experiment with H3K27me3-modified di-nucleosomes. The results show that the ORC subunit ORC2 and the PRC2 subunit EZH2 are recruited by the H3K27me3 modification as previously reported. c, SNAP experiment with H3K4me3- and H4K16ac-modified di-nucleosomes. This modification pattern recruits the H3K4me3 reader PHF8 but excludes EZH2 through loss of PRC2 binding to the N-terminus of histone H3. d, SNAP experiment with di-nucleosomes combining di-methylation of lysine 20 and acetylation of lysines 5, 8, 12, and 16 on histone H4 (H4acK20me2). This nucleosome strongly recruits BRD4 through its interaction with H4ac via its bromodomains as well as the ORC subunit ORC2 through recognition of H4K20me2 via ORC1. e, Results for SNAPs with the entire library of 55 modified di-nucleosomes.
Tracking the signals of BRD4, EZH2, ORC2, and PHF8 as highlighted in b-d allows interrogation of their responses to the different modification signatures. The order of SNAP experiments corresponds to the order of di-nucleosomes shown in Fig. 1d.
Extended Data Fig. 2. Feature effect estimates provide a breakdown of key modification determinants driving nuclear protein recruitment to chromatin.
a, Heatmap visualization of the binding responses of PRC1 complexes to the 55 differentially modified di-nucleosomes used in the SNAP experiments. Note that subunits unique to different PRC1 sub-complexes demonstrate distinct sub-complex-specific binding behaviours, while core subunits shared between PRC1 sub-complexes show a superposition of such distinct binding behaviours. b, Differential enrichment of CBX4 and CBX8 in targeted pull-down experiments from HeLa S3 nuclear extracts using di-nucleosomes decorated by H3K27me3 or different combinations of H3ac, H4ac, and/or H2A.Z and evaluation by immunoblot. Note the enrichment of CBX4 in pull-downs with di-nucleosomes containing H4ac or H3K27me3 and enrichment of CBX8 with di-nucleosomes containing H3K27me3 or H3ac confirming the SNAP results. Experiments were independently repeated twice with similar results in both replicates. c, Chromatin feature effect estimates of the nucleosome binding response of the CBX4 and CBX8 subunits of the canonical PRC1 complex. The bars highlight the limma effect estimates (change in log2 ratio H/L) for each subunit for the 15 different chromatin features. N indicates the number of nucleosome pairs informative of the different chromatin features that were used to calculate the effect estimate (see also Extended Data Fig. 3a, b and Supplementary Table 3), where the points represent the mean change in the log2 ratio H/L per pull-down pair. The error bars represent the 95% CI of the effect estimates (limma). Statistically significant effects (limma, two-sided, Benjamini/Hochberg-adjusted FDR ≤ 0.01) are highlighted in black frames. Note the distinct binding profiles, where CBX4 recruitment to di-nucleosomes is stimulated predominantly by H4ac and to a lesser extent by H3K27me3, while CBX8 recruitment is stimulated by H3K27me2/3 and to a lesser extent by H3ac, directly reflecting the immunoblot validation shown in b. 
d, Heatmap depicting the chromatin feature effects of the nucleosome binding responses of different PRC1 sub-complexes. The median feature effect estimates across all complex subunits with protein response measurements for a given feature are displayed for the different complexes as indicated in the colour key. In order to disambiguate variant-specific responses, the feature effect estimates for only exclusive subunits of the different complexes are shown as separate rows in the data. Statistically significant associations are indicated with asterisks (CAMERA, Benjamini/Hochberg-adjusted FDR: * ≤ 0.01, ** ≤ 0.001, *** ≤ 0.0001). Cells where no statistical estimate could be made due to insufficient data are marked with “?”. See also Supplementary Table 8.
Extended Data Fig. 3. Effect estimates of protein responses to chromatin features.
a, Comparisons of SNAP experiments performed to determine the effect of H3K4me3 on protein binding. SNAP experiments of nucleosome pairs informative of H3K4me3 are shown in the upper panel. Experiment H22 (H4ac+H3K4me3) is shown enlarged below. Protein positions of the four exemplary proteins highlighted in Fig. 2a are indicated in the scatter plots. Protein positions in the paired nucleosomes lacking the H3K4me3-modification are shown by empty circles in the scatterplots of the corresponding nucleosomes containing the H3K4me3-mark to highlight changes in position. Imputed values are plotted with smaller dots. The mean of the changes in the log2 H/L ratios of the forward and reverse experiment (n = 2 biologically independent pull-down experiments) are highlighted in the bar plots for the individual comparisons for each of the selected proteins as shown. The black lines indicate the feature effect estimates for H3K4me3 for the four proteins derived by limma based on all comparisons from all H3K4me3-informative pull-down pairs (see also Fig. 2b). SNAP experiment identifiers are listed in panel b, Supplementary Table 1, and the Supplementary Information. b, Matrix of pairs of di-nucleosomes which are informative of chromatin modification features. Pairs are identified as defined in the legend in the bottom right corner. The leftmost column and bottom row indicate nucleosomes which contain only the modification of interest and are therefore self-informative. SNAP experiment identifiers are listed in Supplementary Table 1 and Supplementary Information. Only features with two or more informative pairs of nucleosomes, and therefore an independent experimental replicate, were quantified for the feature effect estimates. c, Volcano plot of H3K4me1-responsive proteins. Data representation and labelling of selected protein complexes as in Fig. 2b. Duplicate protein identifiers with numbers in parentheses, e.g. 
DNMT1 (1), correspond to distinct UniProt IDs with the same gene name (i.e. Trembl vs. SwissProt versions), see also annotations in Supplementary Table 1. d, Volcano plots showing the effect estimates for the protein responses to the 12 (out of 15) chromatin features not highlighted in panel c, Fig. 2b, and Extended Data Fig. 5a. Data representation as in Fig. 2b. Selected proteins are highlighted.
Extended Data Fig. 4. Integrative analysis of MARCS with ENCODE ChIP-seq datasets.
a, Schematic representation of the integrative NGS dataset analysis. Briefly, the peak data for each dataset were binned at 1 kb resolution. For each pair of datasets, the pairwise co-occurrence matrix was recorded, tracking the number of bins in which the peaks overlap. The marginal and joint entropies, together with the mutual information (MI), were computed from the co-occurrence matrices. Note that because the mutual information measures the entropy shared by the two proteins (Venn diagram), it can be normalized by the entropy of either of the two factors. Since in MARCS we are interested in the explanatory power of chromatin features on protein binding, by convention we always normalized by the entropy of the protein. The normalized mutual information estimates are therefore interpretable as the fraction of uncertainty in protein localization that can be explained by the feature. For details see online methods. b, Summary of the relationships between MARCS feature effect estimates and NGS datasets for the Tier 1 ENCODE K562 cell line. The ChIP-seq, ATAC-seq, and DNase-seq experiments from ENCODE are plotted in columns together with the chromatin state annotations from the NIH Roadmap. The rows represent MARCS protein groups subdivided by their feature effect estimates; only groups with ≥5 proteins are shown. Each cell of the heatmap indicates two measurements that contrast the normalized MI (see a) for proteins that MARCS predicts to be strongly recruited or excluded by the feature to the normalized MI of other proteins (i.e. proteins neither strongly recruited nor strongly excluded by the feature, including proteins with no feature effect estimate at all). The colour indicates the difference between the mean log2 of the normalized MI estimates in the feature-associated group versus the mean of the log2 estimates of other proteins.
The size and the border shading of the square indicate the statistical significance of the difference (Mann-Whitney U test, two-sided, Benjamini/Hochberg-adjusted). See the colour bar and the legend. Significant red colours indicate that a given chromatin feature ChIP-seq experiment is more predictive of ChIP-seqs of proteins associated with a given MARCS-feature than ChIP-seqs of an average protein. Significant blue colours indicate the opposite. The rows and columns were clustered hierarchically to highlight similarities. c, Integrative analysis of ENCODE NGS data for the K562 cell line in relation to H3K4me1 and H3K4me3 ChIP-seq peaks. The fraction of entropy of a protein or feature explained by the information about H3K4me3 and H3K4me1 peaks is plotted on the x and y axes, respectively. Larger values indicate stronger mutual information between the peak distributions. The dotted x = y line indicates where H3K4me1 and H3K4me3 have exactly the same explanatory power. The shaded area corresponds to ± 0.2 radians from this line. MARCS feature estimates for H3K4me3 are indicated in red (strong recruitment) or blue (strong exclusion). Proteins without strong recruitment or exclusion are shown in grey; proteins with no feature effect estimate are marked by “X”. d, Integrative analysis as in c performed for NIH Roadmap promoter (x axis) and enhancer (y axis) chromatin states. Note that MARCS H3K4me3 readers again share higher mutual information with the promoter chromatin state than the enhancer state. Only a few BAF complex subunits (SMARCE1, ARID1B) show a weak preference for enhancers. Data representation is as in c. e, Integrative analysis of ENCODE NGS data for the K562 cell line in relation to one of the H3K4me3 ChIP-seq replicates (highlighted in b). Normalized MI (i.e. fraction of entropy of proteins/chromatin features explained by the H3K4me3 ChIP-seq) is plotted on the x axis, while the Kendall correlation coefficient of overlapping peak heights is plotted on the y axis.
Protein datasets are plotted in grey, while chromatin feature and accessibility datasets are plotted in green and yellow, respectively. Proteins strongly recruited to H3K4me3 based on their MARCS feature effect estimates are highlighted in red, and strongly excluded proteins are highlighted in blue. Note that proteins strongly recruited to H3K4me3 have, on average, higher normalized MI estimates than others (Mann-Whitney U test, two-sided, Benjamini/Hochberg-adjusted FDR < 0.01). f, Data in e plotted with proteins strongly recruited to H3K27me3 based on MARCS feature effect estimates highlighted in red. Note that these proteins have on average lower normalized MI estimates than others (Mann-Whitney U test, two-sided, Benjamini/Hochberg-adjusted FDR < 0.05). g, Integrative analysis of ENCODE NGS data for the K562 cell line in relation to one of the H3K4me1 ChIP-seq replicates (highlighted in b). Data presented as in e. Proteins strongly recruited to H3K4me3 based on MARCS feature effect estimates are highlighted in red, and strongly excluded proteins are highlighted in blue. Note that there is no statistically significant difference between these proteins and other proteins (Mann-Whitney U test, two-sided, Benjamini/Hochberg-adjusted FDR < 0.05). h, Data in g plotted with proteins strongly recruited to H3K27me3 based on MARCS feature effect estimates highlighted in red. Note that these proteins have on average lower normalized MI estimates than others (Mann-Whitney U test, two-sided, Benjamini/Hochberg-adjusted FDR < 0.01). i, Integrative analysis of ENCODE NGS data for the K562 cell line in relation to the H2A.Z ChIP-seq (highlighted in b). Data presented as in e. Proteins strongly recruited to H2A.Z based on MARCS feature effect estimates are highlighted in red. Note that these proteins have on average higher normalized MI estimates than others (Mann-Whitney U test, two-sided, Benjamini/Hochberg-adjusted FDR < 0.05).
j, Data in i plotted with proteins strongly recruited to H4ac based on MARCS feature effect estimates highlighted in red. Note that these proteins have on average higher normalized MI estimates than others (Mann-Whitney U test, two-sided, Benjamini/Hochberg-adjusted FDR < 0.01).
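The normalized mutual information described in panel a can be illustrated with a minimal sketch for one protein/feature pair. It assumes that the co-occurrence matrix is a 2 × 2 table of 1 kb bin counts (peak present or absent in each dataset) and that entropies are measured in bits; the function and argument names are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def normalized_mi(n_both, n_protein_only, n_feature_only, n_neither):
    """Fraction of the protein's entropy explained by the feature."""
    counts = (n_both, n_protein_only, n_feature_only, n_neither)
    total = sum(counts)
    joint = [n / total for n in counts]
    p_protein = (n_both + n_protein_only) / total
    p_feature = (n_both + n_feature_only) / total
    h_protein = entropy([p_protein, 1 - p_protein])
    h_feature = entropy([p_feature, 1 - p_feature])
    if h_protein == 0:
        raise ValueError("protein peaks cover none or all of the bins")
    # MI = H(protein) + H(feature) - H(protein, feature)
    mi = h_protein + h_feature - entropy(joint)
    # Normalize by the protein entropy, following the convention in panel a.
    return mi / h_protein

# Hypothetical bin counts, not values from ENCODE.
print(normalized_mi(50, 0, 0, 50))    # identical peak sets
print(normalized_mi(25, 25, 25, 25))  # independent peak sets
print(normalized_mi(40, 10, 20, 130))
```

Because the normalization uses only the protein entropy, the measure is asymmetric: swapping the protein and feature columns generally gives a different value.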
Extended Data Fig. 5. The INO80 complex interacts with TBRG1 and recognizes a multivalent H3ac/H4ac-H2A.Z modification signature.
a, Volcano plot of H2A.Z-responsive proteins. Data representation as in Fig. 2b. NSL, SRCAP, and INO80 complex subunits are highlighted. NSL subunits are significantly enriched (CAMERA, Benjamini/Hochberg-adjusted FDR ≤ 0.01, see Supplementary Table 8) while TOP2B is negatively regulated by H2A.Z. b, Breakdown of protein responses to H4ac and H2A.Z. Data representation as in Fig. 3a. Selected protein complexes are highlighted. Note that the INO80 complex responds to both H4ac and H2A.Z. c, Breakdown of protein responses to H3ac and H2A.Z. Data representation as in Fig. 3a. Note that the INO80 complex responds to both H3ac and H2A.Z. d, Heatmap visualization of the binding responses of SRCAP, INO80, and NSL complex subunits to the 55 modified di-nucleosomes. The complexes respond to multiple chromatin modification states which are strongly modulated by H2A.Z. Note that the nucleosome response profile of H2A.Z itself (H2AFV/H2AFZ) is similar to the SRCAP complex except in the five nucleosomes with recombinant H2A.Z, consistent with SRCAP’s role in H2A.Z loading. The TBRG1 binding profile follows that of INO80 subunits. The ACTL6A/RUVBL1/RUVBL2 module is shared between INO80, SRCAP, and other complexes; its binding pattern indicates preferential localization to the SRCAP complex. The NSL complex is enriched by H2A.Z-containing nucleosomes, although some of the subunits (marked with an asterisk) show divergent binding properties due to their preferential localization in other complexes. H2A.Z also differentially regulates the interaction of the two DNA Topoisomerase II isoforms α and β (TOP2A and TOP2B) with nucleosomes. While TOP2A binds H2A.Z-containing nucleosomes, TOP2B binding is clearly hindered by H2A.Z. Proteins labelled twice with enumerated labels, e.g. (1) and (2), correspond to multiple UniProt identifiers mapped to the same gene name (e.g. SwissProt and TrEMBL identifiers), see also annotations in Supplementary Table 1.
e, Schematic representation of the endogenous INO80B tagging strategy in MCF-7 cells. A gRNA was designed to cut in the 3′ UTR of the INO80B gene close to the stop codon (position −11). A single-stranded (ss) DNA oligonucleotide containing a TEV protease cleavage sequence (TEVcs) followed by the V5-tag sequence prior to the stop codon was used as homology donor. f, Workflow of the clonal MCF-7 cell line generation. Ribonucleoprotein (RNP) complexes were assembled from a two-piece gRNA (crRNA:tracrRNA duplex) and Cas9 protein and mixed with the ssDNA template. Cells were transfected with the RNP/ssDNA mixture and after 48 h seeded in 96 well plates with one cell per well. V5-positive clones were selected using immunocytochemistry (ICC) with anti-V5 antibodies. Note that the localization of the anti-V5-staining is nuclear as evidenced by the overlap with the DNA (DAPI) staining. Positive clones were expanded, characterized (see panel g), and used for further experiments. g, Immunoblot validation of the INO80B-V5 tagging. Nuclear extracts from three independently isolated V5-positive clonal MCF-7 cell lines used for the n = 3 INO80B-V5 IP-MS experiments shown in Fig. 5b,c were resolved by SDS PAGE and probed with anti-V5 antibodies to verify the endogenous tagging of INO80B with the V5 tag. Nuclear extracts from three independently isolated V5-negative cell lines (WT) were used as controls. h, TBRG1 co-purifies with the INO80 complex. Immunoblot analyses of n = 3 independent biological co-IP experiments of endogenously V5-tagged INO80B (INO80B-V5) using nuclear extracts prepared from the three clonal MCF-7 knock-in cell lines shown in panel g. INO80B was immunoprecipitated via the C-terminal V5-tag. TBRG1 co-purifies with INO80B along with the INO80 core subunit. The panel shows all three replicates, see Fig. 5b,c for the mass spectrometric quantification. 
WT indicates the three V5-negative MCF-7 cell lines shown in panel g that were used as negative controls for the V5 immunoprecipitation. i, Ethidium bromide-stained agarose gel showing DNA isolated from two independently prepared MNase-digested HeLa chromatin samples used as input for the n = 2 replicates of the native ChIP-MS analysis of mono-nucleosomes co-purified with the GFP-tagged INO80 subunit ACTR5 shown in panel k and Fig. 5f. Shown are the soluble fraction used as input for the ChIP-MS and the undigested chromatin remaining in the pellet for both replicates of the preparation. Both replicates yielded similar results. j, Immunoblot analysis of native anti-GFP ChIPs using MNase-digested chromatin from WT HeLa cells and HeLa cells expressing the GFP-tagged INO80 subunit ACTR5. Purified proteins and co-purified mono-nucleosomes were released from GFP-Trap beads using 3C protease, resolved by SDS-PAGE, transferred to nitrocellulose membranes and probed with specified antibodies. Note that ACTR5-GFP co-purifies the INO80 complex and histone proteins (indicated by co-IP of INO80B and H2B), which are absent in the control purifications from WT HeLa cells, verifying high specificity of the ChIP procedure. Experiments were repeated in n = 2 biologically independent replicates with similar results. The panel shows the results from both replicates. k, Extracted ion chromatograms for the H4K5acK8acK12acK16ac histone peptide (H4-4ac) in the input and ACTR5-GFP (INO80) ChIP-MS nucleosome co-purification samples (top panel), and representative annotated MS2 spectrum of the H4K5acK8acK12acK16ac peptide (bottom panel). The top panel shows the results for the H4-4ac peptide in both replicates of the ACTR5-GFP ChIP-MS displayed in Fig. 5f. l, Integrative analysis of ENCODE NGS data for the K562 cell line in relation to H3K9ac and H2A.Z genomic distributions. Data representation as in Extended Data Fig. 4c.
Proteins strongly recruited by H2A.Z based on their MARCS feature estimates are highlighted in red, INO80 subunits are highlighted in bold. INO80 subunits are among the top-scoring proteins whose genomic distribution can be explained by both H2A.Z and H3K9ac.
Extended Data Fig. 6. ChIP-MS profiling of H3K4me1- and H3K4me3-associated chromatin proteins in IMR-90 cells.
a, Clustered heatmap of log2 FC (fold change) values for the relative abundances of histone PTMs measured by LC-MS in methyl state-specific anti-H3K4me1 and anti-H3K4me3 ChIP experiments and control anti-H3 and anti-H4 nucleosome purifications (each performed in n = 3 biologically independent experiments) as compared to the mean of three input chromatin samples (see b). Note that, in order to improve the identification of H3K4 methylation state-specific chromatin-associated proteins, the anti-H3 and anti-H4 control ChIPs were performed using the same inputs that had first been used for H3K4me1 and H3K4me3 ChIPs, and were therefore partially depleted in these modifications and proteins associated with H3K4 methylated nucleosomes (see online methods for details). b, Ethidium bromide-stained agarose gel showing DNA isolated from n = 3 independently prepared dual-crosslinked IMR-90 chromatin samples solubilised and fragmented by sonication that were used as inputs for the three replicates of the anti-H3K4me1 and anti-H3K4me3 ChIP-MS experiments shown in panels a and e-h. Note that most DNA fragments range between 100-200 bp in size, corresponding to mono-nucleosomes. c, Mean relative abundances of different H3K4 methylation states in ChIP purifications and in the input chromatin from n = 3 independent experiments (see panels a and b). d, Comparison of ChIP-MS profiling of H3K4me1- and H3K4me3-associated proteins with MARCS feature effect estimates. The heatmap depicts the log2 difference in the imputed ChIP-MS log2 FC estimates (H3K4me3 vs. H3K4me1, H3K4me3 vs. control, or H3K4me1 vs. control) for proteins strongly recruited or excluded by a given MARCS feature to the imputed log2 FC estimates of all other proteins detected in both MARCS and ChIP-MS data. Note that proteins that are predicted to be recruited by H3K4me3 in MARCS are statistically enriched in H3K4me3 but not in H3K4me1 ChIP purifications. 
e, ChIP-MS analysis of proteins associated with H3K4me1- and H3K4me3-modified chromatin in crosslinked IMR-90 cells. Log2 FC in normalized protein abundances over mean H3 and H4 ChIP controls for H3K4me3 and H3K4me1 ChIPs (n = 3 biologically independent experiments each) are plotted on the x and y axes, respectively. Differentially abundant proteins (H3K4me1 vs. H3K4me3; limma, two-sided, Benjamini/Hochberg-adjusted FDR ≤ 0.05) are circled with grey border. The area ± 0.2 radians away from the dotted x = y line is shaded in grey. Proteins strongly recruited or excluded by H3K4me3 in MARCS data are displayed in red and blue respectively, and core histone proteins (normalization controls) in dark grey. Smaller datapoints indicate response estimates based on single data points. Triangles indicate points outside of the data axes. Note that the vast majority of differentially abundant proteins preferentially associate with H3K4me3-modified chromatin while only few proteins show preferential association with H3K4me1. f, Heatmap of log2 FC in the normalized protein abundances for the specified ChIP-MS experiments as compared to the mean of the control anti-H3 and anti-H4 ChIPs, ordered from most to least enriched in the H3K4me3 ChIP. The column on the left shows the log2 FC in the mean normalized protein abundances in H3K4me3 vs. mean anti-H3 and anti-H4 control ChIPs. g, Heatmap of log2 FC in the normalized protein abundances for the specified ChIP-MS experiments as compared to the mean of the control anti-H3 and anti-H4 ChIPs, ordered from most to least enriched in H3K4me1 ChIP. The column on the left shows the log2 FC in the mean normalized protein abundances in H3K4me1 vs. mean anti-H3 and anti-H4 control ChIPs. h, Heatmap depicting differentially abundant proteins (H3K4me3 ChIP vs. H3K4me1 ChIP) ordered by the log2(H3K4me3/H3K4me1) FC estimate from the most enriched in H3K4me3 ChIP to most enriched in H3K4me1 ChIP. 
Proteins that are more abundant in the H3K4me3 ChIP as compared to the H3K4me1 ChIP are marked in red, while proteins more abundant in H3K4me1 ChIP are marked in blue (left colour axis). Log2 FC in the normalized protein abundance for the specified ChIP experiments vs. mean of H3K4me1 and H3K4me3 ChIPs is plotted in the heatmap on the right. Note that the vast majority of differentially abundant proteins are specifically enriched in the H3K4me3 ChIP while only few proteins are enriched in the H3K4me1 ChIP.
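The grey band in panel e (the area within ± 0.2 radians of the x = y line) can be illustrated with a short sketch. The legend does not say how the band is constructed; the code below assumes the angle is measured at the origin between the point's direction and the undirected diagonal, and the function names are hypothetical.

```python
import math

def angle_from_diagonal(x, y):
    """Smallest angle (radians) between the direction of (x, y) and the line x = y.

    Assumption: the angle is measured at the origin and the diagonal is
    treated as an undirected line, so (1, 1) and (-1, -1) both give 0.
    """
    theta = math.atan2(y, x)                  # direction of the point
    delta = (theta - math.pi / 4) % math.pi   # offset from the diagonal, modulo a half turn
    return min(delta, math.pi - delta)

def in_shaded_band(x, y, half_width=0.2):
    """True if the point lies within half_width radians of the x = y line."""
    return angle_from_diagonal(x, y) <= half_width

# x: log2 FC in the H3K4me3 ChIP, y: log2 FC in the H3K4me1 ChIP (made-up values)
print(in_shaded_band(2.0, 2.5))   # close to the diagonal
print(in_shaded_band(3.0, 1.0))   # skewed towards H3K4me3
```

Under this reading, points inside the band are enriched to a similar degree in both ChIPs, while points outside it lean towards one of the two methylation states.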
Extended Data Fig. 7. Network training procedure and inferred network.
a, Schematic of the network inference process. A set of candidate protein interaction networks was generated using published network inference algorithms. The networks were evaluated against BioGRID as a reference database of known interactions. The best performing network algorithm was selected based on the highest partial area under PRC curve (auPRC, see Supplementary Information). Network estimates at different confidence levels were generated and investigated for chromatin interactions. b, Partial PRC curves of the estimated protein-protein interaction networks. Six different network algorithms were trained and tested against BioGRID as a reference set of known protein interactions. Performance of the network inference was benchmarked by scoring the number of recovered BioGRID interactions. As our experiment was not expected to recover the whole of BioGRID, the networks were evaluated by partial area under precision and recall curve (auPRC) at a 20% sensitivity threshold. At this threshold the CLR network algorithm, which uses mutual information (MI), produced the network with the highest area under the curve. Five parameter thresholds were selected to generate networks at increasing stringency, out of which q = 0.001 (marked ** in the plot) forms the basis of Fig. 3b and panel e, and a high-confidence network at 70% precision (marked *) is displayed in Extended Data Fig. 8. c, Estimated interaction scores broken down by number of publications reporting the interaction in BioGRID. Data is depicted as standard boxplots, with the boxes ranging from the first (Q1) to the third (Q3) quartile and the median (Q2) indicated. The lower whiskers are at the lowest datum above Q1 – 1.5 x (Q3 – Q1), and the upper whiskers at the highest datum below Q3 + 1.5 x (Q3 – Q1). Data beyond whiskers are considered outliers and plotted as individual data points. N indicates the number of pairwise interactions between proteins (i.e. 
potential edges in the network) reported in BioGRID in each of the different publication-count categories. Note that interactions reported in the literature more frequently receive higher median interaction scores (see also Supplementary Table 7). d, Estimated interaction scores broken down by experimental method by which they were identified. Data depicted as in panel c. N indicates the number of pairwise interactions between proteins (i.e. potential edges in the network) reported in BioGRID for each specified experimental method (see also Supplementary Table 7). e, Network generated from the SNAP binding data using the CLR algorithm at a stringency threshold of q = 0.001. Key chromatin regulatory complexes form clusters in the network, see also Supplementary Table 8. A zoomable version is provided in the MARCS online interface. f, Integrative analysis of the MARCS network protein-protein interaction (PPI) predictions and ENCODE ChIP-seq datasets for the K562 cell line. The predicted interactions of proteins within the MARCS PPI network for which ChIP-seq data was available were stratified into bins of increasing confidence (x axis). For each of the stratified interactions, the distribution of symmetrically normalized MI coefficient estimates (see online methods) is shown in the violin plots (y axis). The boxplots inside the violins are depicted as in panel c, but without any outliers shown. N indicates the number of pairwise interactions between proteins in the different confidence categories, with n(+) in the right panel indicating predicted interactions reported in BioGRID (blue) and n(−) indicating predicted interactions not reported in BioGRID (red). Note that as the confidence from MARCS increases, the normalized MI estimate increases as well (left panel, q ≤ 0.05 vs. Other: p-value = 4.037 × 10⁻⁷/U-statistic = 5.229 × 10⁶, q ≤ 0.01 vs. Other: p-value = 4.167 × 10⁻⁷/U-statistic = 3.193 × 10⁶, q ≤ 0.001 vs. 
Other: p-value = 1.161 × 10⁻¹/U-statistic = 1.260 × 10⁶, q ≤ 0.0001 vs. Other: p-value = 3.932 × 10⁻¹⁰/U-statistic = 1.584 × 10⁶, high-confidence vs. Other: p-value = 7.875 × 10⁻¹³/U-statistic = 1.414 × 10⁶, Mann-Whitney U test, one-sided, Bonferroni-corrected), validating MARCS results. In addition, a similar conclusion holds when considering only the interactions that were not known at the network training time (right panel, red - ‘Not in BioGRID’ category, q ≤ 0.05 vs. Other: p-value = 3.868 × 10⁻⁵/U-statistic = 4.327 × 10⁶, q ≤ 0.01 vs. Other: p-value = 5.843 × 10⁻⁵/U-statistic = 2.474 × 10⁶, q ≤ 0.001 vs. Other: p-value = 7.096 × 10⁻¹/U-statistic = 8.232 × 10⁵, q ≤ 0.0001 vs. Other: p-value = 9.474 × 10⁻⁵/U-statistic = 8.437 × 10⁵, high-confidence vs. Other: p-value = 2.621 × 10⁻³/U-statistic = 2.346 × 10⁵, Mann-Whitney U test, one-sided, Bonferroni-corrected). In both panels ****, ***, **, * indicate p ≤ 0.0001, p ≤ 0.001, p ≤ 0.01, and p ≤ 0.05 (respectively), ns = not significant.
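The boxplot convention described for panel c (box from Q1 to Q3, whiskers at the most extreme data within 1.5 × (Q3 − Q1) of the box, everything beyond plotted as outliers) can be sketched as follows. This is a minimal illustration using only the Python standard library; the quartile interpolation method is an assumption, since the legend does not specify one, and the function name and input values are hypothetical.

```python
import statistics

def boxplot_summary(values):
    """Quartiles, whisker positions and outliers under the 1.5 x IQR convention."""
    data = sorted(values)
    # Assumption: linear interpolation between data points ("inclusive" method).
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return {
        "q1": q1,
        "median": q2,
        "q3": q3,
        # Whiskers sit on actual data points, not on the fences themselves.
        "lower_whisker": min(v for v in data if v >= lower_fence),
        "upper_whisker": max(v for v in data if v <= upper_fence),
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }

summary = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(summary)
```

In this made-up example the value 100 lies beyond the upper fence, so it would be drawn as an individual point while the upper whisker stops at 9.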
Extended Data Fig. 8. High-confidence protein interaction predictions from MARCS data.
A plot of high-confidence protein interactions predicted by our network using the CLR-MI algorithm at an increased stringency of 70% precision. In this subnetwork, 30% of predicted edges were not previously deposited to BioGRID and therefore constitute potential novel interactions. Since increased precision is met with reduced recall, the network was augmented with edges linking interactions deposited to BioGRID but not recovered at this threshold (i.e. false negatives). Blue edges highlight predicted and known interactions reported in BioGRID. Potentially novel interactions predicted by our network that are not in BioGRID (at the time of network training) are highlighted in red, while interactions reported in BioGRID that did not pass the high-confidence threshold (i.e. false negatives) are indicated by grey lines. These annotations were added in the interest of organizing the network into connected sub-modules so the context of predictions can be interpreted more readily. Subunits of known protein complexes are circled and annotated with the complex name in bold letters.
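The edge colouring described above amounts to comparing two sets of undirected protein pairs. The sketch below is illustrative only: the protein names are placeholders, the function is hypothetical, and precision is computed relative to the reference set, so that any predicted edge absent from the reference counts against it (which is why 70% precision corresponds to 30% potentially novel edges).

```python
def classify_edges(predicted, known):
    """Split edges into confirmed (blue), novel (red) and missed (grey) sets.

    Edges are undirected, so each pair is stored as a frozenset.
    """
    pred = {frozenset(edge) for edge in predicted}
    ref = {frozenset(edge) for edge in known}
    confirmed = pred & ref   # predicted and present in the reference (blue)
    novel = pred - ref       # predicted but absent from the reference (red)
    missed = ref - pred      # in the reference but not predicted (grey)
    precision = len(confirmed) / len(pred) if pred else float("nan")
    return confirmed, novel, missed, precision

# Placeholder proteins A-E; not taken from the MARCS network.
predicted_edges = [("A", "B"), ("B", "C"), ("C", "D")]
reference_edges = [("B", "A"), ("C", "D"), ("D", "E")]

confirmed, novel, missed, precision = classify_edges(predicted_edges, reference_edges)
print(len(confirmed), len(novel), len(missed), round(precision, 2))
```

Storing each pair as a frozenset makes ("A", "B") and ("B", "A") the same edge, which matters when the prediction and the reference list partners in different orders.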
Extended Data Fig. 9. Probing the effect of di-nucleosome linker DNA on protein engagement with heterochromatin and promoter chromatin states.
a, Clustered heatmap depicting protein binding responses to di-nucleosomes incorporating different combinations of H3K9me3 or H3K27me3 and linker lengths ranging from 35 bp to 55 bp with 5 bp increments. Data shown as log2 FC in the normalized protein abundances compared to unmodified di-nucleosomes with 50 bp control linker. Clusters 1 and 4 mark the H3K27me3- and H3K9me3-responsive proteins, respectively; these proteins are insensitive to variations in the linker length (see panels b and e). Clusters 6 and 7 mark proteins that respond with diminished binding to variations of the linker, independent of the modifications on the flanking nucleosomes. See also panels h-k and n. b, Comparison of protein binding responses to H3K9me3 on di-nucleosomes with 50 bp and 55 bp linkers. Data representation in b-m as in Fig. 4c. c, Comparison of protein binding responses to promoter PTMs present on di-nucleosomes with 50 bp and 200 bp SV40 promoter sequence-based linkers. d, Comparison of protein binding responses to promoter PTMs and the 200 bp SV40 promoter linker. Note that the binding responses mediated by promoter PTMs and the 200 bp SV40 promoter linker are largely independent of each other, as most proteins responding to one of the two features do not respond to the other and vice versa. e, Comparison of protein binding responses to H3K27me3 on di-nucleosomes with 50 bp and 55 bp linkers. f, Comparison of protein binding responses to promoter PTMs present on di-nucleosomes with 50 bp and 200 bp scrambled DNA sequence-based linkers. g, Comparison of protein binding responses to promoter PTMs and the 200 bp scrambled DNA linker. Note that the binding responses mediated by promoter PTMs and the 200 bp scrambled DNA linker are largely independent of each other, as most proteins responding to one of the two features do not respond to the other and vice versa. 
h, Comparison of protein binding responses to the 35 bp linker in relation to the 50 bp linker in unmodified and H3K9me3-decorated di-nucleosomes. See also panel n. i, Comparison of protein binding responses to the 35 bp linker in relation to the 50 bp linker in unmodified and H3K27me3-decorated di-nucleosomes. See also panel n. j, Comparison of protein binding responses to the 55 bp linker in relation to the 50 bp linker in unmodified and H3K9me3-decorated di-nucleosomes. k, Comparison of protein binding responses to the 55 bp linker in relation to the 50 bp linker in unmodified and H3K27me3-decorated di-nucleosomes. l, Comparison of protein binding responses to the 200 bp SV40 promoter linker in relation to the 50 bp linker in unmodified di-nucleosomes and di-nucleosomes decorated with promoter PTMs. m, Comparison of protein binding responses to the 200 bp scrambled DNA linker in relation to the 50 bp linker in unmodified di-nucleosomes and di-nucleosomes decorated with promoter PTMs. n, Schematic representation of the di-nucleosome linker DNAs ranging from 35 bp to 55 bp that were used in Fig. 4c,d and panels b, e, and h-k. Note that due to the design of the 5 bp increments in the linker DNA the 55 bp, 50 bp, and 45 bp linkers contain an AP-1 binding motif that is disrupted in the 40 bp and 35 bp linkers (see also Supplementary Information), resulting in impaired binding of several AP-1 family TFs, including FOS, FOSL2, JUN and JUNB to di-nucleosomes with the 40 bp and 35 bp linker DNAs. These are identified as cluster 7 in panel a and in the comparisons in panels h-i. o, Gene ontology enrichment analysis of proteins showing impaired binding to di-nucleosomes incorporating 200 bp long linker DNAs (200 bp SV40 promoter and scrambled DNA linkers). See cluster 4 in Fig. 4b.
Extended Data Fig. 10. Probing the effect of linker DNA on protein engagement with the enhancer chromatin state.
a, Clustered heatmap depicting protein binding responses to di-nucleosomes incorporating different combinations of 200 bp scrambled DNA or SV40 enhancer sequence-based linkers and enhancer-associated PTMs (H3K4me1 and H3K4me1K27ac). Data shown as log2 FC in the normalized protein abundances compared to unmodified di-nucleosomes with a 50 bp linker. b, Comparison of protein binding responses to H3K4me1 on di-nucleosomes with 50 bp and 200 bp SV40 enhancer sequence-based linkers. Data representation in b-g as in Fig. 4c. Note that only one protein shows a statistically significant binding response to H3K4me1 regardless of which linker is used, indicating that H3K4me1 has limited regulatory potential even when placed on nucleosomes flanking an NDR containing an enhancer DNA sequence. c, Comparison of protein binding responses to H3K4me1K27ac on di-nucleosomes with 50 bp and 200 bp SV40 enhancer linkers. Note that the binding of only a few proteins is stimulated by H3K4me1K27ac regardless of which linker is used, indicating that H3K4me1K27ac has limited potential in mediating protein recruitment to chromatin even when placed on nucleosomes flanking an NDR containing an enhancer DNA sequence. d, Comparison of sequence-specific protein binding responses to the 200 bp SV40 enhancer linker in unmodified di-nucleosomes and di-nucleosomes decorated with H3K4me1. e, Comparison of sequence-specific protein binding responses to the 200 bp SV40 enhancer linker in unmodified di-nucleosomes and di-nucleosomes decorated with H3K4me1K27ac. f, Comparison of protein binding responses to the 200 bp SV40 enhancer linker and H3K4me1. Note that proteins responsive to the SV40 enhancer linker show no major regulation by H3K4me1, and vice versa. g, Comparison of protein binding responses to the 200 bp SV40 enhancer linker and H3K4me1K27ac. Note that proteins responsive to the SV40 enhancer linker show no major regulation by H3K4me1K27ac, and vice versa. 
h, Gene ontology enrichment analysis of proteins that show either enhanced or impaired binding to di-nucleosomes with the 200 bp linker (clusters 1 and 2 in panel a).

References

    1. Kundaje A, et al. Integrative analysis of 111 reference human epigenomes. Nature. 2015;518:317–330. doi: 10.1038/nature14248.
    2. Ernst J, Kellis M. Discovery and characterization of chromatin states for systematic annotation of the human genome. Nat. Biotechnol. 2010;28:817–825. doi: 10.1038/nbt.1662.
    3. Musselman CA, Lalonde M-E, Côté J, Kutateladze TG. Perceiving the epigenetic landscape through histone readers. Nat. Struct. Mol. Biol. 2012;19:1218–1227. doi: 10.1038/nsmb.2436.
    4. Bannister AJ, Kouzarides T. Regulation of chromatin by histone modifications. Cell Res. 2011;21:381–395. doi: 10.1038/cr.2011.22.
    5. Greenberg MVC, Bourc’his D. The diverse roles of DNA methylation in mammalian development and disease. Nat. Rev. Mol. Cell Biol. 2019;20:590–607. doi: 10.1038/s41580-019-0159-6.
