The Wild West of spike-in normalization

Lauren A Patel¹,²,³, Yuwei Cao³,⁴, Eric M Mendenhall⁵, Christopher Benner⁶, Alon Goren⁷

Affiliations

¹ Department of Bioengineering, Jacobs School of Engineering, University of California San Diego, La Jolla, CA, USA.
² Department of Medicine, Division of Endocrinology & Metabolism, University of California San Diego, La Jolla, CA, USA.
³ Department of Medicine, Division of Genomics & Precision Medicine, University of California San Diego, La Jolla, CA, USA.
⁴ Bioinformatics and Systems Biology Graduate Program, University of California San Diego, La Jolla, CA, USA.
⁵ HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA.
⁶ Department of Medicine, Division of Endocrinology & Metabolism, University of California San Diego, La Jolla, CA, USA. cbenner@health.ucsd.edu.
⁷ Department of Medicine, Division of Genomics & Precision Medicine, University of California San Diego, La Jolla, CA, USA. agoren@ucsd.edu.

PMID: 39271835
PMCID: PMC12266361
DOI: 10.1038/s41587-024-02377-y

Lauren A Patel et al. Nat Biotechnol. 2024 Sep;42(9):1343-1349. doi: 10.1038/s41587-024-02377-y.

Abstract

Spike-in normalization is a powerful approach to assess global changes in data obtained from genomic mapping of DNA-associated proteins by methods such as ChIP-sequencing (ChIP-seq) or CUT&RUN. While multiple spike-in methods provide detailed documentation, the implementation of these approaches often omits critical quality control steps and veers from the established procedures. Spike-in normalization typically uses a single scalar to normalize genome-wide data, which makes the approach particularly vulnerable to errors in implementation. Here, we show that proper application of spike-in normalization can increase quantification accuracy across a spectrum of conditions, and we outline how misuse of spike-in approaches can create erroneous biological interpretations. We conclude by providing guidelines to minimize pitfalls when applying this approach to normalize data from protein-DNA interaction experiments.

Figures

**Fig. 1. Demonstration of the ability of spike-in normalization to accurately capture signal variation over wide and narrow dynamic ranges.**
**(a)** Reanalysis of data from Orlando et al., where untreated cells (high H3K79me2 signal) were mixed with DOT1-inhibited Jurkat cells (low H3K79me2 signal) in five ratios of treated/untreated (0/100, 25/75, 50/50, 75/25, 100/0). The dynamic range of the H3K79me2 signal was measured from the Western blot results of Orlando et al. using ImageJ; the change in H3K79me2 between 0% treated and 100% treated samples was approximately 10-fold (Supplemental Fig. 1a). Maximum signal (10X) and minimum signal (1X) are labeled on the x axes. The H3K79me2 ChIP-seq signal was quantified and plotted against the line of expected signal using either standard read-depth normalization (left) or spike-in normalization (right). Accuracy of the fit to the expected line was determined by R² (the value is reported in the top right of each plot). **(b)** The results of a similar titration experiment designed to focus on a narrower dynamic range. Mitotic-arrested cells, hereafter termed “mitotic”, were generated by treatment with thymidine followed by S-trityl-cysteine (STC). We estimate that approximately 85-90% of cells were arrested in prometaphase by this method. From previous mass-spectrometry data, the approximate fold change between mitotic H3K9ac (low H3K9ac signal) and interphase H3K9ac (high H3K9ac signal) was 3x, labeled on the x axes (Supplemental Fig. 1b). We used interphase cells (high H3K9ac) mixed with mitotic-arrested cells (low H3K9ac) in six ratios (100/0, 95/5, 75/25, 50/50, 25/75, 0/100). To each sample, we spiked in both *D. melanogaster* and *S. cerevisiae* chromatin. Quantification of the H3K9ac signal after read-depth normalization (left) or spike-in normalization using *Drosophila* (right) was plotted as in (a). Normalization using *S. cerevisiae* provided similar results (Supplemental Fig. 3). Within each plot, the H3K79me2 or H3K9ac signal was min-max normalized according to the following equation: $z_{i} = (x_{i} - \min(x)) / (\max(x) - \min(x))$. Here, $\min(x)$ is the average minimum signal (100% treated cells in (a) or 100% mitotic cells in (b)), $\max(x)$ is the average maximum signal (0% treated cells in (a) or 0% mitotic cells in (b)), $x_{i}$ is the signal for each sample, and $z_{i}$ is the min-max normalized signal for each sample. QCs for the dataset are in Supplemental Figs. 2-6 and Supplemental Table 3.
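The min-max scaling and R² comparison described in this caption can be sketched in Python. This is a minimal illustration rather than the authors' code: the function names, the example signal values, the assumption that the expected line is linear in the mixing fraction, and the choice to compute R² as the coefficient of determination against that fixed line are all assumptions made here.

```python
def min_max_normalize(signals, min_signal, max_signal):
    """Apply z_i = (x_i - min(x)) / (max(x) - min(x)) to each signal."""
    span = max_signal - min_signal
    if span == 0:
        raise ValueError("max_signal and min_signal must differ")
    return [(x - min_signal) / span for x in signals]


def r_squared(observed, expected):
    """Coefficient of determination of observed values against a fixed
    expected line (assumed definition; the caption does not specify one)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - e) ** 2 for o, e in zip(observed, expected))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical signals for a five-point titration (not taken from the paper).
raw = [10.0, 7.9, 5.4, 3.1, 1.0]
z = min_max_normalize(raw, min_signal=1.0, max_signal=10.0)

# Assumed expectation: normalized signal falls linearly with the treated fraction.
expected = [1.0, 0.75, 0.5, 0.25, 0.0]
print(z)
print(r_squared(z, expected))
```

In the caption, min(x) and max(x) are averages over the replicate end-point samples, so in practice `min_signal` and `max_signal` would be those averages rather than single values.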

**Fig. 2. A schematic depicting the impact of misuse of spike-in normalization on downstream results.**
Left, “True Enrichment” shows a hypothetical scenario of two conditions with variable histone acetylation profiles. Panels **a-e** show various scenarios observed in the literature and their effect on both local signal (depicted as schematics of individual ChIP-seq peaks) and global signal (shown as a scatterplot of log-normalized signal for all peaks for each condition). The data represent three replicates; for simplicity, the peaks and scatterplots show only the average of the replicates. Of note, these scenarios are not mutually exclusive, and combinations of these misuses of spike-in normalization can occur. (a) Traditional ChIP-seq with no spike-in added: the signal appears identical between the two conditions and does not capture the true patterns. (b) Properly performed normalization with spike-in ChIP (blue) yields results that accurately follow the original ground truth. (c) A variable proportion of spike-in is added to the samples; this skews the analysis, and condition #2 wrongly appears to have higher signal. (d) Low yield of spike-in data, either from input chromatin levels or technical issues. This precludes any QC of the spike-in. In addition, the low number of reads that align to the spike-in genome could strongly skew the results in either direction (depicted as overlaid normalized peaks and two plots, one for each direction). (e) Exogenous chromatin from a species that is phylogenetically close to the sample species. This could skew the results in either direction, depending on the percentage of misassigned reads in each sample (depicted as in (d)).

**Fig. 3. Variations in the ratios of spike-in to sample chromatin in public datasets.**
We examined 53 datasets. Of these, 51 were obtained from our survey; we additionally included the data we generated for Fig. 1b and Supplemental Fig. 2 (n=2). Of these, only 27 had sufficient input samples for each condition (52.9%; pie chart, inset). One additional dataset plotted had > 1 input but was still missing inputs for some conditions. We aligned the inputs to concatenated spike-in and target genomes (Methods). For each dataset, the input with the lowest ratio of spike-in/sample reads was scaled to 1, to capture the variation between input samples in the same dataset. The y axis is shown on a log2 scale to capture the diversity in variation within datasets. Alignment information for each individual sample is available in Supplemental Table 1; GEO accessions for each dataset are in Supplemental Table 5.
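The per-dataset scaling described in this caption (lowest spike-in/sample input ratio set to 1, then shown on a log2 axis) can be sketched as follows. This is only an illustration under stated assumptions: the function name, the sample labels, and the read counts are hypothetical, and the counts are assumed to come from alignment of each input to the concatenated genome.

```python
import math


def relative_spikein_ratios(input_counts):
    """Scale each input's spike-in/sample read ratio so the lowest ratio is 1.

    input_counts maps a sample label to (spike_in_reads, sample_reads).
    """
    ratios = {
        label: spike / sample
        for label, (spike, sample) in input_counts.items()
    }
    lowest = min(ratios.values())
    return {label: ratio / lowest for label, ratio in ratios.items()}


# Hypothetical input read counts for one dataset (not from the paper).
counts = {
    "cond1_input": (50_000, 10_000_000),
    "cond2_input": (200_000, 10_000_000),
}
scaled = relative_spikein_ratios(counts)

# log2 values correspond to positions on the plot's y axis.
log2_scaled = {label: math.log2(value) for label, value in scaled.items()}
print(scaled)
print(log2_scaled)
```

A dataset whose inputs all carry the same proportion of spike-in would have every scaled ratio equal to 1 (log2 value of 0); larger values indicate more variation in the amount of spike-in between samples.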

**Fig. 4. An example of the ability of a proper normalization strategy to correct for variations in the spike-in/sample ratio.**
Mitotically arrested HeLa-S3 cells were treated with either the pan-HDAC inhibitor TSA (to increase global acetylation) or DMSO as a control. For each treatment, using the same chromatin preparation, the amount of added spike-in was varied over five orders of magnitude (0.00025 to 2.5x spike-in/target ratio). The data were normalized by three approaches: **(a)** read-depth normalization: the change in signal between the conditions cannot be observed; **(b)** normalization by the relative IP ratios of spike-in and target: most of the change in signal is not observable, and the amount of spike-in added affects the overall signal; and **(c)** accounting for the spike-in/target ratio using the non-IP input controls and then normalizing as in (b) using the relative spike-in/target IP: the known differences between the conditions can be detected in all cases except the one where the amount of spike-in is higher than that of the target. QCs for the dataset are in Supplemental Figs. 8-14 and Supplemental Table 4.
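To make the difference between approaches (a), (b) and (c) concrete, the sketch below computes one possible scalar for each. The exact formulas used in the paper are not given in this record, so the formulation here (per-million scaling by target or spike-in IP reads, multiplied in (c) by the spike-in/target ratio measured in the input) is an assumption, as are all function names and read counts.

```python
import math


def read_depth_factor(target_ip_reads):
    """(a) Scale signal per million target-genome IP reads."""
    return 1e6 / target_ip_reads


def spikein_factor(spike_ip_reads):
    """(b) Scale signal per million spike-in IP reads (assumed formulation)."""
    return 1e6 / spike_ip_reads


def input_corrected_spikein_factor(spike_ip_reads, spike_input_reads,
                                   target_input_reads):
    """(c) As in (b), corrected by the spike-in/target ratio in the input."""
    input_ratio = spike_input_reads / target_input_reads
    return spikein_factor(spike_ip_reads) * input_ratio


# Two hypothetical samples with identical biology; sample B received twice
# as much spike-in chromatin as sample A (numbers are illustrative only).
a = dict(spike_ip=1_000_000, spike_input=500_000, target_input=10_000_000)
b = dict(spike_ip=2_000_000, spike_input=1_000_000, target_input=10_000_000)

uncorrected = (spikein_factor(a["spike_ip"]), spikein_factor(b["spike_ip"]))
corrected = (
    input_corrected_spikein_factor(a["spike_ip"], a["spike_input"], a["target_input"]),
    input_corrected_spikein_factor(b["spike_ip"], b["spike_input"], b["target_input"]),
)
print(uncorrected)
print(corrected)
```

In this toy case the uncorrected factors differ by the same factor of two as the added spike-in, whereas the input-corrected factors agree, which is the behavior the caption attributes to approach (c). Because a single scalar rescales the whole genome-wide track, any error in these counts propagates to every peak.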

References

1. Orlando DA; Chen MW; Brown VE; Solanki S; Choi YJ; Olson ER; Fritz CC; Bradner JE; Guenther MG Quantitative ChIP-Seq Normalization Reveals Global Modulation of the Epigenome. Cell Reports 2014, 9 (3), 1163–1170. 10.1016/j.celrep.2014.10.018.
2. Bonhoure N; Bounova G; Bernasconi D; Praz V; Lammers F; Canella D; Willis IM; Herr W; Hernandez N; Delorenzi M; Hernandez N; Delorenzi M; Deplancke B; Desvergne B; Guex N; Herr W; Naef F; Rougemont J; Schibler U; Andersin T; Cousin P; Gilardi F; Gos P; Lammers F; Raghav S; Villeneuve D; Fabbretti R; Vlegel V; Xenarios I; Migliavacca E; Praz V; David F; Jarosz Y; Kuznetsov D; Liechti R; Martin O; Delafontaine J; Cajan J; Gustafson K; Krier I; Leleu M; Molina N; Naldi A; Rib L; Symul L; Bounova G Quantifying ChIP-Seq Data: A Spiking Method Providing an Internal Reference for Sample-to-Sample Normalization. Genome Res 2014, 24 (7), 1157–1168. 10.1101/gr.168260.113.
3. Meers MP; Bryson TD; Henikoff JG; Henikoff S Improved CUT&RUN Chromatin Profiling Tools. eLife 2019, 8, e46314. 10.7554/eLife.46314.
4. Chen K; Hu Z; Xia Z; Zhao D; Li W; Tyler JK The Overlooked Fact: Fundamental Need for Spike-In Control for Virtually All Genome-Wide Analyses. Molecular and Cellular Biology 2016, 36 (5), 662–667. 10.1128/MCB.00970-14.
5. Jiang L; Schlesinger F; Davis CA; Zhang Y; Li R; Salit M; Gingeras TR; Oliver B Synthetic Spike-in Standards for RNA-Seq Experiments. Genome Res. 2011, 21 (9), 1543–1551. 10.1101/gr.121095.111.
