iScience. 2023 Jun 29;26(7):107242.
doi: 10.1016/j.isci.2023.107242. eCollection 2023 Jul 21.

A contamination focused approach for optimizing the single-cell RNA-seq experiment

Deronisha Arceneaux et al. iScience. 2023.

Abstract

Droplet-based single-cell RNA-seq (scRNA-seq) data are plagued by ambient contaminations caused by nucleic acid material released by dead and dying cells. This material is mixed into the buffer and is co-encapsulated with cells, leading to a lower signal-to-noise ratio. Although there exist computational methods to remove ambient contaminations post-hoc, the reliability of algorithms in generating high-quality data from low-quality sources remains uncertain. Here, we assess data quality before data filtering by a set of quantitative, contamination-based metrics that assess data quality more effectively than standard metrics. Through a series of controlled experiments, we report improvements that can minimize ambient contamination outside of tissue dissociation, via cell fixation, improved cell loading, microfluidic dilution, and nuclei versus cell preparation; many of these parameters are inaccessible on commercial platforms. We provide end-users with insights on factors that can guide their decision-making regarding optimizations that minimize ambient contamination, and metrics to assess data quality.

Keywords: Biology experimental methods; Computational bioinformatics; Transcriptomics.

Conflict of interest statement

The authors declare no competing interest.

Figures

Graphical abstract
Figure 1
Ambient contamination metrics robustly reflect data quality on simulated datasets. (A) Scaled cumulative total transcript counts over barcodes ranked by total transcript counts for datasets simulated with (top) low and (bottom) high ambient levels. Secant lines from the curve to the diagonal are colored in gray, with the maximal secant line colored in green; these were used to calculate the inverted maximal secant distance and the secant line standard deviation. The area under the curve (colored in orange) and the minimal circumscribing rectangle (dashed purple line) were used to calculate the inverted AUC percentage. (B) Scaled representation of the slope distribution histograms shown in Figures S1I and S1J for the (top) low and (bottom) high ambient datasets shown in (A). The x axis values are the midpoints of the bins in the slope distribution histogram, and the y axis values are the products of the bin midpoints and the bin heights. Slopes below the threshold were considered to represent empty droplets and are colored in blue. The sum of these data points is quantified as the empty droplets' scaled slope sum. (C) Distribution of gene dropout rates, with genes ranked by ascending dropout rate, for datasets simulated with (top) low and (bottom) high ambient levels. The pink line is drawn at a 2% dropout rate, the cut-off below which a gene is defined as ambient. (D) Distribution of the percentage of ambient genes expressed per cell for datasets simulated with (top) low and (bottom) high ambient levels. The mean percentage is quantified. The AmbiQuant overall score is labeled in red. (E–H) (E) Maximal secant distance (green) and secant line standard deviation (yellow), (F) AUC percentage, (G) cell’s scaled slope sum, and (H) percent counts ambient over different ambient levels for simulations. Line plots are shown as mean ± SD of n = 1000 replicates for each ambient level.
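As a rough illustration of the curve-based metrics in panel A, the following Python sketch computes a scaled cumulative count curve and three summary values from per-barcode total counts. It is a minimal sketch under stated assumptions, not the AmbiQuant implementation: the function names are hypothetical, barcodes are assumed to be ranked in descending order of total counts, the secant length is taken as the vertical gap between the curve and the diagonal, the area is expressed as a fraction of the unit square that bounds the scaled curve, and the "inverted" transformations named in the caption are not applied because the caption does not define them.

```python
from statistics import pstdev


def cumulative_curve(total_counts):
    """Scaled cumulative counts over barcodes ranked by descending total counts.

    Both axes are scaled to [0, 1]; the curve starts at the origin.
    """
    ranked = sorted(total_counts, reverse=True)
    grand_total = sum(ranked)
    n = len(ranked)
    xs, ys, running = [0.0], [0.0], 0
    for i, count in enumerate(ranked, start=1):
        running += count
        xs.append(i / n)
        ys.append(running / grand_total)
    return xs, ys


def curve_metrics(total_counts):
    """Summaries of how far the cumulative curve departs from the diagonal.

    Assumption: secant length = vertical distance from the curve to y = x.
    """
    xs, ys = cumulative_curve(total_counts)
    secants = [y - x for x, y in zip(xs, ys)]
    # Trapezoidal area under the curve; the bounding rectangle of the
    # scaled curve is the unit square, so the area is already a fraction.
    auc = sum(
        (xs[i + 1] - xs[i]) * (ys[i + 1] + ys[i]) / 2
        for i in range(len(xs) - 1)
    )
    return {
        "max_secant": max(secants),
        "secant_std": pstdev(secants),
        "auc_fraction": auc,
    }


if __name__ == "__main__":
    # Toy inputs (not data from the paper): 10 high-count barcodes among
    # 90 low-count ones, versus 100 barcodes with identical counts.
    print(curve_metrics([1000] * 10 + [1] * 90))
    print(curve_metrics([10] * 100))
```

Under these assumptions, a curve that hugs the diagonal (counts spread evenly over barcodes) gives a maximal secant near 0 and an area fraction near 0.5, while a curve dominated by a few high-count barcodes pushes both values up.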
Figure 2
Contamination metrics on experimental datasets inform data quality on a continuous scale. Ambient contamination plots and metrics, formatted as in Figure 1, for experimental datasets of different quality: (A–D) K562 (Sample 1) cell line, (E–H) mouse gastric corpus, and (I–L) mouse colonic crypts.
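The dropout-based panels follow the rule stated in the Figure 1 caption: a gene whose dropout rate falls below 2% is treated as ambient. The Python sketch below shows one way that rule could be applied to a small count matrix. It is an assumption-laden illustration rather than the authors' code: the function names are hypothetical, dropout rate is taken as the fraction of barcodes with zero counts for a gene, and "percent counts ambient" is read as the share of each barcode's counts that comes from ambient genes.

```python
def ambient_genes(counts, cutoff=0.02):
    """Return indices of genes whose dropout rate is below `cutoff`.

    `counts` is a list of rows, one row per barcode and one column per gene.
    Assumption: dropout rate = fraction of barcodes with a zero count.
    """
    n_barcodes = len(counts)
    n_genes = len(counts[0])
    ambient = []
    for g in range(n_genes):
        zeros = sum(1 for row in counts if row[g] == 0)
        if zeros / n_barcodes < cutoff:
            ambient.append(g)
    return ambient


def percent_counts_ambient(counts, ambient):
    """Percentage of each barcode's counts that comes from ambient genes."""
    result = []
    for row in counts:
        total = sum(row)
        ambient_total = sum(row[g] for g in ambient)
        result.append(100.0 * ambient_total / total if total else 0.0)
    return result


if __name__ == "__main__":
    # Toy matrix (not data from the paper): gene 0 is detected in every
    # barcode, genes 1 and 2 are detected in only some barcodes.
    toy = [
        [5, 0, 5],
        [3, 7, 0],
        [2, 0, 8],
        [1, 9, 0],
    ]
    genes = ambient_genes(toy)
    print(genes)
    print(percent_counts_ambient(toy, genes))
```

With so few barcodes, the 2% cut-off only admits genes detected in every barcode; on a real dataset with thousands of barcodes the same rule tolerates a small number of zeros.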
Figure 3
Pre-encapsulation variables affect scRNA-seq data quality and cell type diversity. (A) Live hopper visualization of (left) viable single cells and (right) dying cells. (B and C) Quantification of (B) AmbiQuant overall score and (C) percent counts ambient, comparing near-QC-failure runs (MACs enzyme on minced tissue, cold protease on minced tissue, MACs enzyme on minced, and Collagenase/DNase on Crypts) with cold protease dissociation on crypts. Mean with SEM as error bars for n = 3 or 4 samples. ∗∗p < 0.01 by t-test. (D) UMAP embedding of filtered cells from (blue) TrypLE and (orange) cold protease datasets. Expression of Dclk1, a tuft cell marker, and Ptprc, an immune cell marker, was overlaid. (E and F) UMAP overlay with percent counts ambient or Muc2 expression for (E) unfixed cells or (F) fixed cells prepared with cold protease dissociation on crypts. Secretory (red) and absorptive (green) lineages are outlined. Gene expression values on scale bars are Z-scores of the normalized values described in STAR Methods. (G) Live hopper visualization of (left) unfixed cells and (right) cells fixed with 0.1X DSP.
Figure 4
Microfluidic manipulations can affect cell death and subsequent ambient contamination in downstream data. (A) Schematic of standard loading (top) and tip loading (bottom). (B) Live hopper visualization of viable single cells from the tip loading apparatus. (C and D) Quantification of (C) AmbiQuant overall score and (D) percent counts ambient, comparing various microfluidic manipulations. Mean with SEM as error bars for n = 3 or 4 samples. ∗p < 0.05, ∗∗p < 0.01 by ANOVA followed by Tukey post-test. (E and F) UMAP overlay with percent counts ambient or Muc2 expression for (E) tip loading or (F) standard loading. Secretory (red) and absorptive (green) lineages are outlined. Gene expression values on scale bars are Z-scores of the normalized values described in STAR Methods. (G and H) Comparison of functional enrichment analysis of datasets derived from tip loading (higher data quality) and standard loading (lower data quality), looking at (G) enteroendocrine (EE) and (H) tuft (TUF) cells. (I) Schematic of the standard inDrops chip (left) and the All Cell chip (right).
Figure 5
Ambient contamination and quality control metrics reveal impact of intrinsic and extrinsic factors on data quality. (A) Heatmap of ambient contamination and standard QC metric scores, with HTAPP datasets as columns grouped by Leiden clusters. Metrics are shown as rows; the first ten rows are individual metrics, whose colors correspond to the top left color bar. The last row is the AmbiQuant overall score for the ambient contamination metrics, colored in red according to the bottom left color bar. The Leiden cluster labels and the labels for isolation technique, technique and protocol combination, sample type, tissue origin, and cancer type are shown as color bars above the heatmap. Metric scores are normalized between 0 and 1 within each row for visualization. Abbreviations - cell: scRNA-seq; nuclei: snRNA-seq; BTD: brain tumor dissociation; C4: collagenase 4 and DNase I; LD: Liberase TM and DNase I; LE: Liberase TM, elastase and DNase I; Miltenyi Biotec human tumor dissociation; PDEC: pronase, dispase, elastase, collagenases A and 4 and DNase I; Papain: (cysteine protease); cd45n: CD45+ depletion; CST: CHAPS with salts and Tris; EZ: EZPrep; NST: Nonidet P40 with salts and Tris; TST: Tween with salts and Tris; O-PDX1: orthotopic patient-derived xenograft; CLL: Chronic lymphocytic leukemia; MBC: Metastatic breast cancer; NB: Neuroblastoma; NSCLC: Non-small cell lung carcinoma. (B) Three-dimensional scatterplot of the first 3 principal components of the ambient contamination and standard QC metric score matrix, colored by Leiden cluster labels. (C) Boxplot comparing the metric scores between single-cell and single-nucleus sequenced samples. Two-sided Mann-Whitney-Wilcoxon test performed between the single-cell and single-nucleus groups. ∗∗p < 0.01, ∗∗∗p < 0.001, ∗∗∗∗p < 0.0001.
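For readers unfamiliar with the test named in panel C, the Python sketch below shows how a two-sided Mann-Whitney-Wilcoxon comparison between two groups of metric scores could be computed. It is an illustrative sketch, not the analysis code used for the figure: the function name is hypothetical, tied values receive average ranks, and the p value uses a plain normal approximation without tie or continuity correction, so it is only a rough guide for small samples. An established statistics library would be the better choice in practice.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test using a normal approximation.

    Returns (U, p), where U is the smaller of the two U statistics.
    Assumption: no tie or continuity correction is applied to the variance.
    """
    # Pool the observations, remembering which group each came from.
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values occupy ranks i+1 .. j and share their average rank.
        avg_rank = (i + 1 + j) / 2
        n_from_x = sum(1 for k in range(i, j) if pooled[k][1] == 0)
        rank_sum_x += avg_rank * n_from_x
        i = j
    n1, n2 = len(x), len(y)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u, p


if __name__ == "__main__":
    # Toy scores (not data from the paper) for two fully separated groups.
    print(mann_whitney_u([1, 2, 3, 4, 5], [6, 7, 8, 9, 10]))
```

Completely separated groups give U = 0 and a small p value, whereas two groups with identical values give U equal to its expected value and p = 1.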
