Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2019 Mar 22;20(1):63.
doi: 10.1186/s13059-019-1662-y.

EmptyDrops: distinguishing cells from empty droplets in droplet-based single-cell RNA sequencing data

Affiliations

EmptyDrops: distinguishing cells from empty droplets in droplet-based single-cell RNA sequencing data

Aaron T L Lun et al. Genome Biol. .

Abstract

Droplet-based single-cell RNA sequencing protocols have dramatically increased the throughput of single-cell transcriptomics studies. A key computational challenge when processing these data is to distinguish libraries for real cells from empty droplets. Here, we describe a new statistical method for calling cells from droplet-based data, based on detecting significant deviations from the expression profile of the ambient solution. Using simulations, we demonstrate that EmptyDrops has greater power than existing approaches while controlling the false discovery rate among detected cells. Our method also retains distinct cell types that would have been discarded by existing methods in several real data sets.

Keywords: Cell detection; Droplet-based protocols; Empty droplets; Single-cell transcriptomics.

PubMed Disclaimer

Conflict of interest statement

Ethics approval and consent to participate

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1
Fig. 1
Cell-calling results from different algorithms in simulations based on the PBMC dataset. Simulation scenarios are labelled as G1/G2 where G1 and G2 are the number of barcodes in the group of large and small cells, respectively. a The recall for each method, defined as the proportion of detected cells from each group. EmptyDrops was used with an FDR threshold of 0.1%. b The observed FDR in the set of libraries detected by EmptyDrops at a range of nominal FDR thresholds (dotted lines), defined as the proportion of detected droplets that are empty. In both plots, each point represents the result of one simulation iteration, the bar represents the mean value across 10 iterations, and the error bars represent the standard error of the mean
Fig. 2
Fig. 2
Application of EmptyDrops and other cell detection methods to one sample of the placenta dataset. a A barcode rank plot showing the fitted spline used for knee point detection in EmptyDrops. The detected knee and inflection points are also shown. b The negative log-likelihood for each barcode in the multinomial model of EmptyDrops, plotted against the total count. Barcodes detected as putative cell-containing droplets at an FDR of 0.1% are labelled in red. Only barcodes with tb>T are shown. c An UpSet plot [20] of the barcodes detected by each combination of methods (vertical bars). Horizontal bars represent the number of barcodes detected by each method. d Histogram outlines of the log-total count for barcodes detected by each method
Fig. 3
Fig. 3
t-SNE plots for the placenta dataset (a, b) or the 900 neuron dataset (c, d), constructed from barcodes that were detected with EmptyDrops and/or CellRanger. Each point represents a barcode and is colored based on a, c whether it was detected as a cell with each method; b the expression of monocyte marker genes KCNA5, CFP, STX11, and S100A12; or d the expression of interneuron marker genes Gad1, Gad2, and Sla6c1. Expression of the relevant marker set in each barcode was quantified as the sum of the normalized log-expression values across all marker genes. Arrows mark the putative monocyte and interneuron populations in each dataset

Similar articles

Cited by

References

    1. Macosko EZ, Basu A, Satija R, Nemesh J, Shekhar K, Goldman M, Tirosh I, Bialas AR, Kamitaki N, Martersteck EM, Trombetta JJ, Weitz DA, Sanes JR, Shalek AK, Regev A, McCarroll SA. Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell. 2015;161(5):1202–14. doi: 10.1016/j.cell.2015.05.002. - DOI - PMC - PubMed
    1. Klein AM, Mazutis L, Akartuna I, Tallapragada N, Veres A, Li V, Peshkin L, Weitz DA, Kirschner MW. Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells. Cell. 2015;161(5):1187–201. doi: 10.1016/j.cell.2015.04.044. - DOI - PMC - PubMed
    1. Zheng GX, Terry JM, Belgrader P, Ryvkin P, Bent ZW, Wilson R, Ziraldo SB, Wheeler TD, McDermott GP, Zhu J, Gregory MT, Shuga J, Montesclaros L, Underwood JG, Masquelier DA, Nishimura SY, Schnall-Levin M, Wyatt PW, Hindson CM, Bharadwaj R, Wong A, Ness KD, Beppu LW, Deeg HJ, McFarland C, Loeb KR, Valente WJ, Ericson NG, Stevens EA, Radich JP, Mikkelsen TS, Hindson BJ, Bielas JH. Massively parallel digital transcriptional profiling of single cells. Nat Commun. 2017;8:14049. doi: 10.1038/ncomms14049. - DOI - PMC - PubMed
    1. Islam S, Zeisel A, Joost S, La Manno G, Zajac P, Kasper M, Lonnerberg P, Linnarsson S. Quantitative single-cell RNA-seq with unique molecular identifiers. Nat Methods. 2014;11(2):163–6. doi: 10.1038/nmeth.2772. - DOI - PubMed
    1. Picelli S, Bjorklund AK, Faridani OR, Sagasser S, Winberg G, Sandberg R. Smart-seq2 for sensitive full-length transcriptome profiling in single cells. Nat Methods. 2013;10(11):1096–8. doi: 10.1038/nmeth.2639. - DOI - PubMed

Publication types

MeSH terms

LinkOut - more resources