Cell Rep Methods. 2021 May 24;1(1).
doi: 10.1016/j.crmeth.2021.100008.

Double-jeopardy: scRNA-seq doublet/multiplet detection using multi-omic profiling

Bo Sun et al. Cell Rep Methods.

Abstract

The computational detection and exclusion of cellular doublets and/or multiplets is a cornerstone for the identification of true biological signals from single-cell RNA sequencing (scRNA-seq) data. Current methods do not sensitively identify both heterotypic and homotypic doublets and/or multiplets. Here, we describe a machine learning approach for doublet/multiplet detection that uses VDJ-seq and/or CITE-seq data to identify hybrid droplets and then predicts further doublets and/or multiplets from the transcriptional features associated with those droplets. This approach highlights the utility of leveraging multi-omic single-cell information for the generation of high-quality datasets. Our method has high sensitivity and specificity in inflammatory-cell-dominant scRNA-seq samples, thus presenting a powerful approach to ensuring high-quality scRNA-seq data.
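
The idea of training on multi-omic-identified doublets and then predicting from transcriptional features can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the classifier, so plain logistic regression is used as a stand-in, and the two features (standardized values standing in for per-droplet summaries such as nUMI and nGenes) are synthetic. All function and variable names are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.1, epochs=500):
    """Fit logistic regression by batch gradient descent (stand-in classifier)."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    """Probability that a droplet with feature vector x is a doublet/multiplet."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

# Synthetic training data (assumption): label 1 marks droplets flagged as
# doublets by VDJ-seq/CITE-seq, label 0 marks all other droplets. Doublets are
# drawn with higher feature values purely for illustration.
rng = random.Random(0)
singlets = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(100)]
doublets = [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(100)]
X = singlets + doublets
y = [0] * len(singlets) + [1] * len(doublets)

w, b = train_logistic(X, y)

# Score droplets that carried no VDJ/CITE-seq evidence.
high_score = predict_proba(w, b, [3.0, 3.0])
low_score = predict_proba(w, b, [-1.0, -1.0])
```

In this sketch, droplets without any VDJ-seq or CITE-seq evidence can still be scored, which is what would let such a classifier flag doublets that the multi-omic rules alone miss.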

Keywords: ADT; B cell receptor; CITE-seq; T cell receptor; doublets; multi-omics profiling; single-cell transcriptomics.

Conflict of interest statement

R.J.M.B.-R. is a co-founder of Alchemab Therapeutics Ltd and consultant for Alchemab Therapeutics Ltd and GSK. F.A.T. is a consultant for Alchemab Therapeutics Ltd.

Figures

Graphical abstract
Figure 1
Multi-omics aids the identification of doublets and/or multiplets. (A) Schematic of the approach to identify scRNA-seq doublets and/or multiplets using the CITE-seq and VDJ modalities. Droplets with a transcriptome resembling non-B or non-T cells that captured BCR or TCR sequences, respectively, were considered potential doublets and/or multiplets. (B) Uniform manifold approximation and projection (UMAP) dimensionality reduction of three healthy PBMC datasets colored by cell type. (C) Examples of the CITE-seq levels of CD3, CD4, and CD8 in the B cell cluster for the three individuals, with the red lines corresponding to the CITE-seq positivity thresholds. (D) UMAP dimensionality reduction of three healthy PBMC datasets colored by VDJ doublets (left, co-capture of discordant VDJs) or CITE-seq doublets (right, co-capture of the corresponding mutually exclusive CITE-seq pair). Homotypic doublets were defined as those containing multiple BCRs or multiple TCRs, and heterotypic doublets were defined as the remainder (droplets containing both BCR(s) and TCR(s), or droplets containing BCRs or TCRs whose transcriptional profiles do not resemble B or T cells, respectively). (E) Generalized additive models fitted on percentage of mitochondrial genes versus percentage of ribosomal genes in the HEK293 dataset; R² values are shown top right. (F) Mito-ribo ratio values per enriched HEK293 population. p values were calculated across groups using the Wilcoxon test with Bonferroni correction for pairwise comparisons. (G) The relative number of genes (nGenes), number of RNA molecules (nUMI), mito-ribo ratio (mitoribo_ratio), and nUMIs_VDJ per droplet for the VDJ-identified doublets and/or multiplets (left) and the CITE-seq-identified doublets and/or multiplets (right). “Other” refers to droplets that were not identified as doublets and/or multiplets from the VDJ-seq or CITE-seq data. The p values for the differences between the feature distributions of the detected doublets/multiplets and the remaining droplets are provided (two-sided Wilcoxon test). ∗∗∗∗p < 0.00005 (Wilcoxon test). Abbreviation: CLR, centered log-ratio transformed.
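The labelling rules described for panels (A), (C), and (D) can be sketched as follows. This is a hedged illustration rather than the authors' code: the `Droplet` fields, the function names, the cell-type labels, and the example marker pair `("CD4", "CD8")` are hypothetical, and the sketch assumes each droplet already carries a transcriptome-based annotation, counts of captured BCRs and TCRs, and CLR-transformed CITE-seq values.

```python
from dataclasses import dataclass

@dataclass
class Droplet:
    cell_type: str  # transcriptome-based annotation, e.g. "B", "T", "Monocyte"
    n_bcr: int      # number of distinct BCRs captured (hypothetical field)
    n_tcr: int      # number of distinct TCRs captured (hypothetical field)

def classify_vdj(d: Droplet) -> str:
    """Return 'homotypic', 'heterotypic', or 'other' for one droplet."""
    # Homotypic: multiple BCRs or multiple TCRs. Testing this first is an
    # assumption that follows the caption, which defines heterotypic
    # doublets as the remainder.
    if d.n_bcr > 1 or d.n_tcr > 1:
        return "homotypic"
    # Heterotypic: both receptor types captured in the same droplet ...
    if d.n_bcr > 0 and d.n_tcr > 0:
        return "heterotypic"
    # ... or a receptor in a droplet whose transcriptome does not match it.
    if d.n_bcr > 0 and d.cell_type != "B":
        return "heterotypic"
    if d.n_tcr > 0 and d.cell_type != "T":
        return "heterotypic"
    return "other"

def is_cite_doublet(clr, thresholds, pairs=(("CD4", "CD8"),)):
    """Flag co-capture of a mutually exclusive CITE-seq marker pair.

    clr and thresholds map marker name to CLR value and positivity threshold.
    The default pair is only an example; the real pairs are not listed here.
    """
    return any(
        clr[a] > thresholds[a] and clr[b] > thresholds[b] for a, b in pairs
    )

labels = [
    classify_vdj(Droplet("B", 1, 0)),         # B cell with one BCR
    classify_vdj(Droplet("B", 2, 0)),         # two BCRs in one droplet
    classify_vdj(Droplet("Monocyte", 1, 0)),  # BCR in a non-B transcriptome
    classify_vdj(Droplet("T", 1, 1)),         # BCR and TCR together
]
```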
Figure 2
Machine learning applied to doublet and/or multiplet training data captures both homotypic and heterotypic doublets and/or multiplets. (A) Schematic of MLtiplet applied to the healthy PBMC data. (B) UMAP dimensionality reductions of three healthy PBMC datasets colored by MLtiplet-predicted singlets and by training and predicted doublets across the VDJ-seq, CITE-seq, and DoubletFinder training sets, and a training set combining all three approaches. (C) The proportion of VDJ- and CITE-seq-identified (true) doublets and/or multiplets that were identified as doublets by MLtiplet and DoubletFinder, grouped by doublet type. The doublets highlighted in orange are homotypic doublets (composed of multiple cells of similar transcriptional types, namely B or T cells). ∗p < 0.05 (two-sided Wilcoxon test, corrected for multiple testing). p values were calculated between the MLtiplet-predicted doublets and/or multiplets and singlets using the two-sided Wilcoxon test.
Figure 3
MLtiplet is both sensitive and specific on simulated doublet datasets and scales with doublet proportion. (A) Schematic of the comparison of the doublet detection methods using simulated data. (B) The estimated proportion of doublets across the simulated datasets using either DoubletFinder or the classifier based on VDJ-identified doublets, CITE-seq-identified doublets, or both. The black line corresponds to y = x. (C) The percentages of doublets identified by MLtiplet per cell type across the different simulated datasets. Point shapes indicate the simulated dataset, in which the percentage of true doublets was 1%, 2%, 5%, 10%, or 15%. The homotypic doublets are highlighted in orange.
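The comparison against simulated datasets rests on standard quantities: sensitivity, specificity, and the estimated versus true doublet proportion (the two axes of the y = x comparison in panel B). A minimal sketch of how these could be computed is given below; the function name `evaluate` and the toy labels are assumptions for illustration, not values from the study.

```python
def evaluate(true_labels, predicted_labels):
    """Compare predicted doublet calls with simulated ground truth.

    Both arguments are equal-length lists of booleans (True = doublet).
    """
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    n = len(pairs)
    return {
        # Fraction of true doublets that were detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Fraction of true singlets that were left alone.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        # Quantities compared against the y = x line.
        "true_proportion": (tp + fn) / n,
        "estimated_proportion": (tp + fp) / n,
    }

# Toy example: 20 droplets, 2 simulated doublets (10%).
truth = [True, True] + [False] * 18
calls = [True, False, True] + [False] * 17
metrics = evaluate(truth, calls)
```

A method that scales with doublet proportion would keep `estimated_proportion` close to `true_proportion` as the simulated percentage of doublets increases.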
Figure 4
Validation of MLtiplet on an NSCLC tumor dataset. Doublet detection on a non-small cell lung cancer (NSCLC) dataset. (A) Schematic of the training datasets for doublet/multiplet prediction by MLtiplet. (B and C) UMAP plots of (B) the annotated cell types and (C) VDJ-seq heterotypic doublets. (D) Venn diagram showing the numbers of droplets in the combined set of doublets and/or multiplets identified using both DoubletFinder and VDJ-seq (green), and in the doublets and/or multiplets predicted by MLtiplet using the DoubletFinder-derived training dataset (blue), VDJ-seq-derived training dataset (orange), and DoubletFinder plus VDJ-seq-derived training dataset (pink). (E) UMAP plots of the training and predicted doublets and/or multiplets using each approach. (F) The relative number of RNA molecules (nUMI) and mito-ribo ratio (mitoribo_ratio) per cell for the VDJ-identified doublets and/or multiplets, CITE-seq-identified doublets and/or multiplets, MLtiplet-predicted doublets and/or multiplets, and the remainder (singlets predicted by MLtiplet).
