Solo: Doublet Identification in Single-Cell RNA-Seq via Semi-Supervised Deep Learning
- PMID: 32592658
- DOI: 10.1016/j.cels.2020.05.010
Abstract
Single-cell RNA sequencing (scRNA-seq) measurements of gene expression enable an unprecedented high-resolution view into cellular state. However, current methods often result in two or more cells that share the same cell-identifying barcode; these "doublets" violate the fundamental premise of single-cell technology and can lead to incorrect inferences. Here, we describe Solo, a semi-supervised deep learning approach that identifies doublets with greater accuracy than existing methods. Solo embeds cells unsupervised using a variational autoencoder and then appends a feed-forward neural network layer to the encoder to form a supervised classifier. We train this classifier to distinguish simulated doublets from the observed data. Solo can be applied in combination with experimental doublet detection methods to further purify scRNA-seq data to true single cells. It is freely available from https://github.com/calico/solo. A record of this paper's transparent peer review process is included in the Supplemental Information.
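The classifier's training data, as described above, pairs observed cells with simulated doublets. The following is a minimal sketch of that labeling step only. It assumes a doublet can be simulated by summing the raw count vectors of two randomly chosen observed cells, which the abstract does not specify. The function names `simulate_doublets` and `build_training_set` are hypothetical and are not taken from the Solo codebase. The variational autoencoder embedding and the feed-forward classifier are not shown.

```python
import random


def simulate_doublets(counts, n_doublets, seed=0):
    """Build synthetic doublets by summing the counts of random cell pairs.

    Assumption: element-wise summation of two observed count vectors is a
    reasonable stand-in for a doublet; the abstract does not give the rule.
    """
    rng = random.Random(seed)
    doublets = []
    for _ in range(n_doublets):
        # Pick two distinct observed cells.
        i, j = rng.sample(range(len(counts)), 2)
        doublets.append([a + b for a, b in zip(counts[i], counts[j])])
    return doublets


def build_training_set(counts, n_doublets, seed=0):
    """Return (features, labels): observed cells get 0, simulated doublets 1."""
    doublets = simulate_doublets(counts, n_doublets, seed)
    features = [list(row) for row in counts] + doublets
    labels = [0] * len(counts) + [1] * len(doublets)
    return features, labels


# Toy data: 4 cells x 3 genes (hypothetical counts, for illustration only).
observed = [[3, 0, 1], [0, 5, 2], [1, 1, 0], [4, 0, 0]]
features, labels = build_training_set(observed, n_doublets=2)
```

Note that the observed cells all receive the label 0 even though some of them may be real doublets; the classifier is trained to separate simulated doublets from the observed data as a whole.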
Keywords: deep learning; doublet; semi-supervised learning; single-cell RNA-seq.
Copyright © 2020 The Authors. Published by Elsevier Inc. All rights reserved.
Conflict of interest statement
Declaration of Interests: N.B., N.F., I.L., M.R., D.G.H., and D.R.K. are employed by Calico Life Sciences.