Comput Struct Biotechnol J. 2025 Jan 15:27:451-460.
doi: 10.1016/j.csbj.2025.01.009. eCollection 2025.

Improving doublet cell removal efficiency through multiple algorithm runs

Yong She et al. Comput Struct Biotechnol J.

Abstract

Doublets are a key confounding factor in the analysis of scRNA-seq data, as they can interfere with differential expression analysis and disrupt inferred developmental trajectories. However, because of the randomness of the algorithms, most doublet removal methods still leave a proportion of doublets in the data after application. In this study, we propose a multi-round doublet removal (MRDR) strategy that runs the detection algorithm over multiple cycles to reduce randomness and improve the effectiveness of doublet removal. We evaluated the MRDR strategy on 14 real-world datasets, 29 barcoded scRNA-seq datasets, and 106 synthetic datasets with four popular doublet detection tools: DoubletFinder, cxds, bcds, and hybrid. In the real-world datasets, DoubletFinder performed better under the MRDR strategy than with a single round of removal, and its recall rate improved by 50 % for two rounds compared with one; for the other three methods, the AUROC improved by about 0.04. In the barcoded scRNA-seq datasets, two rounds of doublet removal with cxds yielded the best results. In the simulated datasets, multi-round removal was more effective than a single removal, with cxds again performing best when applied twice, and the AUROC of all four methods improved by at least 0.05 with two rounds compared with one. Finally, compared with running the algorithm once, the MRDR strategy was more beneficial for differential gene expression analysis and cell trajectory inference when default analysis parameters were used. Overall, we show that the MRDR strategy removes doublets more effectively and benefits downstream analyses, and that it could be incorporated into the standard analysis pipeline for scRNA-seq experiments. We recommend using cxds with two rounds of algorithm iteration to remove doublets.
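The multi-round idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not call DoubletFinder, cxds, bcds, or hybrid: `score_fn` is a hypothetical stand-in for any doublet scorer, and both the function names and the fixed `expected_rate` fraction removed per round are assumptions made only for illustration.

```python
import numpy as np


def multi_round_removal(counts, score_fn, rounds=2, expected_rate=0.1):
    """Repeatedly score the remaining cells and drop the top-scoring ones.

    counts: (n_cells, n_genes) array of expression counts.
    score_fn: hypothetical callable returning one doublet score per cell
        (higher means more doublet-like); stands in for a real tool.
    expected_rate: assumed fraction of remaining cells removed per round.
    Returns a boolean mask over the original cells (True = kept).
    """
    counts = np.asarray(counts)
    kept = np.arange(counts.shape[0])
    for _ in range(rounds):
        # Re-score only the cells that survived the previous rounds.
        scores = np.asarray(score_fn(counts[kept]))
        n_remove = int(round(expected_rate * kept.size))
        if n_remove == 0:
            break
        order = np.argsort(scores)  # ascending: highest scores come last
        kept = np.sort(kept[order[:-n_remove]])
    mask = np.zeros(counts.shape[0], dtype=bool)
    mask[kept] = True
    return mask


def recall(true_doublet, kept_mask):
    """Fraction of labelled doublets that were removed."""
    true_doublet = np.asarray(true_doublet, dtype=bool)
    removed = ~np.asarray(kept_mask, dtype=bool)
    return (removed & true_doublet).sum() / true_doublet.sum()


def library_size_score(counts):
    """Toy scorer (an assumption): total counts per cell."""
    return np.asarray(counts).sum(axis=1)
```

With labelled data, `recall(labels, multi_round_removal(counts, library_size_score, rounds=1))` can be compared against the same call with `rounds=2`. Each extra round also removes more cells overall, so precision should be checked alongside recall.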

Keywords: Doublet removal; Multi-round doublet removal strategy; Single-cell RNA sequencing; Synthetic dataset.

Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Graphical abstract
Fig. 1. Residual doublets were a widespread phenomenon. (A) Results of one and two rounds of removal using DoubletFinder in datasets with true doublet labels.
Fig. 2. Evaluation of the MRDR strategy using 14 benchmark real scRNA-seq datasets. (A) Comparison of AUROC values from one to 10 rounds of doublet removal and hierarchical removal using DoubletFinder. (B) Comparison of AUPRC values from one to 10 rounds of removal and hierarchical removal using DoubletFinder. (C) Comparison of recall rates from one to 10 rounds of removal and hierarchical removal using DoubletFinder. (D) Comparison of precision rates from one to 10 rounds of removal and hierarchical removal using DoubletFinder. (E) Comparison of AUROC for one and two rounds of removal with the other three methods.
Fig. 3. Evaluation of the MRDR strategy using 29 barcoded scRNA-seq datasets. (A) Doublets were constructed in each barcoded scRNA-seq dataset at a doublet rate of 0.05. (B) Doublets were constructed in each barcoded scRNA-seq dataset at a doublet rate of 0.1.
Fig. 4. Evaluation of the MRDR strategy using 100 synthetic scRNA-seq datasets based on four simulation settings. (A) The AUROC values of the MRDR strategy for each doublet detection method were evaluated across four distinct simulation settings: varying doublet rates (from 1 % to 30 % with a step size of 1 %), varying sequencing depths (from 500 to 15,000 UMI counts per cell with a step size of 500 counts), varying numbers of cell types (from 2 to 20 with a step size of 1), and 21 heterogeneity levels, which specified the degree of differentiation of genes between the two cell types. (B) The AUPRC values of the MRDR strategy for each doublet detection method were evaluated across the same four simulation settings.
Fig. 5. The MRDR strategy influenced the computation of differentially expressed genes and the inference of cell trajectories. (A) Comparison of the precision of differentially expressed genes after removing doublets using the MRDR strategy across four distinct doublet rates. (B) Trajectories constructed by Slingshot after the MRDR strategy was applied to remove identified doublets in a synthetic dataset with 7 %. The true cell trajectory was branched. (C) Trajectories constructed by Slingshot after the MRDR strategy was applied to remove identified doublets in a synthetic dataset with 12 %. The true cell trajectory was branched.
