Comparison of clustering methods for high-dimensional single-cell flow and mass cytometry data
- PMID: 27992111
- DOI: 10.1002/cyto.a.23030
Abstract
Recent technological developments in high-dimensional flow cytometry and mass cytometry (CyTOF) have made it possible to detect expression levels of dozens of protein markers in thousands of cells per second, allowing cell populations to be characterized in unprecedented detail. Traditional data analysis by "manual gating" can be inefficient and unreliable in these high-dimensional settings, which has led to the development of a large number of automated analysis methods. Methods designed for unsupervised analysis use specialized clustering algorithms to detect and define cell populations for further downstream analysis. Here, we have performed an up-to-date, extensible performance comparison of clustering methods for high-dimensional flow and mass cytometry data. We evaluated methods using several publicly available data sets from experiments in immunology, containing both major and rare cell populations, with cell population identities from expert manual gating as the reference standard. Several methods performed well, including FlowSOM, X-shift, PhenoGraph, Rclusterpp, and flowMeans. Among these, FlowSOM had extremely fast runtimes, making this method well-suited for interactive, exploratory analysis of large, high-dimensional data sets on a standard laptop or desktop computer. These results extend previously published comparisons by focusing on high-dimensional data and including new methods developed for CyTOF data. R scripts to reproduce all analyses are available from GitHub (https://github.com/lmweber/cytometry-clustering-comparison), and pre-processed data files are available from FlowRepository (FR-FCM-ZZPH), allowing our comparisons to be extended to include new clustering methods and reference data sets. © 2016 The Authors. Cytometry Part A published by Wiley Periodicals, Inc. on behalf of ISAC.
Keywords: CyTOF; F1 score; bioinformatics; cell populations; clustering; flow cytometry; high-dimensional; manual gating; mass cytometry; single-cell.
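The abstract names expert manual gating as the reference standard and lists the F1 score among its keywords, but it does not spell out how clusters are matched to reference populations. The following is a minimal sketch of one plausible scoring scheme, not the authors' implementation: for each manually gated population, it takes the cluster with the highest F1 score (the harmonic mean of precision and recall). The function names `f1_per_population` and `mean_f1`, the best-match rule, and the toy labels are all assumptions made for illustration.

```python
from collections import Counter


def f1_per_population(reference, clusters):
    """Return, for each reference population, the best F1 score over all clusters.

    reference: manually gated population label per cell (hypothetical input)
    clusters:  cluster label per cell from any clustering method
    """
    if len(reference) != len(clusters):
        raise ValueError("label lists must have the same length")

    ref_sizes = Counter(reference)               # cells per reference population
    cluster_sizes = Counter(clusters)            # cells per cluster
    overlap = Counter(zip(reference, clusters))  # cells shared by each (population, cluster) pair

    best = {}
    for (pop, clus), n_shared in overlap.items():
        precision = n_shared / cluster_sizes[clus]  # fraction of the cluster that is this population
        recall = n_shared / ref_sizes[pop]          # fraction of the population captured by the cluster
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best.get(pop, 0.0):
            best[pop] = f1
    return best


def mean_f1(reference, clusters):
    """Unweighted mean of the per-population best F1 scores."""
    scores = f1_per_population(reference, clusters)
    return sum(scores.values()) / len(scores)


# Toy example: eight cells, three gated populations, three clusters.
reference = ["B", "B", "B", "T", "T", "T", "T", "NK"]
clusters = [1, 1, 2, 2, 2, 2, 2, 3]
scores = f1_per_population(reference, clusters)
print(scores)
print(mean_f1(reference, clusters))
```

Because the mean is unweighted, a rare population counts as much as a major one, which matters for data sets that contain both. Note that this best-match rule allows one cluster to be matched to several populations; a one-to-one assignment would need an additional matching step.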
Similar articles
- Comprehensive evaluation and practical guideline of gating methods for high-dimensional cytometry data: manual gating, unsupervised clustering, and auto-gating. Brief Bioinform. 2024 Nov 22;26(1):bbae633. doi: 10.1093/bib/bbae633. PMID: 39656848. Free PMC article.
- A computational approach for phenotypic comparisons of cell populations in high-dimensional cytometry data. Methods. 2018 Jan 1;132:66-75. doi: 10.1016/j.ymeth.2017.09.005. Epub 2017 Sep 14. PMID: 28917725.
- immunoClust--An automated analysis pipeline for the identification of immunophenotypic signatures in high-dimensional cytometric datasets. Cytometry A. 2015 Jul;87(7):603-15. doi: 10.1002/cyto.a.22626. Epub 2015 Apr 7. PMID: 25850678.
- The end of gating? An introduction to automated analysis of high dimensional cytometry data. Eur J Immunol. 2016 Jan;46(1):34-43. doi: 10.1002/eji.201545774. Epub 2015 Nov 30. PMID: 26548301. Review.
- Analyzing high-dimensional cytometry data using FlowSOM. Nat Protoc. 2021 Aug;16(8):3775-3801. doi: 10.1038/s41596-021-00550-0. Epub 2021 Jun 25. PMID: 34172973. Review.
Cited by
- Unsupervised machine learning reveals key immune cell subsets in COVID-19, rhinovirus infection, and cancer therapy. bioRxiv [Preprint]. 2020 Nov 4:2020.07.31.190454. doi: 10.1101/2020.07.31.190454. Update in: Elife. 2021 Aug 05;10:e64653. doi: 10.7554/eLife.64653. PMID: 32766581. Free PMC article.
- A standardized immune phenotyping and automated data analysis platform for multicenter biomarker studies. JCI Insight. 2018 Dec 6;3(23):e121867. doi: 10.1172/jci.insight.121867. PMID: 30518691. Free PMC article.
- Compositional Data Analysis using Kernels in mass cytometry data. Bioinform Adv. 2022 Feb 11;2(1):vbac003. doi: 10.1093/bioadv/vbac003. eCollection 2022. PMID: 35224501. Free PMC article.
- A novel data fusion method for the effective analysis of multiple panels of flow cytometry data. Sci Rep. 2019 May 1;9(1):6777. doi: 10.1038/s41598-019-43166-x. PMID: 31043667. Free PMC article.
- beachmat: A Bioconductor C++ API for accessing high-throughput biological data from a variety of R matrix types. PLoS Comput Biol. 2018 May 3;14(5):e1006135. doi: 10.1371/journal.pcbi.1006135. eCollection 2018 May. PMID: 29723188. Free PMC article.