Adv Bioinformatics. 2009:2009:247646.
doi: 10.1155/2009/247646. Epub 2009 Nov 12.

Merging mixture components for cell population identification in flow cytometry

Greg Finak et al. Adv Bioinformatics. 2009.

Abstract

We present a framework for the identification of cell subpopulations in flow cytometry data based on merging mixture components using the flowClust methodology. We show that the cluster merging algorithm under our framework improves model fit and provides a better estimate of the number of distinct cell subpopulations than either Gaussian mixture models or flowClust, especially for complicated flow cytometry data distributions. Our framework allows automated selection of the number of distinct cell subpopulations and can identify cases where the algorithm fails, making it suitable for application in a high-throughput flow cytometry (FCM) analysis pipeline. Furthermore, we demonstrate a method for summarizing complex merged cell subpopulations in a simple manner that integrates with the existing flowClust framework and enables downstream data analysis. We demonstrate the performance of our framework on simulated and real FCM data. The software is available in the flowMerge package through the Bioconductor project.
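
The captions for Figures 2(c) and 4(c) describe selecting the number of merged clusters as the changepoint of a two-segment piecewise linear regression fitted to entropy versus number of clusters. The following is a minimal Python sketch of that idea only. It is not the flowMerge implementation (flowMerge is an R package); the function names `fit_line_sse` and `find_changepoint` are hypothetical, the entropy values are invented for illustration, and letting both segments share the breakpoint is an assumption.

```python
def fit_line_sse(xs, ys):
    """Residual sum of squares of an ordinary least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = mean_y - slope * mean_x
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def find_changepoint(ks, entropies):
    """Return the number of clusters K at which two line segments fit best.

    Assumption: both segments include the breakpoint itself, and each
    segment has at least two points, so only interior K values are tried.
    """
    best_k, best_sse = None, float("inf")
    for i in range(1, len(ks) - 1):
        sse = (fit_line_sse(ks[:i + 1], entropies[:i + 1])
               + fit_line_sse(ks[i:], entropies[i:]))
        if sse < best_sse:
            best_k, best_sse = ks[i], sse
    return best_k


# Hypothetical entropies: a slow rise up to K = 3, then a steep rise.
ks = [1, 2, 3, 4, 5, 6, 7]
entropies = [0.0, 5.0, 10.0, 110.0, 210.0, 310.0, 410.0]
print(find_changepoint(ks, entropies))  # prints 3
```

An exhaustive search over breakpoints is affordable here because the number of candidate K values is small (at most the number of components in the starting mixture).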

Figures

Figure 1
flowClustBIC, flowClustICL, and flowMerge solutions for automated gating of forward versus side scatter across 137 clinical samples of CLL. The flowClustBIC fit: black solid curve. The flowClustICL fit: red dashed curve. The flowMerge fit: green dashed curve.
Figure 2
Examples of the flowClustBIC, flowClustICL, and flowMerge cluster solutions for forward versus side scatter in a sample of CLL flow cytometry data. (a) The flowClustBIC solution with seven clusters. (b) The flowClustICL solution with two clusters. (c) The entropy versus number of clusters plot, fit to a two-component piecewise linear regression model. The best fitting model has a changepoint at three clusters. (d) The flowMerge solution corresponding to K = 3 clusters provides a better fit to the lymphocyte population than either the flowClustBIC or flowClustICL solutions and provides a good estimate of the true number of cell populations.
Figure 3
The number of clusters chosen by the flowClustBIC, flowClustICL, flowMerge, and GMMBIC solutions for automated gating of CD8, CD4, and CD7 across 137 samples of CLL. The flowClustBIC solution: solid black curve. The flowClustICL solution: dashed red curve. The flowMerge solution derived from the flowClustBIC solution: dashed green curve. The GMMBIC solution: dashed blue curve.
Figure 4
Example of flowClustICL, flowClustBIC, and flowMerge solutions fitted to a CLL sample in the CD8, CD4, and CD7 dimensions. (a) Three projections of the flowClustICL solution. (b) Three projections of the flowClustBIC solution. (c) Entropy versus number of clusters for a series of flowMerge model fits with a piecewise linear regression fitted to the data. The changepoint located at K = 5 clusters is selected automatically. (d) Three projections of flowMerge solution with K = 5 clusters derived from the flowClustBIC solution.
Figure 5
Detecting failed cluster merging. (a) Distribution of the entropy (normalized for the number of events and clusters) of the flowMerge solution for forward versus side scatter (left) and fluorescence channels (right) across 137 samples. (b) The relationship between the normalized entropy and the number of clusters in the flowMerge solution for forward scatter versus side scatter (left) and fluorescence channels (right). (c) Example of flowMerge solutions with unusually high normalized entropy from the right tail of the distribution for forward versus side scatter (left) and fluorescence (right). (d) A plot of the normalized entropy versus samples grouped by antibody labels identifies antibody combinations that are problematic for automated gating with the automated merging algorithm.
Figure 6
Simulation results for CD4 versus CD8 dimensions of a CLL sample. (a) The 2D kernel density estimate of the real CD4 versus CD8 data. Gates for the CD4+/CD8−, CD8+/CD4−, and CD4−/CD8− subpopulations are represented by light coloured lines. Events outside the gates are considered outliers. (b) An example of the kernel density estimate of simulated data drawn from the distribution defined by the real data. (c) The number of clusters selected by the flowMerge, GMMBIC, flowClustBIC, and flowClustICL solutions over 100 realizations of simulated data. (d) The median flowClustBIC solution with 9 components. (e) The median flowMerge solution with 5 components. (f) The misclassification rate (MCR) for the flowMergeK, GMMK, and flowClustK solutions with the number of clusters fixed to the true number of cell subpopulations (K = 3). (g) The misclassification rates for the three components from the optimal GMMBIC, flowClustBIC, and flowMerge solutions minimizing the MCR. (h) GMM, (i) flowClust, and (j) flowMergeK solutions with a fixed number of clusters.
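
The caption for Figure 5 refers to the entropy of a flowMerge solution "normalized for the number of events and clusters" as a way to spot failed merges. The sketch below shows one way such a quantity could be computed from posterior membership probabilities. The exact normalization is not stated in the text above, so dividing by the product of the number of events and the number of clusters is an assumption, and the function names `clustering_entropy` and `normalized_entropy` are hypothetical rather than part of flowMerge.

```python
import math


def clustering_entropy(posteriors):
    """Entropy of a soft clustering: -sum_i sum_k z_ik * log(z_ik).

    `posteriors` is a list of rows, one per event, holding that event's
    membership probabilities for each cluster.
    """
    total = 0.0
    for row in posteriors:
        for z in row:
            if z > 0.0:  # treat 0 * log(0) as 0
                total -= z * math.log(z)
    return total


def normalized_entropy(posteriors):
    """Entropy divided by (events * clusters). The divisor is an assumption."""
    if not posteriors or not posteriors[0]:
        raise ValueError("posteriors must contain at least one event and one cluster")
    n_events = len(posteriors)
    n_clusters = len(posteriors[0])
    return clustering_entropy(posteriors) / (n_events * n_clusters)


# Hypothetical posteriors: crisp assignments versus fully ambiguous ones.
crisp = [[1.0, 0.0], [0.0, 1.0]]
ambiguous = [[0.5, 0.5], [0.5, 0.5]]
print(normalized_entropy(crisp))
print(normalized_entropy(ambiguous))
```

Well-separated clusters give values near zero, while heavily overlapping clusters give larger values, which is why samples in the right tail of the distribution are the ones worth reviewing by hand.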
