Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2013;9(7):e1003130.
doi: 10.1371/journal.pcbi.1003130. Epub 2013 Jul 11.

Hierarchical modeling for rare event detection and cell subset alignment across flow cytometry samples

Affiliations

Hierarchical modeling for rare event detection and cell subset alignment across flow cytometry samples

Andrew Cron et al. PLoS Comput Biol. 2013.

Abstract

Flow cytometry is the prototypical assay for multi-parameter single cell analysis, and is essential in vaccine and biomarker research for the enumeration of antigen-specific lymphocytes that are often found in extremely low frequencies (0.1% or less). Standard analysis of flow cytometry data relies on visual identification of cell subsets by experts, a process that is subjective and often difficult to reproduce. An alternative and more objective approach is the use of statistical models to identify cell subsets of interest in an automated fashion. Two specific challenges for automated analysis are to detect extremely low frequency event subsets without biasing the estimate by pre-processing enrichment, and the ability to align cell subsets across multiple data samples for comparative analysis. In this manuscript, we develop hierarchical modeling extensions to the Dirichlet Process Gaussian Mixture Model (DPGMM) approach we have previously described for cell subset identification, and show that the hierarchical DPGMM (HDPGMM) naturally generates an aligned data model that captures both commonalities and variations across multiple samples. HDPGMM also increases the sensitivity to extremely low frequency events by sharing information across multiple samples analyzed simultaneously. We validate the accuracy and reproducibility of HDPGMM estimates of antigen-specific T cells on clinically relevant reference peripheral blood mononuclear cell (PBMC) samples with known frequencies of antigen-specific T cells. These cell samples take advantage of retrovirally TCR-transduced T cells spiked into autologous PBMC samples to give a defined number of antigen-specific T cells detectable by HLA-peptide multimer binding. We provide open source software that can take advantage of both multiple processors and GPU-acceleration to perform the numerically-demanding computations. We show that hierarchical modeling is a useful probabilistic approach that can provide a consistent labeling of cell subsets and increase the sensitivity of rare event detection in the context of quantifying antigen-specific immune responses.

PubMed Disclaimer

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1
Figure 1. Schematic summary illustrating the HDP model framework.
A graphical model provides a declarative representation of the HDPGMM. The figure shows a compact plate representation of the graphical model, in which plates (rounded rectangles) are used to group variables in a subgraph. Each subgraph in a plate is replicated a number of times as indicated by the label within the plate. The formula image event in the formula image sample is represented by formula image, and the formula image component for the formula image sample is a multivariate Gaussian with proportion formula image, mean formula image and covariance matrix formula image.s. Hyper-parameters that can be set are formula image, formula image, formula image, formula image, formula image and formula image as described in Methods. Given the declarative graphical model, standard and GPU-accelerated MCMC sampling algorithms can be used to implement the model as previously described .
Figure 2
Figure 2. HDPGMM results in more accurate classification of events in simulated data than other statistical mixture model approaches.
(Left) Row 1 shows independent fitting of DPGMMs to each data set; row 2 shows the use of reference posterior distribution from data set 3 to classify events in other data set; row 3 shows a DPGMM fitted to pooled data from all data sets; and row 4 shows fitting of an HDPGMM to all 4 data sets. Results are described in the text. Within each row, if two events are assigned to the same cluster, they are given the same color - it can be seen that clusters are aligned in Rows 2–4, but not in Row 1. All models used a truncated DPGMM base with 16 components, a burn-in of 10,000 iterations, and sampling of 100 post burn-in iterations for the calculation of the posterior distribution. (Right) Contour plots of the log posterior distribution. The HDPGMM distributions (Row 4) are most similar to the independently fitted distributions (Row 1), with the advantage that the small cluster in data set 3 masked by its larger neighboring cluster on top has a distinct mode. In contrast, the reference and pooled distribution strategies have the exact same distribution for all data sets and lack the flexibility to model sample-specific features.
Figure 3
Figure 3. Comparison of manual, DPGMM and HDPGMM detection of rare antigen-specific events.
The panels show the estimated frequencies of antigen-specific cells (large red dots) expressed as a percentage of all events (yellow boxes). These percentages were estimated using manual gating by a representative user (left), DPGMM (middle) and HDPGMM (right). Text in red in the first column shows the spiked-in frequency of retrovirally transduced T cells for the data sample in that row. The red polygons in the left panel are gates used for identifying antigen-specific cells by manual gating; the exact shape, sequence and location of these gates is determined by the operator and may vary between different operators depending on their training, experience and expertise. With the DPGMM approach, cell subsets across the samples from top to bottom are not directly comparable as indicated by the event colors, posing a problem for quantification of the same cell subset in different samples. In contrast, with the HDPGMM approach, cell subsets are aligned and directly comparable across all samples. HDPGMM is more sensitive at detecting antigen-specific cells when the frequency is extremely low (first 3 rows). HDPGMM is also more consistent in labeling events across different samples, while DPGMM is prone to detect likely false positive antigen-specific cells that are CD3-low or negative (arrows in rows 1 and 4 of middle panel). HDPGMM improves on the accuracy and consistency because the model incorporates both sample-specific and group-specific information, in contrast to DPGMM which only has access to sample-specific information. For both DPGMM and HDPGMM, model fitting was done with an MCMC sampler running 20,000 burn-in and 2,000 averaged iterations.
Figure 4
Figure 4. Trace plots of log likelihood for the final 2,000 MCMC iterations of the HDP model sampled every iterations showing mixing and convergence.
The MCMC was run for 22,000 iterations, and samples obtained from the final 2,000 iterations were used to calculate and plot the log likelihood at each iteration. The log likelihood appears to vary stochastically about an equilibrium distribution indicating convergence, and the chain traverses its distribution indicating mixing, but the steps tend to be small indicating some degree of autocorrelation. Text in yellow boxes indicates the frequencies of the spiked antigen-specific T cells in the sample being fitted.
Figure 5
Figure 5. Log of the proportion or weight of each of the 128 components in the model.
Each panel shows a scatter plot of the log component proportions ordered by size for the HDP model fitted to each flow cytometry data sample. The largest component has a log probability of approximately -1, indicating that this single component can account for about 10% of the total events in the data sample. In contrast, the smallest component has a log probability of between -5 and -6, indicating that the smallest component only accounts for 0.001–0.0001% of the total events in the data sample. Since each sample has 50,000 events, components with log probabilities of -5 and below are likely to be empty of events. Hence, the dip at the right of each plot is an indication of cutting back by the Dirichlet process model, and provides evidence that the number of components is adequate for a good model fit. If there is no dip in the size of smallest component proportions, there is a need to increase the maximal number of components if rare event clusters are to be adequately modeled. Text in yellow boxes indicates the frequencies of the spiked antigen-specific T cells in the sample being fitted.
Figure 6
Figure 6. Comparison of FLOCK, FLAME and flowClust for detection of rare antigen-specific events.
The panels show the estimated frequencies of antigen-specific cells (large red dots) expressed as a percentage of all events (yellow boxes). (Left panel) FLOCK detects the antigen-specific cluster at the highest spiked-in frequency but not in the other samples. There are several CD3-negative events included in the detected cluster that are most likely false positive events. As indicated by the color coding of events, FLOCK does not provide any alignment of cell subsets across samples. (Middle panel) Using the default settings, FLAME failed to identify any antigen-specific cell subsets. Cell subsets found were aligned but there were alignment artifacts when the event partitioning was different across samples (arrowed example). (Right panel) Using 64 components and 1000 iterations, flowClust only identified antigen-specific clusters at the highest spiked-in levels and did not provide any methods to align clusters across samples.
Figure 7
Figure 7. Comparison of accuracy, reproducibility and linearity of manual gating, DPGMM and HDPGMM.
For gating estimates, frequency estimates from 10 flow cytometry operators were collected. For both DPGMM and HDPGMM, 10 MCMC runs with unique random number seeds were performed to evaluate the reproducibility of antigen-specific cell frequency estimates. Estimates of the antigen-specific frequencies from manual, DPGMM and HDPGMM approaches are shown as open blue circles, with the blue bar representing the mean of all 10 estimates at each spike frequency. The red crosses represent the “true” frequency of antigen-specific cells combining the known spiked-in frequencies and the average background from 10 manual evaluations. As shown in the figure, HDPGMM (right panel) estimates have equal or less variability at every spike dilution when compared with DPGMM (middle panel). A linear regression fit (red line) shows that the standard errors and correlation coefficient of all 3 approaches are comparable. The number in red text above each set of estimates is the absolute value of (median of estimates – “true value”), a measure of accuracy. This shows that HDPGMM is more accurate than manual gating at every spiked-in concentration. The number in blue text below each set of estimates is the coefficient of variation (CV), which is lower for HDPGMM than manual gating for all concentrations except autologous sample only. For both DPGMM and HDPGMM, model fitting was done with an MCMC sampler running 20,000 burn-in and 2,000 averaged iterations.
Figure 8
Figure 8. Sensitivity analysis for the 4 configurable hyper-parameters
formula image , formula image , formula image and formula image . To evaluate the robustness of the algorithm to changes in the configurable hyper-parameters, we repeated the analysis of the spiked in data sample multiple times with different parameters, using 10 independent MCMC runs to obtain statistics for each set of hyper-parameter configurations. Each mini-panel has the same axes as Figure 7 with estimated frequency of multimer-positive events on the vertical axis and spiked-in frequency on the horizontal axis. A boxplot is used to display the results for each model configuration. Configurable parameters were set to be either the default value (1.0), 3-fold lower (0.3) or 3-fold higher (3.0), giving 81 hyper-parameter configurations. Three replicate runs with 10,000 burn-in and 1,000 MCMC iterations were performed for each configuration. The default configuration is in the center panel with red text.
Figure 9
Figure 9. Performance of HDPGMM with different numbers of events, samples and markers.
Left panel shows time taken to fit HDPGMM to 10 samples with 50,000 to 500,000 events and 10 markers. Middle panel shows time taken to fit HDPGMM to 3 to 30 samples each with 100,000 events and 10 markers. Right panel shows time taken to fit 10 samples each with 100,000 events with the number of markers varying from 5 to 15. In each case, the model was run for 1,000 MCMC steps with an upper limit of 128 mixture components on a NVidia GTX 580 GPU, and the times from three replicate runs are shown.

References

    1. Britten C, Gouttefangeas C, Welters M, Pawelec G, Koch S, et al. (2008) The CIMT-monitoring panel: a two-step approach to harmonize the enumeration of antigen-specific CD8+ T lymphocytes by structural and functional assays. Cancer Immunology, Immunotherapy 57: 289–302. - PMC - PubMed
    1. Welters MJP, Gouttefangeas C, Ramwadhdoebe TH, Letsch A, Ottensmeier CH, et al. (2012) Harmonization of the intracellular cytokine staining assay. Cancer Immunology, Immunotherapy 61: 1–12. - PMC - PubMed
    1. Aghaeepour N, Finak G, Hoos H, Mosmann TR, Brinkman R, et al. (2013) Critical assessment of automated ow cytometry data analysis techniques. Nature methods 10: 228–238. - PMC - PubMed
    1. Chan C, Feng F, Ottinger J, Foster D, West M, et al. (2008) Statistical mixture modeling for cell subtype identification in ow cytometry. Cytometry Part A 73A: 693–701. - PMC - PubMed
    1. Pyne S, Hu X, Wang K, Rossin E, Lin T, et al. (2009) Automated high-dimensional ow cytometric data analysis. Proceedings of the National Academy of Sciences 106: 8519. - PMC - PubMed

Publication types