Patterns (N Y). 2021 Oct 27;2(12):100372.
doi: 10.1016/j.patter.2021.100372. eCollection 2021 Dec 10.

New interpretable machine-learning method for single-cell data reveals correlates of clinical response to cancer immunotherapy

Evan Greene et al. Patterns (N Y). 2021.

Abstract

We introduce a new method for single-cell cytometry studies, FAUST, which performs unbiased cell population discovery and annotation. FAUST processes experimental data on a per-sample basis and returns biologically interpretable cell phenotypes, making it well suited for the analysis of complex datasets. We provide simulation studies that compare FAUST with existing methodology, exemplifying its strength. We apply FAUST to data from a Merkel cell carcinoma anti-PD-1 trial and discover pre-treatment effector memory T cell correlates of outcome co-expressing PD-1, HLA-DR, and CD28. Using FAUST, we then validate these correlates in cryopreserved peripheral blood mononuclear cell samples from the same study, as well as an independent CyTOF dataset from a published metastatic melanoma trial. Finally, we show how FAUST's phenotypes can be used to perform cross-study data integration in the presence of diverse staining panels. Together, these results establish FAUST as a powerful new approach for unbiased discovery in single-cell cytometry.

Keywords: algorithms; bioinformatics; cancer; immunology; single-cell; statistics & probability.

Conflict of interest statement

A patent for the application of the FAUST algorithm to cytometry datasets has been applied for on behalf of the Fred Hutchinson Cancer Research Center. The research described in this paper was completed while E.G. was conducting research and working at the Fred Hutchinson Cancer Research Center. E.G. declares ownership interest in Ozette Technologies and was an employee of Ozette Technologies when the manuscript was revised to respond to peer review. G.F. has received consulting income from Takeda and research support from Janssen Pharmaceuticals and declares ownership interest in Ozette Technologies. R.G. has received consulting income from Juno Therapeutics, Takeda, Infotech Soft, and Celgene, has received research support from Janssen Pharmaceuticals and Juno Therapeutics, and declares ownership in Ozette Technologies and Modulus Therapeutics. Trial funds for CITN-07 were in part provided by Celldex. Trial funds for CITN-09 were in part provided by Merck.

Figures

Figure 1
FAUST overview. (A) Samples with markers M1 to M8 are grouped into experimental units (EUs) that are concatenated for analysis. (B) FAUST exhaustively explores the space of “reasonable” 3-marker gating strategies for each EU to compute an annotation forest. (C) Using this, FAUST scores each marker in each EU and selects consistently high-scoring markers for continued analysis (M5, M6 are removed here). Thresholds are standardized across EUs for selected markers: if a selected marker has EUs in which the number of estimated thresholds does not agree with the standard, thresholds are either removed (M1, EU2) or imputed (M2, EU2; M7, EU1; M8, EU3) using information from EUs adhering to the standard (denoted by the red arrows). (D) Discovery forests are then grown for each EU. Each leaf of each tree corresponds to a phenotype. All phenotypes are scored across forests, high-scoring phenotypes are selected (leaf nodes without a red ×; starred nodes subsequently survive down-selection in F), and low-scoring phenotypes are discarded (leaf nodes with red ×). (E) Selected phenotypes are annotated using the standardized thresholds from (C). (F) Phenotypes are down-selected based on frequency of occurrence across EUs. (G) A per-sample count matrix is derived for down-selected phenotypes.
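Steps (F) and (G) of the overview can be illustrated with a minimal sketch. It is not the FAUST implementation: the function names, the `min_units` cutoff, and the toy phenotype labels are all hypothetical, and the real method's scoring and selection rules are more involved than a simple occurrence count.

```python
from collections import Counter


def down_select(phenotypes_by_eu, min_units):
    """Keep phenotypes that occur in at least `min_units` experimental units.

    `min_units` is a hypothetical cutoff chosen for illustration.
    """
    occurrence = Counter()
    for phenotypes in phenotypes_by_eu.values():
        # Count each phenotype at most once per experimental unit.
        occurrence.update(set(phenotypes))
    return sorted(p for p, n in occurrence.items() if n >= min_units)


def count_matrix(cell_labels_by_sample, kept):
    """Return one row of per-phenotype cell counts for each sample.

    Cells whose label is not in `kept` are simply not counted here.
    """
    matrix = {}
    for sample, labels in cell_labels_by_sample.items():
        counts = Counter(labels)
        matrix[sample] = [counts.get(p, 0) for p in kept]
    return matrix


# Toy input: phenotypes selected in each experimental unit (made-up labels).
phenotypes_by_eu = {
    "EU1": ["CD3+CD8+", "CD3+CD4+", "CD3-CD19+"],
    "EU2": ["CD3+CD8+", "CD3+CD4+"],
    "EU3": ["CD3+CD8+"],
}
kept = down_select(phenotypes_by_eu, min_units=2)

# Toy input: one phenotype label per cell, grouped by sample.
cells = {
    "sample_a": ["CD3+CD8+", "CD3+CD8+", "CD3+CD4+", "CD3-CD19+"],
    "sample_b": ["CD3+CD4+"],
}
matrix = count_matrix(cells, kept)
```

Each row of `matrix` lines up with the column order in `kept`, which is the shape a downstream per-phenotype model would typically consume.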
Figure 2
Simulation studies. (A) Left: median estimated number of clusters by method, 5 simulation iterations for each truth value, multivariate Gaussian setting. Right: median adjusted Rand index (ARI) by method, 5 simulation iterations for each truth value, multivariate Gaussian setting. (B) Left: cross-validated AUC of the top cluster for each of 25 iterations, multivariate Gaussian setting, points jittered by a maximum of 0.0125. Right: median ARI between a method's top cluster and the simulated true predictor, 10 iterations per expected fold change, multivariate Gaussian setting. (C and D) All panels report results from methods applied to simulated datasets transformed by Γ(1+|x/4|). (C) Left: same as (A, left). Right: same as (A, right). (D) Left: same as (B, left). Right: same as (B, right). Horizontal dashed red line at 0.90.
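For readers unfamiliar with the adjusted Rand index reported in these panels, the sketch below computes it from two label vectors using the standard pair-counting formula. The function name and the handling of degenerate inputs (returning 1.0 when the index is undefined) are assumptions made for this illustration; the paper does not state which implementation it used.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two clusterings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label vectors must have the same length")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # assumed convention for fewer than two items

    # Pairs that fall in the same cluster under both labelings.
    contingency = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(c, 2) for c in contingency.values())

    # Pairs that fall in the same cluster under each labeling alone.
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = sum_rows * sum_cols / total_pairs
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # assumed convention when both partitions are trivial
    return (sum_cells - expected) / (max_index - expected)


same_up_to_relabeling = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
fully_crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

The index is invariant to relabeling of clusters, so it rewards recovering the same grouping rather than the same cluster names; values near zero or below indicate agreement no better than chance.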
Figure 3
FAUST annotations enable novel embeddings that reflect expression differences not captured by direct dimensionality reduction. UMAP embedding of the observed expression matrix colored by: (A) the expression for the stated marker winsorized at the 1st and 99th percentile, and scaled to the unit interval; (B) the associated per-cell FAUST annotation; (C) all selected FAUST phenotypes; (D) significant FAUST phenotypes. UMAP embedding of the annotation-transformed expression matrix with (E) colored as (A), (F) colored as (B), (G) colored as (C), and (H) colored as (D). The red bounding box in (G) and (H) contains the four significant correlates discovered in the FAUST analysis. The inset in (H) is the entire embedding plot; the main component is zoomed into the bounding box to show the relative placement of the four correlates on the annotation embedding.
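The color scaling described for panel (A) can be sketched as follows. This is an illustration only: the function name is hypothetical, and the choice of percentile interpolation (the standard library's "inclusive" method) is an assumption, since the caption does not say how the percentiles were estimated.

```python
from statistics import quantiles


def winsorize_and_scale(values, lower_pct=1, upper_pct=99):
    """Clip values at two percentiles, then rescale to the unit interval."""
    # quantiles(..., n=100) returns the 99 cut points for percentiles 1..99.
    cuts = quantiles(values, n=100, method="inclusive")
    lo, hi = cuts[lower_pct - 1], cuts[upper_pct - 1]
    clipped = [min(max(v, lo), hi) for v in values]
    if hi == lo:
        # Constant input after clipping: map everything to 0 (assumed convention).
        return [0.0 for _ in clipped]
    return [(v - lo) / (hi - lo) for v in clipped]


# Toy expression values 0..100; the extremes are pulled in before scaling.
scaled = winsorize_and_scale(list(range(101)))
```

Clipping before scaling keeps a handful of extreme cells from compressing the color range used for the bulk of the data.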
Figure 4
FAUST CD8+ phenotypes are associated with positive response to anti-PD-1 therapy in virus-positive subjects. (A) The two CD8 FAUST phenotypes significantly associated with positive treatment outcome, stratified by viral status. Observed p values contrasting all responders (n = 18) against all non-responders (n = 9) are reported in the figure. (B and C) Frequencies of the CD8+ phenotypes relative to total CD3+ cells versus (B) total PD-1 expression measured by IHC from tumor biopsies as described in Giraldo et al. and (C) productive clonality (1 − normalized entropy) from tumor samples as described in Miller et al. A suggestive trend is observed in both (B) and (C) among virus-positive subjects, although strong conclusions are not warranted due to the small sample size. (D) Targeted frequencies relative to total CD3+ cells in the cryopreserved PBMC samples (MCC validation) with observed p value contrasting responders against non-responders in virus-positive subjects, and the CyTOF melanoma dataset with observed p value.
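The clonality measure in panel (C), 1 − normalized entropy, can be made concrete with a short sketch. It assumes the common definition in which Shannon entropy of the clone frequencies is normalized by the logarithm of the number of distinct clones; the function name and the single-clone convention are assumptions for illustration, and the cited study's exact preprocessing is not reproduced here.

```python
from math import log


def clonality(clone_counts):
    """Return 1 - normalized Shannon entropy of a clone count vector."""
    counts = [c for c in clone_counts if c > 0]
    if not counts:
        raise ValueError("at least one clone with a positive count is required")
    if len(counts) == 1:
        # A single clone is maximally clonal (assumed convention, since
        # the normalizer log(1) is zero).
        return 1.0
    total = sum(counts)
    entropy = -sum((c / total) * log(c / total) for c in counts)
    return 1.0 - entropy / log(len(counts))


even_repertoire = clonality([5, 5, 5, 5])      # every clone equally frequent
skewed_repertoire = clonality([97, 1, 1, 1])   # one dominant clone
```

Values near 0 indicate an even repertoire and values near 1 indicate dominance by a few clones, which is why the measure is read as a summary of clonal expansion.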
Figure 5
Longitudinal profiles of aggregated FAUST cell populations in a pembrolizumab therapy trial and an FLT3-L + CDX-1401 trial are consistent with underlying technical and biological signals. (A) The aggregate frequency of all phenotypes discovered by FAUST containing the subphenotype CD8+ PD-1bright CD3+ CD4− across all time points. Aggregation occurs within subject by time point. (B) The longitudinal profiles of all cell subpopulations with phenotypes consistent with the DC compartment: CD19−, CD3−, CD56−, HLA-DR+, CD14−, CD16−, and CD11C+/−. Light-colored lines show individual subjects. The dark line shows the median across subjects over time. Error bars show the 95% confidence intervals of the median estimate at each time point. Cohort 1, n = 16 subjects; cohort 2, n = 16 subjects.
Figure 6
FAUST phenotypes enable cross-study meta-analysis of datasets stained with disparate marker panels. (A) Forest plots displaying one-sided 95% confidence intervals (CIs) for increased abundance of CD3+ CD4− CD8+ PD-1dim/bright phenotypes (CD8 compartment) and CD3+ CD4bright CD8− PD-1dim/bright phenotypes (CD4 compartment) in the MCC trial T cell panel. (B) Forest plots displaying one-sided 95% CIs for increased abundance of CD14+ CD16− HLA-DR+/bright CD3− CD56− CD19− phenotypes in responders versus non-responders for three trials. Each panel shows CIs derived from fitting the univariate model to each FAUST phenotype consistent with the target, the 95% CI arising from the targeted approach, and the 95% CI derived by fitting a PFDA model jointly to the FAUST phenotypes and then testing for increased abundance using model coefficients.

References

    1. Grégori G., Patsekin V., Rajwa B., Jones J., Ragheb K., Holdman C., Robinson J.P. Hyperspectral cytometry at the single-cell level using a 32-channel photodetector. Cytometry Part A. 2012;81:35–44.
    2. Saeys Y., Van Gassen S., Lambrecht B.N. Computational flow cytometry: helping to make sense of high-dimensional immunology data. Nat. Rev. Immunol. 2016;16:449.
    3. Aghaeepour N., Finak G., Consortium T.F., Consortium T.D., Hoos H., Mosmann T.R., Brinkman R., Gottardo R., Scheuermann R.H. Critical assessment of automated flow cytometry data analysis techniques. Nat. Methods. 2013;10:228.
    4. Arvaniti E., Claassen M. Sensitive detection of rare disease-associated cell subsets via representation learning. Nat. Commun. 2017;8:14825.
    5. Bruggner R.V., Bodenmiller B., Dill D.L., Tibshirani R.J., Nolan G.P. Automated identification of stratifying signatures in cellular subpopulations. Proc. Natl. Acad. Sci. U S A. 2014;111:E2770–E2777.