Bioinform Adv. 2022 May 13;2(1):vbac028.
doi: 10.1093/bioadv/vbac028. eCollection 2022.

Pollock: fishing for cell states

Erik P Storrs et al. Bioinform Adv. 2022.

Abstract

Motivation: The use of single-cell methods is expanding at an ever-increasing rate. While there are established algorithms for cell classification, they are limited in cross-platform compatibility, rely on the availability of a reference dataset and offer limited classification interpretability. Here, we introduce Pollock, a suite of algorithms for cell type identification that is compatible with popular single-cell methods and analysis platforms, provides a set of pretrained human cancer reference models, and reports interpretability scores that identify the genes driving cell type classifications.

Results: Pollock performs comparably to existing classification methods while offering easily deployable pretrained classification models across a wide variety of tissue and data types. It also demonstrates utility in pan-cancer immune analysis.

Availability and implementation: Source code and documentation are available at https://github.com/ding-lab/pollock. Pretrained models and datasets are available for download at https://zenodo.org/record/5895221.

Supplementary information: Supplementary data are available at Bioinformatics Advances online.

Figures

Fig. 1.
Pollock overview schema. Overview of Pollock model architecture, training, cell type prediction and pretrained model usage. During training, single-cell inputs are split into training and validation sets. (1a and b) A variational autoencoder (VAE) with a classification head is fit on the training partition of the single-cell data. The model is trained with contributions from three loss functions: Kullback–Leibler (KL) divergence loss on the latent embedding, zero-inflated negative binomial (ZINB) gene expression reconstruction loss and cross-entropy loss on the cell type predictions. (2) Evaluation metrics are then computed on a validation set of withheld single-cell data. In addition to cell type predictions, Pollock also outputs feature importances for the input features of each predicted cell. (3) After training, Pollock models are saved and can be used for cell type inference at a later date.
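The caption names three loss terms. The sketch below shows how they could be evaluated and summed for a single cell. It is not Pollock's implementation: it only computes loss values (no network, no training loop) and assumes a diagonal Gaussian latent posterior with a standard normal prior, per-gene ZINB parameters (mean `mu`, inverse dispersion `theta`, dropout probability `pi`) and softmax logits for the classification head. The weights `w_kl`, `w_rec` and `w_ce` are hypothetical.

```python
import math


def kl_standard_normal(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)) for one cell's latent embedding."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def zinb_nll(x, mu, theta, pi):
    """Negative log-likelihood of one cell's gene counts under a ZINB.

    Per gene: mean mu > 0, inverse dispersion theta > 0, dropout probability pi in [0, 1).
    """
    total = 0.0
    for xi, m, t, p in zip(x, mu, theta, pi):
        # log-pmf of the negative binomial component
        log_nb = (math.lgamma(xi + t) - math.lgamma(t) - math.lgamma(xi + 1)
                  + t * math.log(t / (t + m)) + xi * math.log(m / (t + m)))
        if xi == 0:
            # a zero can come from dropout or from the NB component itself
            total -= math.log(p + (1.0 - p) * math.exp(log_nb))
        else:
            total -= math.log(1.0 - p) + log_nb
    return total


def cross_entropy(logits, label):
    """Softmax cross-entropy for one cell's cell type logits (log-sum-exp stabilized)."""
    top = max(logits)
    log_z = top + math.log(sum(math.exp(v - top) for v in logits))
    return log_z - logits[label]


def combined_loss(x, zinb_params, z_mu, z_logvar, logits, label,
                  w_kl=1.0, w_rec=1.0, w_ce=1.0):
    """Weighted sum of the three loss terms; the weights are hypothetical."""
    mu, theta, pi = zinb_params
    return (w_kl * kl_standard_normal(z_mu, z_logvar)
            + w_rec * zinb_nll(x, mu, theta, pi)
            + w_ce * cross_entropy(logits, label))
```

For example, `combined_loss([0, 3], ([1.0, 2.0], [1.0, 1.0], [0.0, 0.1]), [0.0, 0.0], [0.0, 0.0], [2.0, 0.5], 0)` scores a two-gene cell whose true label is class 0. In a real model these terms would be averaged over a minibatch and backpropagated.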
Fig. 2.
Pollock feature comparison and benchmarking dataset overview. (A) Comparison of Pollock features against features implemented in other popular single-cell classification tools. (B) Datasets used for benchmarking and for training disease-specific models.
Fig. 3.
Pollock benchmarking and performance. (A) Pollock cell type classification performance (F1-score) compared against six established single-cell classification methods for each disease and data type. (B) Comparison of Pollock cell type classification performance between disease-specific and generalized models. (C–E) Confusion matrices showing the overlap of generalized-model predicted cell types with ground-truth cell labels for the scRNA-seq, snRNA-seq and snATAC-seq validation datasets and (F) for a publicly available HCA bone marrow dataset.
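Fig. 3 summarizes performance as F1-scores and confusion matrices of predicted versus ground-truth labels. The sketch below shows how both can be computed from two label lists. The caption does not say how per-cell-type F1-scores are aggregated, so the unweighted (macro) mean used here is an assumption, and the cell type labels in the usage example are hypothetical.

```python
from collections import Counter


def confusion_matrix(truth, pred, labels):
    """Rows are ground-truth cell types, columns are predicted cell types."""
    counts = Counter(zip(truth, pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def per_class_f1(truth, pred, labels):
    """F1 = 2TP / (2TP + FP + FN), computed separately for each cell type."""
    scores = {}
    for lab in labels:
        tp = sum(1 for t, p in zip(truth, pred) if t == lab and p == lab)
        fp = sum(1 for t, p in zip(truth, pred) if t != lab and p == lab)
        fn = sum(1 for t, p in zip(truth, pred) if t == lab and p != lab)
        denom = 2 * tp + fp + fn
        scores[lab] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(truth, pred, labels):
    """Unweighted mean of per-cell-type F1 (assumed aggregation)."""
    scores = per_class_f1(truth, pred, labels)
    return sum(scores.values()) / len(scores)


truth = ["T", "T", "B", "NK"]
pred = ["T", "B", "B", "NK"]
labels = ["B", "NK", "T"]
print(confusion_matrix(truth, pred, labels))  # [[1, 0, 0], [0, 1, 0], [1, 0, 1]]
```

Here one T cell is misclassified as B, which lowers the F1 of both the B and T classes to 2/3 while NK stays at 1.0.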
Fig. 4.
Pollock cell state annotation in a pan-immune atlas. (A) Confusion matrix showing the overlap of Pollock predicted versus ground-truth cell labels for a scRNA-seq BRCA dataset annotated with immune cell states. (B) Comparison of Pollock feature importance scores and gene expression for literature-based single-cell marker genes. (C) Significant GO: Molecular Function pathways enriched in the top 20 DWGs for the following NK/T cell states: NK, CD8 T cell-proliferating, CD8 T cell-exhausted and Treg. Pathways are rank-ordered by their −log10 FDR-corrected P-values. (D) Heatmap displaying feature importance scores for the top 20 DWGs for each immune cell state.
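Panel C ranks pathways by −log10 FDR-corrected P-values. The caption does not name the correction procedure, so the sketch below assumes the Benjamini–Hochberg method. It starts from raw enrichment P-values (the enrichment test itself is not shown), and the pathway names are hypothetical placeholders.

```python
import math


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0  # also caps adjusted values at 1
    # walk from the largest P-value down so adjusted values stay monotone
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def rank_pathways(pathway_pvals):
    """Sort pathways by -log10(adjusted P), most significant first (P must be > 0)."""
    names = list(pathway_pvals)
    adjusted = benjamini_hochberg([pathway_pvals[name] for name in names])
    scored = [(name, -math.log10(q)) for name, q in zip(names, adjusted)]
    return sorted(scored, key=lambda item: item[1], reverse=True)


ranking = rank_pathways({"pathway_A": 0.01, "pathway_B": 0.04,
                         "pathway_C": 0.03, "pathway_D": 0.005})
```

Adjusting before taking −log10 means the ranking reflects multiple-testing-corrected significance rather than raw P-values. Ties can occur because the procedure enforces monotonicity.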
