The Transcriptome of SH-SY5Y at Single-Cell Resolution: A CITE-Seq Data Analysis Workflow

Daniele Mercatelli¹, Nicola Balboni¹, Francesca De Giorgio²,³, Emanuela Aleo⁴, Caterina Garone²,³, Federico Manuel Giorgi¹

Affiliations

¹ Department of Pharmacy and Biotechnology, University of Bologna, 40126 Bologna, Italy.
² Department of Medical and Surgical Sciences, University of Bologna, 40138 Bologna, Italy.
³ Center for Applied Biomedical Research (CRBA), University of Bologna, 40138 Bologna, Italy.
⁴ IGA Technology Services, 33100 Udine, Italy.

PMID: 34066513
PMCID: PMC8163004
DOI: 10.3390/mps4020028

Daniele Mercatelli et al. Methods Protoc. 2021 May 6;4(2):28.

Abstract

Cellular Indexing of Transcriptomes and Epitopes by Sequencing (CITE-seq) is a recently established multimodal single-cell analysis technique combining the immunophenotyping capabilities of antibody labeling and cell sorting with the resolution of single-cell RNA sequencing (scRNA-seq). By simply adding a 12-bp nucleotide barcode to antibodies (cell hashing), CITE-seq can be used to sequence antibody-bound tags alongside the cellular mRNA, thus reducing the cost of scRNA-seq by performing it simultaneously on multiple barcoded samples in a single run. Here, we illustrate an ideal CITE-seq data analysis workflow by characterizing the transcriptome of the SH-SY5Y neuroblastoma cell line, a widely used model to study neuronal function and differentiation. We obtained transcriptomes from a total of 2879 single cells, measuring an average of 1600 genes/cell. Along with standard scRNA-seq data handling procedures, such as quality checks and cell filtering, we performed exploratory analyses to identify the most stable genes, which could serve as reference housekeeping genes in qPCR experiments. We also illustrate how to use some popular R packages to investigate cell heterogeneity in scRNA-seq data, namely Seurat, Monocle, and slalom. Both the CITE-seq dataset and the code used to analyze it are freely shared and fully reusable for future research.

Keywords: CITE-seq; gene regulatory networks; neuroblastoma; single-cell; transcriptomics; unsupervised learning.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

**Figure 1**
Exploratory analysis of single-cell mRNA expression in the SY5Y cell line. (A) SY5Y cells were grown in a flask to about 90% confluence and then split into two flasks for library preparation and sequencing. (B) Scatterplot showing Log10 counts of HTOs. Three populations are distinguished on the basis of hashtag counts: cells belonging to SplitA, cells belonging to SplitB, and multiplets. Proportions of UMI counts, measured genes, mtUMI and mitoRatio are reported in the table, and QC plots summarizing these metrics are shown in (C) a bar plot of the number of cells assigned to each sampling group, (D) a density plot showing a similar number of UMI/cell in SplitA and SplitB, while multiplets contained higher UMI/cell, and (E) box-and-whisker plots showing the number of genes detected per cell in the three groups.
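
The grouping described in panel (B) and the mitoRatio metric can be illustrated with a minimal Python sketch. This is not the authors' code (the workflow in the paper is R-based): the fixed count cutoff `HTO_THRESHOLD`, the `Cell` record, and the toy counts are all hypothetical, and real demultiplexing tools usually estimate per-hashtag cutoffs from the data rather than using one fixed value.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    barcode: str
    hto_a: int      # UMI count of the hashtag labelling SplitA
    hto_b: int      # UMI count of the hashtag labelling SplitB
    total_umi: int  # total mRNA UMI count of the cell
    mt_umi: int     # UMI count from mitochondrial genes


# Hypothetical cutoff, chosen only for this illustration.
HTO_THRESHOLD = 50


def classify(cell: Cell, threshold: int = HTO_THRESHOLD) -> str:
    """Assign a cell to a group from its two hashtag counts."""
    a_pos = cell.hto_a >= threshold
    b_pos = cell.hto_b >= threshold
    if a_pos and b_pos:
        return "Multiplet"  # both hashtags detected in one droplet
    if a_pos:
        return "SplitA"
    if b_pos:
        return "SplitB"
    return "Negative"       # neither hashtag detected


def mito_ratio(cell: Cell) -> float:
    """Fraction of the cell's UMIs that come from mitochondrial genes."""
    return cell.mt_umi / cell.total_umi if cell.total_umi else 0.0


# Toy cells with made-up counts.
cells = [
    Cell("cell_1", 420, 6, 5200, 260),
    Cell("cell_2", 4, 380, 4800, 192),
    Cell("cell_3", 350, 410, 9900, 495),
]

for c in cells:
    print(c.barcode, classify(c), round(mito_ratio(c), 3))
```

A fixed threshold keeps the example short; it is a stand-in for whatever demultiplexing rule is actually applied to the hashtag counts.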

**Figure 2**
Exploratory analysis of the most variable and most stable genes across the dataset. (A) Seurat standardized variance vs. average expression plot. The top 10 most variable genes are indicated. Genes with the highest average expression (x-axis) and the lowest average variance (y-axis) are indicated in (B) SplitA, (C) SplitB, and (D) the entire dataset. Genes with an average variance < 0.2 and a LogNorm average expression > 4 are labeled, together with commonly used HK genes, such as ACTB, B2M, and GAPDH.

**Figure 3**
Candidate SY5Y HK genes. Bar plots comparing the 13 genes selected in Figure 2D by LogNorm average expression (A) or by the ratio of LogNorm average expression to variance (B). The ratio appears to be a better criterion for identifying the most stable genes suitable as HK. In (C), all genes showing a ratio > 15 are indicated. Forty-nine genes are suitable HK candidates.
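
The expression/variance ratio used to rank candidate HK genes can be sketched as follows. This is an illustrative Python sketch, not the authors' implementation: it assumes a plain population variance of log-normalized values (the caption of Figure 2 refers to a Seurat-derived variance, which may be computed differently), and the gene names and values are invented. Only the ratio cutoff of 15 comes from the caption.

```python
from statistics import mean, pvariance


def stability_table(expr):
    """Map each gene to (mean, variance, mean/variance ratio).

    expr: dict of gene name -> list of log-normalized values, one per cell.
    Genes with zero variance are skipped; in practice these are usually
    undetected genes, and the ratio is undefined for them.
    """
    table = {}
    for gene, values in expr.items():
        m = mean(values)
        v = pvariance(values)  # assumption: plain population variance
        if v == 0:
            continue
        table[gene] = (m, v, m / v)
    return table


def hk_candidates(expr, min_ratio=15.0):
    """Return genes whose mean/variance ratio exceeds min_ratio, best first."""
    table = stability_table(expr)
    ranked = sorted(table.items(), key=lambda item: item[1][2], reverse=True)
    return [gene for gene, (_, _, ratio) in ranked if ratio > min_ratio]


# Toy data with hypothetical gene names.
toy = {
    "GENE_STABLE": [4.5, 4.6, 4.4, 4.5],      # high and nearly constant
    "GENE_NOISY": [1.0, 6.0, 0.0, 5.0],       # similar range, very variable
    "GENE_UNDETECTED": [0.0, 0.0, 0.0, 0.0],  # never detected
}

print(hk_candidates(toy))
```

Dividing by the variance penalizes genes that are highly expressed on average but fluctuate between cells, which is why a ratio can separate candidates that average expression alone would rank similarly.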

**Figure 4**
Principal Component Analysis. (A) Single cells are plotted along the first two components retaining the highest variance. No difference between the two splits, or multiplets, was detectable. (B) Principal component analysis on cell cycle genes showed a clear separation of cells according to cell cycle gene expression. (C) Partial regression removed the cell cycle variance, maintaining the difference between G1 and G2M/S cells. Partial regression also allowed retaining signals separating non-cycling and cycling cells, while removing differences in cell cycle phases amongst proliferating cells (which are often uninteresting).
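
One common way to obtain the behaviour described in panel (C) is to regress each gene on the difference between the S-phase and G2M-phase scores, rather than on both scores separately; this is the alternate workflow described in the Seurat cell-cycle vignette. Whether the authors used exactly this covariate is an assumption here. The Python sketch below, with invented scores and expression values, shows the idea for a single covariate using ordinary least squares.

```python
def regress_out(y, x):
    """Return residuals of y after a least-squares fit on one covariate x."""
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        # Constant covariate: nothing to regress, just center the values.
        return [yi - mean_y for yi in y]
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Hypothetical per-cell phase scores: an S cell, a G2M cell,
# a cycling cell with equal scores, and a non-cycling cell.
s_score = [1.0, 0.0, 0.5, 0.0]
g2m_score = [0.0, 1.0, 0.5, 0.0]
cc_difference = [s - g for s, g in zip(s_score, g2m_score)]

# A gene that tracks the S vs. G2M difference (phase-specific signal).
phase_gene = [3.0, 1.0, 2.0, 2.0]
# A gene that is high in all cycling cells and low in the non-cycling one.
cycling_gene = [4.0, 4.0, 4.0, 0.0]

print(regress_out(phase_gene, cc_difference))
print(regress_out(cycling_gene, cc_difference))
```

In this toy setup the phase-specific gene is flattened, while the gene separating cycling from non-cycling cells keeps its contrast, because that contrast is uncorrelated with the S minus G2M difference.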

**Figure 5**
Cluster Analysis. (A) UMAP 2D projection of single cells. No detectable difference was observed between splits. (B) Cells can be assigned to four cluster-forming communities by applying the Leiden algorithm. (C) Unsupervised trajectory learning. The most variable genes show expression differences following a path from cluster 3 to cluster 2. Cluster 4 remains separated from the others. (D) Top 12 genes characterizing each community. Cluster 4 shows the highest expression of TUBA1A and HSP90AA1, while S100A6 was poorly expressed in a minimal fraction of cells (<0.1). Higher PTMA, HMGB2, and H2AZ1 expression characterizes cluster 1.

**Figure 6**
Slalom output. (A) Graph showing the most relevant factors identified by the f-scLVM model, either annotated in Wiki Pathways (blue) or not annotated (red). (B) Most relevant genes in the WP COPPER HOMEOSTASIS component. (C) Most relevant genes in the WP CELL CYCLE component. (D) Most relevant genes in the hidden05 component.
