Comput Struct Biotechnol J. 2023 Oct 20:21:5382-5393.
doi: 10.1016/j.csbj.2023.10.032. eCollection 2023.

SCALA: A complete solution for multimodal analysis of single-cell Next Generation Sequencing data

Christos Tzaferis et al. Comput Struct Biotechnol J.

Abstract

Analysis and interpretation of high-throughput transcriptional and chromatin accessibility data at single-cell (sc) resolution are still open challenges in the biomedical field. The large number of bioinformatics tools available for the different analytical steps increases the complexity of data interpretation and makes it harder to derive biological insights. In this article, we present SCALA, a bioinformatics tool for the analysis and visualization of single-cell RNA sequencing (scRNA-seq) and single-cell Assay for Transposase-Accessible Chromatin using sequencing (scATAC-seq) datasets, enabling either independent or integrative analysis of the two modalities. SCALA combines standard types of analysis by integrating multiple software packages that cover steps ranging from quality control to the identification of distinct cell populations and cell states. Additional analysis options enable functional enrichment, cellular trajectory inference, ligand-receptor analysis, and regulatory network reconstruction. SCALA is fully parameterizable, presents data in tabular format, and produces publication-ready visualizations. The available analysis modules can help biomedical researchers explore, analyze, and visualize their data without any prior coding experience. We demonstrate the functionality of SCALA through two use cases related to TNF-driven arthritic mice, handling both scRNA-seq and scATAC-seq datasets. SCALA is developed in R, Shiny, and JavaScript and is mainly available as a standalone version, while an online service of more limited capacity can be found at http://scala.pavlopouloslab.info or https://scala.fleming.gr.

Keywords: Automated analysis of single-cell Next Generation Sequencing data; Integrative analysis of single-cell Next Generation Sequencing data; Single-cell ATAC-seq analysis; Single-cell RNA sequencing analysis.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
General workflow of the SCALA pipeline. In this figure, the input files compatible with SCALA for scRNA-seq and scATAC-seq analysis are shown in the left panel. Additionally, the main functionalities and outputs, for each mode of analysis for RNA (blue box) and ATAC (red box) assays, are showcased in the right panel.
Fig. 2
Analysis of the PBMC3k scRNA-seq dataset. (A) Violin plots depicting cell quality control measurements, including the number of genes detected, the total number of reads, and the percentage of reads mapped to the mitochondrial genome. (B) Visualization of cells in UMAP space. Cells are colored according to cluster labels (clusters were identified with the Louvain algorithm). (C) Feature plot showing per-cell signature scores for the top marker genes of cluster 0. The color scale denotes the intensity of the signature score: red indicates high values, while grey indicates low values. (D) UMAP projection showing the results of cell cycle phase analysis. Cells are colored according to their predicted cell cycle phase. (E) Heatmap depicting the top 10 marker genes per cluster, ranked by Log2FC value. Genes are shown on the y-axis and cells on the x-axis. The color scale denotes scaled expression values, with blue indicating low expression and red indicating high expression. (F) Heatmap showing “bona fide” interactions between cluster 0 (ligand-expressing cluster) and cluster 2 (receptor-expressing cluster). Color intensity represents the interaction potential score (red for high, grey for low). (G) Heatmap of scaled AUC scores for the top regulons per cluster. The color scale denotes z-scores of AUC values (yellow for high values, purple for low values).
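The three quality control measurements named in panel (A) can be illustrated with a minimal sketch. This is not SCALA's code (SCALA is written in R); the function name `qc_metrics`, the toy count table, and the `MT-` prefix used to flag mitochondrial genes are assumptions made for illustration only.

```python
from typing import Dict, List


def qc_metrics(counts: Dict[str, List[int]], mito_prefix: str = "MT-") -> List[dict]:
    """Compute per-cell QC metrics from a gene -> per-cell counts table.

    `counts` and `mito_prefix` are illustrative assumptions, not SCALA inputs.
    """
    n_cells = len(next(iter(counts.values())))
    metrics = []
    for cell in range(n_cells):
        # Total number of counts observed in this cell.
        total = sum(values[cell] for values in counts.values())
        # Number of genes with at least one count in this cell.
        n_genes = sum(1 for values in counts.values() if values[cell] > 0)
        # Counts assigned to genes whose name starts with the mitochondrial prefix.
        mito = sum(
            values[cell] for gene, values in counts.items() if gene.startswith(mito_prefix)
        )
        pct_mito = 100.0 * mito / total if total else 0.0
        metrics.append({"n_genes": n_genes, "total_counts": total, "pct_mito": pct_mito})
    return metrics


# Toy table: 4 genes x 3 cells (hypothetical values).
toy_counts = {
    "MT-CO1": [5, 0, 10],
    "CD3D": [10, 0, 0],
    "LYZ": [0, 20, 30],
    "MS4A1": [5, 0, 0],
}
qc = qc_metrics(toy_counts)
```

Cells with very few detected genes or a high mitochondrial percentage are the ones a quality control step would typically flag; any thresholds are dataset-specific and are not taken from the source.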
Fig. 3
Analysis of the BMMCs scATAC-seq dataset. (A) Cell quality control plots showing TSS enrichment and unique fragment measurements. (B) Bar plot showing the relative abundance of cells in each of the dataset’s clusters (clusters were identified with the Louvain algorithm). (C) Projection of cells in UMAP space. Cells are colored according to cluster identity. (D) Heatmap showing z-scores of peak accessibility for the top marker peaks per cluster. Clusters are shown on the y-axis, while peaks are plotted on the x-axis. (E) Feature plot showing per-cell gene activity scores of CD14 as a UMAP overlay. Color intensity denotes imputed log2-normalized expression values. (F) Genome browser tracks showing local chromatin accessibility (y-axis, left panel) of the CD3D gene at the cluster level (y-axis, right panel). (G) Heatmap displaying motif deviation z-scores of positive regulators for all clusters. Regulators are shown on the x-axis, while clusters are shown on the y-axis.
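Several of the heatmaps above display z-scores rather than raw values. A minimal sketch of that scaling follows, assuming each feature (for example a peak) is standardized across clusters using the population standard deviation; the caption does not state which axis or estimator is used, so both choices are assumptions, as is the function name `zscore_rows`.

```python
from statistics import mean, pstdev
from typing import List


def zscore_rows(matrix: List[List[float]]) -> List[List[float]]:
    """Standardize each row (one feature across clusters) to mean 0 and unit variance."""
    scaled = []
    for row in matrix:
        mu = mean(row)
        sd = pstdev(row)  # population standard deviation (assumed choice)
        # A constant row has no variation, so it is mapped to zeros.
        scaled.append([0.0 if sd == 0 else (x - mu) / sd for x in row])
    return scaled


# Rows: hypothetical peaks; columns: hypothetical clusters.
accessibility = [
    [1.0, 2.0, 3.0],
    [4.0, 4.0, 4.0],
]
z = zscore_rows(accessibility)
```

After scaling, colors in a heatmap are comparable across features with very different absolute accessibility, which is why marker heatmaps are usually drawn this way.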
Fig. 4
Use case - hTNFtg scRNA-seq data analysis. (A) Graph-based clustering of SFs identified 9 distinct clusters. Cells are visualized in UMAP space and colored by cluster assignment. (B) Bar plot depicting the relative abundances of clusters in the healthy (Wt) and disease (hTNFtg) states. The highlighted areas mark the clusters that are expanded in the arthritic state. (C) Feature plots showing the different gene expression patterns between the clusters of the sublining (top row), intermediate (middle row), and lining (bottom row) categories. Cells are projected in 2D UMAP space and colored by normalized gene expression. (D) One of the possible lineages proposed by trajectory analysis, shown as a UMAP overlay. Cells belonging to the lineage are colored according to their pseudo-time values, while cells that are not part of this lineage are colored light gray. (E) Heatmap depicting the activity of the top 80 regulons (z-scores of AUC values) at the cluster level. Hierarchical clustering of fibroblast subsets (using active regulons) identified two major groups (group 1: sublining clusters; group 2: intermediate and lining clusters).
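The relative abundances plotted in panel (B) amount to per-condition cluster fractions. The sketch below shows one way to compute them; the function name `relative_abundance` and the toy labels are hypothetical, and the values are not taken from the study.

```python
from collections import Counter
from typing import Dict, List


def relative_abundance(
    cluster_labels: List[str], condition_labels: List[str]
) -> Dict[str, Dict[str, float]]:
    """Return, for each condition, the fraction of its cells found in each cluster."""
    per_condition: Dict[str, Counter] = {}
    for cluster, condition in zip(cluster_labels, condition_labels):
        per_condition.setdefault(condition, Counter())[cluster] += 1
    return {
        condition: {cl: n / sum(counts.values()) for cl, n in counts.items()}
        for condition, counts in per_condition.items()
    }


# Toy per-cell labels (hypothetical assignments, 4 cells per condition).
clusters = ["S1", "S1", "S2b", "S4a", "S4a", "S4a", "S1", "S2b"]
conditions = ["Wt", "Wt", "Wt", "hTNFtg", "hTNFtg", "hTNFtg", "hTNFtg", "Wt"]
fractions = relative_abundance(clusters, conditions)
```

Normalizing within each condition, rather than over all cells, keeps the comparison fair when the two samples contain different numbers of cells.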
Fig. 5
Use case - hTNFtg scATAC-seq data analysis. (A) Integration of the scRNA-seq and scATAC-seq datasets. Cluster labels from the RNA analysis are transferred to the ATAC data. Cells are projected in UMAP space and colored according to clustering (left) or transferred labels (right). (B) Semi-supervised trajectory analysis in the ATAC dataset recapitulates the outcome of the respective analysis in the RNA data. S2b was used as the initial state and S4a as the final state. (C) Heatmap displaying z-scores of gene activity values for the top 10 marker features of each cluster (after integration). (D) Heatmap displaying z-scores for the accessibility of the top 10 marker peaks of each cluster (after integration). (E) Motif enrichment analysis in the marker peaks of each cluster. Enriched motifs of each cluster are displayed in a heatmap. The color scale denotes the significance of enrichment. (F) Gene regulation analysis identifies positive regulators for each cluster. Top regulators are displayed in a heatmap. The color scale depicts motif deviation z-scores. In panels (C-F), marker genes/peaks, enriched motifs, and positive regulators are shown on the x-axis, while clusters (after integration) are shown on the y-axis.
