2023 Aug 3;4(8):100814.
doi: 10.1016/j.patter.2023.100814. eCollection 2023 Aug 11.

Interactive analysis of single-cell data using flexible workflows with SCTK2

Yichen Wang et al. Patterns (N Y).

Abstract

Analysis of single-cell RNA sequencing (scRNA-seq) data can reveal novel insights into the heterogeneity of complex biological systems. Many tools and workflows have been developed to perform different types of analyses. However, these tools are spread across different packages or programming environments, rely on different underlying data structures, and can only be utilized by people with knowledge of programming languages. In the Single-Cell Toolkit 2 (SCTK2), we have integrated a variety of popular tools and workflows to perform various aspects of scRNA-seq analysis. All tools and workflows can be run in the R console or using an intuitive graphical user interface built with R/Shiny. HTML reports generated with Rmarkdown can be used to document and recapitulate individual steps or entire analysis workflows. We show that the toolkit offers more features when compared with existing tools and allows for a seamless analysis of scRNA-seq data for non-computational users.

Keywords: analysis; bioinformatics; genomic; graphical user interface; interactive; interoperability; single cell; software; toolkit; transcriptomic.

Conflict of interest statement

The authors declare no competing interests.

Figures

Graphical abstract
Figure 1
Overview of analysis workflows available in SCTK2. Analysis of scRNA-seq data can be divided into three major parts: importing and quality control (QC), clustering workflows, and downstream analysis. For importing and QC (top), SCTK2 can import data from many different upstream preprocessing tools and formats. A variety of metrics for general QC, empty drop detection, doublet detection, and ambient RNA quantification can be calculated and displayed for each sample. For clustering workflows (middle), SCTK2 provides an à la carte workflow that allows users to pick and choose different tools at each step for normalization, batch correction or integration, dimensionality reduction, and clustering. For downstream analysis (bottom), SCTK2 provides access to additional tools and analyses for differential expression, cell-type labeling, pathway analysis, and trajectory analysis. Overall, the toolkit provides a wide variety of methods for each part of the analysis workflow.
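The à la carte idea described in the Figure 1 caption, choosing one method per step, can be pictured as a registry of interchangeable functions. The following is a minimal Python sketch under stated assumptions and is not SCTK2's API (SCTK2 is an R package). The names `REGISTRY`, `STEP_ORDER`, `run_workflow`, the toy methods, and the 4x4 count matrix are all hypothetical and chosen only for illustration.

```python
import math
from statistics import pvariance

# Hypothetical toy count matrix: rows are cells, columns are genes.
counts = [
    [10, 0, 5, 1],
    [8, 1, 4, 0],
    [0, 12, 1, 9],
    [1, 10, 0, 11],
]

def normalize_cpm(matrix, scale=1e6):
    """Scale each cell (row) so that its counts sum to `scale`."""
    out = []
    for row in matrix:
        total = sum(row)
        out.append([scale * v / total if total else 0.0 for v in row])
    return out

def normalize_log_cpm(matrix):
    """Counts-per-million scaling followed by log(1 + x)."""
    return [[math.log1p(v) for v in row] for row in normalize_cpm(matrix)]

def select_top_variable(matrix, n_genes=2):
    """Keep the n_genes columns with the highest variance across cells."""
    n_cols = len(matrix[0])
    variances = [pvariance([row[j] for row in matrix]) for j in range(n_cols)]
    ranked = sorted(range(n_cols), key=lambda j: variances[j], reverse=True)
    keep = sorted(ranked[:n_genes])
    return [[row[j] for j in keep] for row in matrix]

def cluster_by_dominant_gene(matrix):
    """Toy stand-in for clustering: label each cell by its largest column."""
    return [max(range(len(row)), key=lambda j: row[j]) for row in matrix]

# One entry per workflow step; each step offers one or more named methods.
REGISTRY = {
    "normalization": {"cpm": normalize_cpm, "log_cpm": normalize_log_cpm},
    "feature_selection": {"top_variable": select_top_variable},
    "clustering": {"dominant_gene": cluster_by_dominant_gene},
}
STEP_ORDER = ["normalization", "feature_selection", "clustering"]

def run_workflow(data, choices):
    """Apply the chosen method for each step, in order."""
    result = data
    for step in STEP_ORDER:
        result = REGISTRY[step][choices[step]](result)
    return result

base = {"feature_selection": "top_variable", "clustering": "dominant_gene"}
labels_cpm = run_workflow(counts, {**base, "normalization": "cpm"})
labels_log = run_workflow(counts, {**base, "normalization": "log_cpm"})
print(labels_cpm, labels_log)
```

Swapping the normalization method changes only one entry in `choices`; the rest of the workflow is untouched, which is the property the caption attributes to the à la carte design.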
Figure 2
Overview of curated analysis workflows. In addition to the à la carte clustering workflow, SCTK2 provides access to workflows from the R packages Seurat and celda as well as the Python package scanpy. Users can recapitulate the analysis, results, and plots from each package all while using the common and unified interface in SCTK2 without having to know the underlying commands from each package. Functions for normalization, variable feature selection, dimensionality reduction, and clustering are available from the Seurat and scanpy workflows. Celda can be used to group cells into clusters and genes into modules.
Figure 3
Interactive analysis of scRNA-seq data with a graphical user interface (GUI). SCTK2 allows non-computational users to analyze scRNA-seq data using an interactive GUI built with R/Shiny that can be hosted on a web server. (A) (1) The menu bar allows the users to navigate through the main sections including data importing, QC, the à la carte clustering workflow, and downstream analysis. (2) Within each major section, parameters to run tools can be selected in the left panel. (3) Results and plots will be displayed in the right panel. (4) Many plots can be customized with additional options such as changing the color of points to reflect different phenotypes. (5) A “next steps” panel provides a “wizard”-like guide by suggesting links to the recommended next steps. (B) The curated workflows for Seurat, celda, and scanpy can be used to run a series of predefined steps using vertical tabs. (1) Curated workflows can be selected from the top navigation menu bar. The Seurat curated workflow is shown as an example. (2) Steps for normalization, feature selection, dimensionality reduction, clustering, 2D embedding, and finding markers can be selected and run using the vertical tabs. (3) Within each major section, parameters to run tools can be selected in the left panel, and (4) results and plots will be displayed in the right panel. (5) Within the Seurat curated workflow, an extra section is given for exploring expression of features using UMAPs, heatmaps, and violin plots.
Figure 4
Facilitating reproducibility and sharing of results with HTML reports. SCTK2 provides the ability to generate HTML reports for several individual analyses or entire workflows to enable reproducibility and facilitate sharing of results. An HTML report for clustering of PBMC data with Seurat is shown as an example. (1) Different steps that were run in the workflow can be selected with the content menu on the left of the report. (2) In each section, a description of the step or tool and the selected parameters are shown at the top, and (3) the code used to produce the plot can be expanded. (4) The results and plots are shown on the right side. The “clustering” section shows different choices of the “resolution” parameter in different tabs to allow users to easily explore different sets of cluster labels.
Figure 5
Benchmarking of RAM and CPU usage for datasets of different sizes. RAM allocation and elapsed time were benchmarked for four datasets (pbmc6k, pbmc68k, immune100k, and immune300k) using a Bioconductor-based analysis workflow. (A) The RAM usage for the output SCE object after each step is shown for each dataset. (B) The peak RAM usage during each step is displayed for each dataset. (C) The time elapsed during each step is displayed for each dataset. The left part zooms in on the y axis of the right part.
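The per-step measurements described for Figure 5 (peak memory and elapsed time for each workflow step, repeated across dataset sizes) can be illustrated with a small harness. This is a Python sketch under stated assumptions, not the authors' benchmark, which was run on a Bioconductor-based R workflow. The step functions, the synthetic data generator, and the dataset sizes are hypothetical, and `tracemalloc` only tracks memory allocated by Python itself, so it stands in for, rather than reproduces, process-level RAM measurement.

```python
import time
import tracemalloc

def benchmark_step(name, func, data):
    """Run func(data); record elapsed seconds and peak traced memory in bytes."""
    tracemalloc.start()
    start = time.perf_counter()
    result = func(data)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, {"step": name, "seconds": elapsed, "peak_bytes": peak}

def make_counts(n_cells, n_genes=50):
    """Hypothetical synthetic count matrix (rows are cells)."""
    return [[(i * j) % 7 for j in range(n_genes)] for i in range(n_cells)]

def normalize(matrix):
    """Divide each row by its total (a total of zero is treated as one)."""
    out = []
    for row in matrix:
        total = sum(row) or 1
        out.append([v / total for v in row])
    return out

def gene_means(matrix):
    """Mean of each column across all cells."""
    n = len(matrix)
    return [sum(col) / n for col in zip(*matrix)]

# Hypothetical two-step workflow; each step consumes the previous output.
STEPS = [("normalize", normalize), ("gene_means", gene_means)]

def benchmark_workflow(n_cells):
    """Benchmark every step of the workflow for one dataset size."""
    data = make_counts(n_cells)
    records = []
    for name, func in STEPS:
        data, record = benchmark_step(name, func, data)
        record["n_cells"] = n_cells
        records.append(record)
    return records

results = [r for n in (100, 1000) for r in benchmark_workflow(n)]
for r in results:
    print(r["n_cells"], r["step"], r["peak_bytes"], round(r["seconds"], 6))
```

Starting and stopping the tracer around each step keeps the peak figure specific to that step, which mirrors the per-step breakdown in panels B and C. The size of the output object after each step (panel A) is not measured here.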