[Preprint]. 2025 Apr 28:2025.04.22.650107.
doi: 10.1101/2025.04.22.650107.

SCassist: An AI Based Workflow Assistant for Single-Cell Analysis

Vijayaraj Nagarajan et al. bioRxiv.

Abstract

Single-cell RNA sequencing (scRNA-seq) data analysis often involves a complex, iterative workflow that requires significant expertise and time. To navigate this complexity, we have developed SCassist, an R package that leverages large language models (LLMs) to guide and enhance scRNA-seq analysis. SCassist integrates LLMs into key workflow steps to analyze user data and provide relevant recommendations for filtering, normalization, and clustering parameters. It also provides LLM-guided interpretations of variable features and principal components, along with cell type annotations and enrichment analysis. SCassist provides this assistance using popular LLMs such as Google's Gemini, OpenAI's GPT, and Meta's Llama3, making scRNA-seq analysis accessible to researchers at all levels.

Figures

Figure 1:
The general architecture of the SCassist algorithm. SCassist, an LLM-powered assistant, streamlines single-cell analysis within the standard Seurat workflow. The top portion of the figure depicts the typical Seurat steps (quality control, normalization, dimensionality reduction, clustering, and annotation), while the interconnected pink boxes represent SCassist components, which provide data-driven insights and parameter recommendations for each step. SCassist can be used at any stage of the standard single-cell workflow, starting from quality control, where the only user input is the Seurat object containing the raw count matrix. For the given Seurat object, SCassist generates metrics such as summary statistics, quantile data, and variance explained. These metrics are then used to build augmented prompts for large language models (LLMs), which recommend optimal parameters for filtering, normalization, and dimensionality reduction, identify significant features, offer insights (from variable genes, principal components, and differentially expressed genes), and annotate clusters with detailed reasoning.
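
The metric-to-prompt step described in the caption can be illustrated with a minimal Python sketch. It is not SCassist's implementation or API (SCassist is an R package, and its internals are not shown here). The function names `summarize_metric` and `build_filter_prompt`, the prompt wording, and the toy values are all hypothetical; the metric names simply follow common Seurat tutorial conventions. The sketch only shows the general idea of computing summary statistics and quantiles and embedding them in a prompt; sending the prompt to an LLM is left out.

```python
import statistics


def summarize_metric(values):
    """Return summary statistics and quartiles for one per-cell QC metric."""
    q25, median, q75 = statistics.quantiles(values, n=4)
    return {
        "min": min(values),
        "q25": q25,
        "median": median,
        "q75": q75,
        "max": max(values),
        "mean": statistics.mean(values),
    }


def build_filter_prompt(metrics):
    """Embed the computed statistics in a prompt that asks for filtering cutoffs.

    Hypothetical prompt wording; the real SCassist prompts are not reproduced here.
    """
    lines = [
        "You are assisting with scRNA-seq quality control.",
        "Per-cell metric summaries computed from the dataset:",
    ]
    for name, values in metrics.items():
        stats = summarize_metric(values)
        rendered = ", ".join(f"{key}={value:.1f}" for key, value in stats.items())
        lines.append(f"- {name}: {rendered}")
    lines.append(
        "Recommend lower and upper filtering cutoffs for each metric "
        "and explain your reasoning."
    )
    return "\n".join(lines)


# Toy values standing in for per-cell metrics taken from a Seurat object (assumed data).
toy_metrics = {
    "nFeature_RNA": [200, 850, 1200, 1500, 2100, 2500, 6000],
    "percent_mt": [0.5, 1.2, 2.0, 3.1, 4.8, 7.5, 25.0],
}

prompt = build_filter_prompt(toy_metrics)
print(prompt)
```

Grounding the prompt in statistics computed from the user's own data, rather than asking the model a generic question, is what makes the recommendation specific to the dataset; the same pattern would apply to the other steps named in the caption, such as variance explained for choosing principal components.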
