Front Bioinform. 2025 Feb 12:5:1519468.
doi: 10.3389/fbinf.2025.1519468. eCollection 2025.

Seurat function argument values in scRNA-seq data analysis: potential pitfalls and refinements for biological interpretation

Mikhail Arbatsky et al. Front Bioinform.

Abstract

Processing biological data is increasingly challenging as the volume of accumulated data grows each year and new methods for studying biological objects emerge. Blind application of mathematical methods in biology may lead to erroneous hypotheses and conclusions. Here we focus on a small set of mathematical methods applied during standard processing of scRNA-seq data: preprocessing, dimensionality reduction, integration, and clustering (the latter using machine learning methods). Normalization and scaling are standard preprocessing steps, with LogNormalize (natural-log transformation), CLR (centered log ratio transformation), and RC (relative counts) employed as data transformation methods. The justification for applying these methods in biology is not discussed in methodological articles. An essential aspect of dimensionality reduction is to identify stable patterns that are deliberately removed during mathematical data processing as redundant, even though they contain minor details important for biological interpretation. There are no established rules for integrating datasets obtained at different sampling times or under different conditions. The application of clustering needs to be reconsidered specifically for biological data processing. The novelty of the present study lies in an integrated approach that combines biology and bioinformatics to derive biological insights from data processing.
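The three transformations named in the abstract can be illustrated on a single cell's count vector. The following is a minimal sketch in plain Python, not the authors' code and not Seurat itself. The function names, the toy counts, and the scale factor of 10,000 (Seurat's default) are assumptions introduced here, and the CLR variant shown (a log1p-based geometric mean) is an assumption about how Seurat implements that method rather than something stated in the abstract.

```python
import math


def log_normalize(counts, scale_factor=10_000):
    """LogNormalize: scale each count by the cell total, then take log1p (natural log)."""
    total = sum(counts)
    return [math.log1p(c / total * scale_factor) for c in counts]


def relative_counts(counts, scale_factor=10_000):
    """RC: scale each count by the cell total, with no log transformation."""
    total = sum(counts)
    return [c / total * scale_factor for c in counts]


def clr(counts):
    """CLR: divide by a geometric-mean-like term, then take log1p.

    Assumed variant: the mean of log1p over nonzero counts is taken
    across all entries, so zeros are handled without a pseudocount.
    """
    log_sum = sum(math.log1p(c) for c in counts if c > 0)
    denom = math.exp(log_sum / len(counts))
    return [math.log1p(c / denom) for c in counts]


# Hypothetical UMI counts for four genes in one cell.
cell = [0, 1, 3, 6]

for name, fn in [("LogNormalize", log_normalize),
                 ("RC", relative_counts),
                 ("CLR", clr)]:
    print(name, [round(v, 3) for v in fn(cell)])
```

All three keep zeros at zero and preserve the rank order of genes within a cell; they differ in how strongly they compress large counts, which is one reason the choice of method can matter for downstream interpretation.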

Keywords: scRNA-seq; biocentric mathematics; cell clustering; datasets integration; dimension reduction.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
Processing the scRNA-seq data and detecting the low-quality cells during adipogenic differentiation of mesenchymal stem cells (MSCs). (A) t-SNE Projection of Cells Colored by UMI counts (web_summary file, CellRanger). (B) t-SNE Projection of Cells Colored by Automated Clustering (web_summary file, CellRanger). (C) Cell counts by clusters. (D) Removal of cells with excessively high or low counts. (E) Clusters removed based on UMI counts. (F) Removal of cells with excessively high or low features. (G) Clusters removed based on features. (H) Violin plot (VlnPlot) before cell filtering. (I) Violin plot (VlnPlot) of cells after cell filtering using parameters obtained from the Loupe Browser.
FIGURE 2
2D UMAP (A) and 3D UMAP (B) of MSCs induced towards adipogenic differentiation.
FIGURE 3
Resolution-dependent variations in the number of clusters identified in mesenchymal stem cells (MSCs) undergoing adipogenic differentiation. (A) Resolution 0.1, (B) Resolution 0.2, (C) Resolution 0.3, 0.4, (D) Resolution 0.5, (E) Resolution 0.8, (F) Resolution 0.9.
FIGURE 4
Cell clustering and integration methods using the Seurat package. Cell samples: upper left, MSCs induced for adipogenic differentiation; lower left, control MSC sample. Integrator (clusters): examples of data integration and clustering methods include CCA (Canonical Correlation Analysis), RPCA (Reciprocal PCA), and SCTransform (regularized negative binomial regression applied to normalize UMI count data). *SCTransform is not an integration method; it is used for data normalization as a substitute for the NormalizeData, FindVariableFeatures, and ScaleData functions. Integrator (samples) represents the integrated object of overlapping cell samples (control cells and MSCs induced for adipogenic differentiation).
