[Preprint]. 2023 Dec 19:2023.12.18.572214.
doi: 10.1101/2023.12.18.572214.

Data-driven selection of analysis decisions in single-cell RNA-seq trajectory inference

Xiaoru Dong et al. bioRxiv.

Abstract

Single-cell RNA sequencing (scRNA-seq) experiments have become instrumental in developmental and differentiation studies, enabling the profiling of cells at one or more time points to uncover subtle variations in expression profiles that reflect underlying biological processes. Benchmarking studies have compared many of the computational methods used to reconstruct cellular dynamics; however, researchers still encounter challenges in their analysis due to uncertainty in selecting the most appropriate methods and parameters. Even among universal data processing steps used by trajectory inference methods, such as feature selection and dimension reduction, the performance of trajectory methods is highly dataset-specific. To address these challenges, we developed Escort, a framework for evaluating a dataset's suitability for trajectory inference and quantifying trajectory properties influenced by analysis decisions. Escort navigates single-cell trajectory analysis through data-driven assessments, reducing uncertainty and much of the decision burden associated with trajectory inference. Escort is implemented in an accessible R package and R/Shiny application, providing researchers with the necessary tools to make informed decisions during trajectory analysis and enabling new insights into dynamic biological processes at single-cell resolution.

Keywords: Pseudotime inference; RNA-seq; Trajectory inference; single cell.

Figures

Fig. 1. Analysis choices significantly impact trajectory estimation in scRNA-seq data.
For various choices of selected genes and dimension reduction methods, trajectory inference and pseudotime estimation were performed on a scRNA-seq dataset of hematopoietic stem cells (Kowalczyk et al., 2015). A. Dimension-reduced spaces and estimated trajectories with cells colored by cell type. B. Pseudotime distributions for each set of analysis choices. C. Normalized gene expression as a function of pseudotime for Cbx1. Cells are colored by pseudotime, i.e. their location along the trajectory. Abbreviations: MDS = Multidimensional Scaling, UMAP = Uniform Manifold Approximation and Projection, HVG = Highly Variable Gene.
Fig. 2. Trajectory accuracy is impacted by different dimension reduction algorithms and inclusions of highly variable genes.
A. The performance of different embeddings across all eight simulated scenarios is shown. Embeddings were ranked within each dataset separately for the three metrics. The ranks were scaled so that a lower rank indicated better within-dataset performance. B. Similar to A using Monocle3.
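The within-dataset ranking described in this caption can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the metric names, the example scores, and the choice to scale ranks to the range [0, 1] are all assumptions made for the example.

```python
# Hypothetical accuracy scores: dataset -> metric -> embedding -> value.
# Names and numbers are illustrative only, not taken from the paper.
results = {
    "dataset_1": {
        "kendall": {"UMAP": 0.9, "MDS": 0.7, "TSNE": 0.8},
        "mse": {"UMAP": 0.02, "MDS": 0.05, "TSNE": 0.01},
    },
}

# Direction of each metric (assumed): higher correlation is better,
# lower error is better.
HIGHER_IS_BETTER = {"kendall": True, "mse": False}


def scaled_ranks(scores, higher_is_better):
    """Rank embeddings for one dataset and one metric.

    Returns a dict mapping embedding name to a scaled rank in [0, 1],
    where 0 is the best embedding. Ties are broken by insertion order.
    """
    ordered = sorted(scores, key=scores.get, reverse=higher_is_better)
    n = len(ordered)
    if n == 1:
        return {ordered[0]: 0.0}
    return {name: position / (n - 1) for position, name in enumerate(ordered)}


# Rank separately within each dataset and for each metric.
ranked = {
    dataset: {
        metric: scaled_ranks(scores, HIGHER_IS_BETTER[metric])
        for metric, scores in metrics.items()
    }
    for dataset, metrics in results.items()
}

print(ranked["dataset_1"]["kendall"])
print(ranked["dataset_1"]["mse"])
```

Ranking within each dataset keeps datasets of different difficulty comparable, because only the relative order of embeddings on the same data is used.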
Fig. 3. Overview of Escort.
Schematic of the Escort workflow. A. The first step detects the presence of a trajectory signal in the dataset before proceeding to evaluations of embeddings. B. Various metrics are used to evaluate user-defined embeddings regardless of the ultimate trajectory inference method to be used. C. In the final step, the user's preferred trajectory inference method is used to fit a preliminary trajectory, allowing the evaluation of method-specific hyperparameters. D. Based on the overall score, embeddings are classified as either recommended or non-recommended.
Fig. 4. Trajectory assessment performance of Escort on simulated datasets.
A. The accuracy of trajectories generated on nine different embedding options for each of the eight simulated datasets is shown for different metrics: Kendall rank correlation and mean squared error. Simulated scenarios differ in terms of true trajectory topology (denoted by color) and simulator methods. The y-axis displays the values for the accuracy metric. B. Each embedding’s Escort score (x-axis) versus the value for each accuracy metric (y-axis) is shown, colored according to its classification by Escort.
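The two accuracy metrics named in this caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's evaluation code: the pseudotime vectors are made up, the tau-a variant of Kendall's coefficient is used (ties count as neither concordant nor discordant), and both pseudotime vectors are min-max rescaled to [0, 1] before the mean squared error is computed.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall rank correlation (tau-a) between two equal-length sequences.

    Computed as (concordant pairs - discordant pairs) / total pairs.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


def rescale(values):
    """Min-max rescale to [0, 1] so pseudotimes share a common range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def mean_squared_error(true_pt, est_pt):
    """Mean squared error between rescaled true and estimated pseudotime."""
    t, e = rescale(true_pt), rescale(est_pt)
    return sum((a - b) ** 2 for a, b in zip(t, e)) / len(t)


# Made-up pseudotime values for five cells (illustrative only).
true_pt = [0.0, 1.0, 2.0, 3.0, 4.0]
est_pt = [0.1, 0.9, 2.5, 2.0, 4.2]

print(kendall_tau(true_pt, est_pt))  # one discordant pair out of ten -> 0.8
print(mean_squared_error(true_pt, est_pt))
```

The rank correlation only checks whether cells are ordered correctly, while the mean squared error also penalizes distortions in the spacing of cells along the trajectory, so the two metrics can disagree.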
Fig. 5. Trajectory assessment performance of Escort on public datasets.
A. The accuracy of trajectories generated on nine different embedding options is shown for five publicly available datasets assessed using different metrics: Kendall rank correlation and mean squared error. The colors distinguish each embedding classification by Escort, in addition to those embeddings that failed in the second step. The y-axis displays the values for accuracy metrics. The x-axis corresponds to recommendations generated by Escort. B. Similar to A with the x-axis showing the Escort score.
Fig. 6. Analysis of transdifferentiation of hypertrophic chondroblasts using an Escort-guided trajectory.
A. UMAP of the original paper’s embedding and the Monocle3-based trajectory. B. UMAP of the original paper’s embedding using Slingshot to fit a trajectory. C. Escort-recommended embedding using Slingshot to fit a trajectory. D. Correlation of pseudotime between the two Lineage A trajectories. E. Distribution of knots across all significantly dynamic genes for Lineage A. F. Gene expression as a function of pseudotime for Snorc and Id2. G-I. Similar to D-F, but for Lineage B.

References

    Bacher R., Chu L.-F., Argus C., Bolin J.M., Knight P., Thomson J.A., et al. (2022) Enhancing biological signals and detection rates in single-cell RNA-seq experiments with cDNA library equalization. Nucleic Acids Research, 50, e12.
    Baron M., Veres A., Wolock S.L., Faust A.L., Gaujoux R., Vetere A., et al. (2016) A Single-Cell Transcriptomic Map of the Human and Mouse Pancreas Reveals Inter- and Intra-cell Population Structure. Cell Systems, 3, 346–360.e4.
    Büttner M., Miao Z., Wolf F.A., Teichmann S.A. and Theis F.J. (2019) A test metric for assessing single-cell RNA-seq batch correction. Nature Methods, 16, 43–49.
    Cannoodt R., Saelens W., Sichien D., Tavernier S., Janssens S., Guilliams M., et al. (2016) SCORPIUS Improves Trajectory Inference and Identifies Novel Modules in Dendritic Cell Development. preprint, Bioinformatics.
    Cannoodt R., Saelens W., Todorov H. and Saeys Y. (2018a) Single-cell -omics datasets containing a trajectory.
