This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

[Preprint]. 2023 Dec 19:2023.12.18.572214.

doi: 10.1101/2023.12.18.572214.

Data-driven selection of analysis decisions in single-cell RNA-seq trajectory inference

Xiaoru Dong¹, Jack R Leary¹, Chuanhao Yang¹, Maigan A Brusko²,³, Todd M Brusko²,³,⁴, Rhonda Bacher¹,²

Affiliations

¹ Department of Biostatistics, College of Public Health and Health Professions, University of Florida, Gainesville, FL 32610, USA.
² Diabetes Institute, University of Florida, Gainesville, FL 32610, USA.
³ Department of Pathology, Immunology, and Laboratory Medicine, College of Medicine, University of Florida, Gainesville, FL 32610, USA.
⁴ Department of Pediatrics, College of Medicine, University of Florida, Gainesville, FL 32610, USA.

PMID: 38187768
PMCID: PMC10769271
DOI: 10.1101/2023.12.18.572214

Xiaoru Dong et al. bioRxiv. 2023.

Update in

Data-driven selection of analysis decisions in single-cell RNA-seq trajectory inference.
Dong X, Leary JR, Yang C, Brusko MA, Brusko TM, Bacher R. Brief Bioinform. 2024 Mar 27;25(3):bbae216. doi: 10.1093/bib/bbae216. PMID: 38725155. Free PMC article.

Abstract

Single-cell RNA sequencing (scRNA-seq) experiments have become instrumental in developmental and differentiation studies, enabling the profiling of cells at single or multiple time points to uncover subtle variations in expression profiles reflecting underlying biological processes. Benchmarking studies have compared many of the computational methods used to reconstruct cellular dynamics; however, researchers still encounter challenges in their analysis due to uncertainties in selecting the most appropriate methods and parameters. Even among universal data processing steps used by trajectory inference methods, such as feature selection and dimension reduction, the performance of trajectory methods is highly dataset-specific. To address these challenges, we developed Escort, a framework for evaluating a dataset's suitability for trajectory inference and quantifying trajectory properties influenced by analysis decisions. Escort navigates single-cell trajectory analysis through data-driven assessments, reducing uncertainty and much of the decision burden associated with trajectory inference. Escort is implemented in an accessible R package and R/Shiny application, providing researchers with the necessary tools to make informed decisions during trajectory analysis and enabling new insights into dynamic biological processes at single-cell resolution.

Keywords: Pseudotime inference; RNA-seq; Trajectory inference; single cell.

Figures

**Fig. 1. Analysis choices significantly impact trajectory estimation in scRNA-seq data.**
For various choices of selected genes and dimension reduction methods, trajectory inference and pseudotime estimation were performed on a scRNA-seq dataset of hematopoietic stem cells (Kowalczyk et al., 2015). A. Dimension-reduced spaces and estimated trajectories with cells colored by cell type. B. Pseudotime distributions for each set of analysis choices. C. Normalized gene expression as a function of pseudotime for *Cbx1*. Cells are colored by pseudotime, i.e. their location along the trajectory. Abbreviations: MDS = Multidimensional Scaling, UMAP = Uniform Manifold Approximation and Projection, HVG = Highly Variable Gene.

**Fig. 2. Trajectory accuracy is impacted by different dimension reduction algorithms and inclusions of highly variable genes.**
A. The performance of different embeddings across all eight simulated scenarios is shown. Embeddings were ranked within each dataset separately for the three metrics. The ranks were scaled so that a lower rank indicated better within-dataset performance. B. Similar to A using Monocle3.

**Fig. 3. Overview of Escort.**
Schematic of the Escort workflow. A. The first step detects the presence of a trajectory signal in the dataset before proceeding to evaluations of embeddings. B. Various metrics are used to evaluate user-defined embeddings regardless of the ultimate trajectory inference method to be used. C. In the final step, the preferred trajectory inference method of the user is used to fit a preliminary trajectory to allow the evaluation of method-specific hyperparameters. D. Based on the overall score, embeddings are classified as either recommended or non-recommended.

**Fig. 4. Trajectory assessment performance of Escort on simulated datasets.**
A. The accuracy of trajectories generated on nine different embedding options for each of the eight simulated datasets is shown for different metrics: Kendall rank correlation and mean squared error. Simulated scenarios differ in terms of true trajectory topology (denoted by color) and simulator methods. The y-axis displays the values for the accuracy metric. B. Each embedding’s Escort score (x-axis) is shown against the value of each accuracy metric (y-axis), with points colored according to their classification by Escort.

**Fig. 5. Trajectory assessment performance of Escort on public datasets.**
A. The accuracy of trajectories generated on nine different embedding options is shown for five publicly available datasets assessed using different metrics: Kendall rank correlation and mean squared error. The colors distinguish each embedding classification by Escort, in addition to those embeddings that failed in the second step. The y-axis displays the values for accuracy metrics. The x-axis corresponds to recommendations generated by Escort. B. Similar to A with the x-axis showing the Escort score.

**Fig. 6. Analysis of transdifferentiation of hypertrophic chondroblasts using an Escort-guided trajectory.**
A. UMAP of the original paper’s embedding and the Monocle3-based trajectory. B. UMAP of the original paper’s embedding using Slingshot to fit a trajectory. C. Escort-recommended embedding using Slingshot to fit a trajectory. D. Correlation of pseudotime between the two Lineage A trajectories. E. Distribution of knots across all significantly dynamic genes for Lineage A. F. Gene expression as a function of pseudotime for *Snorc* and *Id2*. **G-I**. Similar to **D-F**, but for Lineage B.
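
The captions for Fig. 4 and Fig. 5 name two accuracy metrics, Kendall rank correlation and mean squared error. As a rough illustration of how an estimated pseudotime might be compared against a reference pseudotime with these two metrics, here is a minimal Python sketch. It is not the Escort implementation (Escort is an R package). The function names, the toy pseudotime values, the choice of the tau-a variant, and the min-max scaling applied before the mean squared error are all assumptions made for illustration.

```python
from itertools import combinations


def minmax_scale(values):
    """Rescale values to [0, 1] (assumed here so MSE is comparable across embeddings)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant pairs - discordant pairs) / total pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = 0
    discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # tied pairs count toward neither total
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


def mean_squared_error(x, y):
    """Mean of squared element-wise differences."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


# Hypothetical toy data: five cells, with the first two swapped in the estimate.
true_pseudotime = [0.0, 0.25, 0.5, 0.75, 1.0]
estimated_pseudotime = [2.0, 1.0, 3.0, 4.0, 5.0]

tau = kendall_tau_a(true_pseudotime, estimated_pseudotime)
mse = mean_squared_error(
    minmax_scale(true_pseudotime), minmax_scale(estimated_pseudotime)
)
print(f"Kendall tau-a: {tau:.3f}, MSE on scaled pseudotime: {mse:.3f}")
```

Under these assumptions, a higher rank correlation and a lower mean squared error both indicate closer agreement with the reference ordering, which is the direction of comparison the captions describe.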

References

1. Bacher R., Chu L.-F., Argus C., Bolin J.M., Knight P., Thomson J.A., et al. (2022) Enhancing biological signals and detection rates in single-cell RNA-seq experiments with cDNA library equalization. Nucleic Acids Research, 50, e12.
2. Baron M., Veres A., Wolock S.L., Faust A.L., Gaujoux R., Vetere A., et al. (2016) A Single-Cell Transcriptomic Map of the Human and Mouse Pancreas Reveals Inter- and Intra-cell Population Structure. Cell Systems, 3, 346–360.e4.
3. Büttner M., Miao Z., Wolf F.A., Teichmann S.A. and Theis F.J. (2019) A test metric for assessing single-cell RNA-seq batch correction. Nature Methods, 16, 43–49.
4. Cannoodt R., Saelens W., Sichien D., Tavernier S., Janssens S., Guilliams M., et al. (2016) SCORPIUS Improves Trajectory Inference and Identifies Novel Modules in Dendritic Cell Development. preprint, Bioinformatics.
5. Cannoodt R., Saelens W., Todorov H. and Saeys Y. (2018a) Single-cell -omics datasets containing a trajectory.
