. 2020 Mar 5;11(1):1201.

doi: 10.1038/s41467-020-14766-3.

Trajectory-based differential expression analysis for single-cell sequencing data

Koen Van den Berge^{1

2

3}, Hector Roux de Bézieux^{4

5}, Kelly Street^{6

7}, Wouter Saelens^{1

8}, Robrecht Cannoodt^{8

9

10}, Yvan Saeys^{1

8}, Sandrine Dudoit^#^{11

12

13}, Lieven Clement^#^{14

15}

Affiliations

¹ Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium.
² Bioinformatics Institute Ghent, Ghent University, Ghent, Belgium.
³ Department of Statistics, University of California, Berkeley, CA, USA.
⁴ Division of Biostatistics, School of Public Health, University of California, Berkeley, CA, USA.
⁵ Center for Computational Biology, University of California, Berkeley, CA, USA.
⁶ Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA.
⁷ Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
⁸ Data mining and Modelling for Biomedicine, VIB Center for Inflammation Research, Ghent, Belgium.
⁹ Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium.
¹⁰ Department of Biomolecular Medicine, Ghent University, Ghent, Belgium.
¹¹ Department of Statistics, University of California, Berkeley, CA, USA. sandrine@stat.berkeley.edu.
¹² Division of Biostatistics, School of Public Health, University of California, Berkeley, CA, USA. sandrine@stat.berkeley.edu.
¹³ Center for Computational Biology, University of California, Berkeley, CA, USA. sandrine@stat.berkeley.edu.
¹⁴ Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium. lieven.clement@ugent.be.
¹⁵ Bioinformatics Institute Ghent, Ghent University, Ghent, Belgium. lieven.clement@ugent.be.

^# Contributed equally.

PMID: 32139671
PMCID: PMC7058077
DOI: 10.1038/s41467-020-14766-3

Trajectory-based differential expression analysis for single-cell sequencing data

Koen Van den Berge et al. Nat Commun. 2020.

. 2020 Mar 5;11(1):1201.

doi: 10.1038/s41467-020-14766-3.

Authors

Affiliations

¹ Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium.
² Bioinformatics Institute Ghent, Ghent University, Ghent, Belgium.
³ Department of Statistics, University of California, Berkeley, CA, USA.
⁴ Division of Biostatistics, School of Public Health, University of California, Berkeley, CA, USA.
⁵ Center for Computational Biology, University of California, Berkeley, CA, USA.
⁶ Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA.
⁷ Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
⁸ Data mining and Modelling for Biomedicine, VIB Center for Inflammation Research, Ghent, Belgium.
⁹ Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium.
¹⁰ Department of Biomolecular Medicine, Ghent University, Ghent, Belgium.
¹¹ Department of Statistics, University of California, Berkeley, CA, USA. sandrine@stat.berkeley.edu.
¹² Division of Biostatistics, School of Public Health, University of California, Berkeley, CA, USA. sandrine@stat.berkeley.edu.
¹³ Center for Computational Biology, University of California, Berkeley, CA, USA. sandrine@stat.berkeley.edu.
¹⁴ Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium. lieven.clement@ugent.be.
¹⁵ Bioinformatics Institute Ghent, Ghent University, Ghent, Belgium. lieven.clement@ugent.be.

^# Contributed equally.

PMID: 32139671
PMCID: PMC7058077
DOI: 10.1038/s41467-020-14766-3

Abstract

Trajectory inference has radically enhanced single-cell RNA-seq research by enabling the study of dynamic changes in gene expression. Downstream of trajectory inference, it is vital to discover genes that are (i) associated with the lineages in the trajectory, or (ii) differentially expressed between lineages, to illuminate the underlying biological processes. Current data analysis procedures, however, either fail to exploit the continuous resolution provided by trajectory inference, or fail to pinpoint the exact types of differential expression. We introduce tradeSeq, a powerful generalized additive model framework based on the negative binomial distribution that allows flexible inference of both within-lineage and between-lineage differential expression. By incorporating observation-level weights, the model additionally allows to account for zero inflation. We evaluate the method on simulated datasets and on real datasets from droplet-based and full-length protocols, and show that it yields biological insights through a clear interpretation of the data.

PubMed Disclaimer

Conflict of interest statement

The authors declare no competing interests.

Figures

**Fig. 1. Overview of tradeSeq functionality.**
a A scatterplot of expression measures vs. pseudotimes for a single gene, where each lineage is represented by a different color (top left). b A negative binomial generalized additive model (NB-GAM) is fitted using the fitGAM function. The locations of the knots for the splines are displayed with gray dashed vertical lines. c The NB-GAM can then be used to perform a variety of tests of differential expression within or between lineages. In the table, we assume that the earlyDETest is used to assess differences in expression patterns early on in the lineage, e.g., with option knots = c(1, 2), meaning that we test for differential patterns between the first and second dashed gray lines from panel (b). d Interesting genes can finally be clustered to display the different patterns of expression.

**Fig. 2. Tests currently implemented in the tradeSeq package.**
Each column corresponds to a test. Tests are broken down into two categories, depending on whether they concern a within-lineage comparison, i.e., properties of the orange curve, or a between-lineage comparison, i.e., contrasting the blue and orange curves. For each test, we have two toy examples of gene expression patterns. The top one corresponds to a differentially expressed gene according to the test, while the bottom one does not.

**Fig. 3. Simulation study results.**
PCA plots for the (a) cyclic, (b) bifurcating, and (c) multifurcating simulated trajectories. The plotting symbol for each cell is colored according to its true pseudotime; trajectories (in black) were inferred by princurve in (a) and slingshot in (b) and (c). d–f Scatterplot of the true positive rate (TPR) vs. the false discovery rate (FDR) or false discovery proportion (FDP) for various DE methods applied to the simulated data sets. Panel (d) displays the average performance curves of DE methods across seven out of ten cyclic data sets for which all DE methods worked (Monocle 3 errored on three data sets). The associationTest from tradeSeq has superior performance for discovering genes whose expression is associated with pseudotime, as compared with Monocle 3. When investigating differential expression between lineages of a trajectory, the patternTest of tradeSeq consistently outperforms the diffEndTest across all three TI methods, since it is capable of comparing expression across entire lineages. Panel (e) displays the average performance curves across the three bifurcating data sets where all TI methods recovered the correct topology. Here, all tradeSeq patternTest workflows, tradeSeq_slingshot_end, and edgeR have similar performance and all are superior to BEAM, ImpulseDE2, and GPfates. Note that the performance of tradeSeq_Monocle2_end deteriorates as compared with tradeSeq_slingshot_end; the curve for tradeSeq_GPfates_end is not visible in this panel due to its low performance. For the multifurcating data set of panel (f), tradeSeq_slingshot has the highest performance, closely followed by tradeSeq_Monocle2 and edgeR.

**Fig. 4. Benchmark of computation time and memory usage.**
Data sets with 100, 1000, and 10,000 cells are simulated and each method is evaluated, respectively, 10, 2, and 2 times on each data set to assess computation time and memory usage. The average across iterations is plotted for each method. Methods that went over a 4-h mark were stopped and deemed taking too long.

**Fig. 5. Mouse bone marrow case study.**
a Two-dimensional representation of a subset of the data using independent components analysis (ICA). The myeloid trajectory inferred by slingshot is displayed. b Two-dimensional representation of a subset of the data using UMAP. The myeloid trajectory inferred by slingshot is displayed. The UMAP dimensionality reduction method better captures the smooth differentiation process than ICA. c Estimated smoothers for the top six genes identified by the tradeSeq patternTest procedure on the trajectory from (b). d Six clusters for the top 500 genes with different expression patterns between the two lineages (as identified by patternTest from tradeSeq).

**Fig. 6. Mouse olfactory epithelium case study.**
a Three-dimensional PCA plot of the scRNA-seq data, where cells are colored according to their cluster membership as defined in the original paper (see “Methods”). The simultaneous principal curves for the lineages inferred by slingshot are displayed. This Figure is reprinted from Fletcher et al. with permission from the publisher. b Schematic of the cell types and their ordering along the lineages. c Heatmap for the top 200 genes that are associated with the neuronal lineage, as identified with the associationTest procedure from tradeSeq. Five clear gene clusters can be identified, each with a different region of activity during the developmental process. d Four transcription factors discovered by the earlyDETest between the pseudotimes of knots one and three (knots indicated with vertical dashed lines) and that are involved in epithelial cell differentiation.

See this image and copyright information in PMC

References

1. Saelens W, Cannoodt R, Todorov H, Saeys Y. A comparison of single-cell trajectory inference methods. Nat. Biotechnol. 2019;37:547–554. doi: 10.1038/s41587-019-0071-9. - DOI - PubMed
1. Cannoodt R, Saelens W, Saeys Y. Computational methods for trajectory inference from single-cell transcriptomics. Eur. J. Immunol. 2016;46:2496–2506. doi: 10.1002/eji.201646347. - DOI - PubMed
1. Street K, et al. Slingshot: cell lineage and pseudotime inference for single-cell transcriptomics. BMC Genomics. 2018;19:477. doi: 10.1186/s12864-018-4772-0. - DOI - PMC - PubMed
1. Lönnberg T, et al. Single-cell RNA-seq and computational analysis using temporal mixture modelling resolves Th1/Tfh fate bifurcation in malaria. Sci. Immunol. 2017;2:3. doi: 10.1126/sciimmunol.aal2192. - DOI - PMC - PubMed
1. Qiu X, et al. Reversed graph embedding resolves complex single-cell trajectories. Nat. Methods. 2017;14:979–982. doi: 10.1038/nmeth.4402. - DOI - PMC - PubMed

Publication types

Actions
Actions

MeSH terms

Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions

Grants and funding

LinkOut - more resources

Full Text Sources
Other Literature Sources
- The Lens - Patent Citations Database

Save citation to file

Email citation

Add to Collections

Add to My Bibliography

Your saved search

Create a file for external citation management software

Your RSS Feed

Trajectory-based differential expression analysis for single-cell sequencing data

Affiliations

Trajectory-based differential expression analysis for single-cell sequencing data

Authors

Affiliations

Abstract

Conflict of interest statement

Figures

References

Publication types

MeSH terms

Grants and funding

LinkOut - more resources

Full Text Sources

Other Literature Sources