Review
Genome Biol. 2020 Feb 7;21(1):31.
doi: 10.1186/s13059-020-1926-6.

Eleven grand challenges in single-cell data science

David Lähnemann et al. Genome Biol. 2020.

Abstract

The recent boom in microfluidics and combinatorial indexing strategies, combined with low sequencing costs, has empowered single-cell sequencing technology. Thousands, or even millions, of cells analyzed in a single experiment amount to a data revolution in single-cell biology and pose unique data science problems. Here, we outline eleven challenges that will be central to bringing this emerging field of single-cell data science forward. For each challenge, we highlight motivating research questions, review prior work, and formulate open problems. This compendium is for established researchers, newcomers, and students alike, highlighting interesting and rewarding problems for the coming years.

Conflict of interest statement

BJR is a co-founder and consultant at Medley Genomics. FJT reports receiving consulting fees from Roche Diagnostics GmbH and Cellarity Inc., and ownership interest in Cellarity, Inc. and Dermagnostix GmbH. IIM is a co-founder and holds an interest in SmplBio LLC, a company developing cloud-based scRNA-Seq analysis software. No products, services, or technologies of SmplBio have been evaluated or tested in this work. JdR is co-founder of Cyclomics BV. SA is a scientific advisor to Sangamo and Repare Therapeutics. SA and SPS are both founders and shareholders of Contextual Genomics Inc. All other authors declare that they have no competing interests.

Figures

Fig. 1
Different levels of resolution are of interest, depending on the research question and the data available. Thus, analysis tools and reference systems (such as cell atlases) will have to accommodate multiple levels of resolution, from whole organs and tissues, through discrete cell types, to continuously mappable intermediate cell states, which are indistinguishable even at the microscopic level. A graph abstraction that enables such multiple levels of focus is provided by PAGA [14], a structure that allows for discretely grouping cells, as well as inferring trajectories as paths through a graph
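The idea of grouping cells discretely and then reading trajectories as paths through a coarser graph can be illustrated with a toy sketch. This is not PAGA itself: PAGA uses a statistical connectivity measure, whereas the code below simply counts cell-level edges that cross cluster boundaries. The function names, the `min_edges` threshold, and the cluster labels are hypothetical and chosen only for illustration.

```python
from collections import defaultdict, deque

def abstract_graph(cell_edges, cluster_of):
    """Count cell-level edges that cross between each pair of clusters."""
    weights = defaultdict(int)
    for a, b in cell_edges:
        ca, cb = cluster_of[a], cluster_of[b]
        if ca != cb:
            weights[frozenset((ca, cb))] += 1
    return dict(weights)

def trajectory(weights, start, goal, min_edges=1):
    """Breadth-first path between clusters over links with >= min_edges crossings."""
    neighbors = defaultdict(set)
    for pair, count in weights.items():
        if count >= min_edges:
            a, b = tuple(pair)
            neighbors[a].add(b)
            neighbors[b].add(a)
    queue, parent = deque([start]), {start: None}
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in sorted(neighbors[node]):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None  # no path between the two clusters

# Hypothetical toy data: six cells, a neighbor graph, and three cluster labels.
cluster_of = {0: "stem", 1: "stem", 2: "progenitor",
              3: "progenitor", 4: "mature", 5: "mature"}
cell_edges = [(0, 1), (1, 2), (1, 3), (2, 3), (3, 4), (3, 5), (4, 5)]

weights = abstract_graph(cell_edges, cluster_of)
path = trajectory(weights, "stem", "mature")
```

The cell-level graph stays available for fine-grained questions, while the cluster-level graph gives the coarse view; the two together are one way to think about the multiple levels of focus described in the caption.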
Fig. 2
Measurement error requires denoising methods or approaches that quantify uncertainty and propagate it down analysis pipelines. Where methods cannot deal with abundant missing values, imputation approaches may be useful. While the true population manifold that generated the data is never known, one can usually obtain an estimate of it that can be used for both denoising and imputation
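As a rough illustration of imputation from an estimated manifold, the sketch below uses a cell's nearest neighbors as a crude local stand-in for that manifold and fills each missing value with the neighbors' mean. This is a generic k-nearest-neighbor approach assumed here for illustration, not a method taken from the review; the function names, the use of `None` for missing values, and the toy matrix are all hypothetical.

```python
import math

def distance(a, b):
    """Root-mean-square difference over genes observed in both cells (None = missing)."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

def knn_impute(cells, k=2):
    """Fill each missing value with the mean over the k nearest cells that observed that gene."""
    imputed = []
    for i, cell in enumerate(cells):
        filled = list(cell)
        for g, value in enumerate(cell):
            if value is not None:
                continue  # observed values are left untouched
            donors = sorted(
                (distance(cell, other), other[g])
                for j, other in enumerate(cells)
                if j != i and other[g] is not None
            )[:k]
            if donors:
                filled[g] = sum(v for _, v in donors) / len(donors)
        imputed.append(filled)
    return imputed

# Hypothetical expression matrix: rows are cells, columns are genes.
cells = [
    [1.0, 2.0, None],   # third gene missing in this cell
    [1.1, 2.1, 3.0],
    [0.9, 1.9, 3.2],
    [8.0, 9.0, 10.0],   # a cell from a clearly different population
]
imputed = knn_impute(cells, k=2)
```

In this toy matrix the missing value is filled from the two similar cells rather than from the distant one. Note that a point estimate like this carries no uncertainty, which is exactly the limitation the caption raises.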
Fig. 3
Differential expression of a gene or transcript between cell populations. The top row labels the specific gene or transcript, as is also done in Fig. 6. A difference in mean gene expression manifests in a consistent difference of gene expression across all cells of a population (e.g., high vs. low). A difference in variability of gene expression means that in one population, all cells have a very similar expression level, whereas in another population, some cells have a much higher expression and some a much lower expression. The resulting average expression level may be the same, and in such cases, only single-cell measurements can find the difference between populations. A difference across pseudotime is a change of expression within a population, for example, along a developmental trajectory (compare Fig. 1). This also constitutes a difference between cell populations that is not apparent from population averages, but requires a pseudo-temporal ordering of measurements on single cells
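The point that two populations can share an average expression level while differing in variability can be checked with a few lines of arithmetic. The expression values below are invented for illustration and are not taken from the paper.

```python
from statistics import mean, pvariance

# Hypothetical per-cell expression values (arbitrary units) for one gene.
population_a = [5.0, 5.1, 4.9, 5.0, 5.0, 5.1, 4.9, 5.0]  # all cells similar
population_b = [1.0, 9.0, 0.5, 9.5, 2.0, 8.0, 1.5, 8.5]  # some high, some low

def summarize(values):
    """Return (mean, population variance) of per-cell expression values."""
    return mean(values), pvariance(values)

mean_a, var_a = summarize(population_a)
mean_b, var_b = summarize(population_b)

# A bulk measurement only sees the means, which coincide here;
# the variances, visible only per cell, differ by orders of magnitude.
same_average = abs(mean_a - mean_b) < 1e-9
variance_ratio = var_b / var_a
```

A bulk experiment would report the same value for both populations, whereas the per-cell variance separates them clearly.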
Fig. 4
A tumor evolves somatically—from initiation to detection, to resection, and to possible metastasis. New genomic mutations can confer a selective advantage to the resulting new subclone that allows it to outperform other tumor subclones (subclone competition). At the same time, the acting selection pressures can change over time (e.g., due to new subclones arising, the immune system detecting certain subclones, or as a result of therapy). Understanding such selective regimes—and how specific mutations alter a subclone’s susceptibility to changes in selection pressures—will help construct an evolutionary model of tumorigenesis. And it is only within this evolutionary model that more efficient and more patient-specific treatments can be developed. For such a model, unambiguously identifying mutation profiles of subclones via scDNA-seq of resected or biopsied single cells is crucial
Fig. 5
Mutations (colored stars) accumulate in cells during somatic cell divisions and can be used to reconstruct the developmental lineages of individual cells within an organism (leaf nodes of the tree with mutational presence/absence profiles attached). However, insufficient or unbalanced WGA can lead to the dropout of one or both alleles at a genomic site. This can be mitigated by better amplification methods, but also by computational and statistical methods that can account for or impute the missing values
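To see how dropout of one or both alleles distorts a mutational profile, consider a heterozygous site under the simplifying assumption, made here for illustration only, that each allele drops out independently with the same probability `p`. The function name and outcome labels are hypothetical.

```python
def dropout_outcomes(p):
    """Outcome probabilities at a heterozygous mutated site when each allele
    independently fails to amplify with probability p (an assumed toy model)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    return {
        "heterozygous_observed": (1 - p) ** 2,       # both alleles amplified
        "mutation_missed": p * (1 - p),              # only the mutant allele lost
        "appears_homozygous_mutant": p * (1 - p),    # only the reference allele lost
        "site_missing": p ** 2,                      # both alleles lost
    }

outcomes = dropout_outcomes(0.2)
total = sum(outcomes.values())
```

With a per-allele dropout rate of 0.2, this model misses the mutation in about 16% of cells and loses the site entirely in about 4%, which is why lineage reconstruction needs to account for or impute such values rather than treat absences as true negatives.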
Fig. 6
Approaches for integrating single-cell measurement datasets across measurement types, samples, and experiments, as also described in Table 4. 1S: clustering of cells from one sample from one experiment requires no data integration. +S: integration of one measurement type across samples requires the linking of cell populations/clusters. +X+S: integration of one measurement type across experiments conducted in separate laboratories requires stable reference systems like cell atlases (compare Fig. 1). +M1C: integration of multiple measurement types obtained from the same cell highlights the problem of data sparsity of all available measurement types and the dependency of measurement types that needs to be accounted for. +M+C: integration of different measurement types from different cells of the same cell population requires special care in matching cells through meaningful profiles. +all: one possibility for easing data integration across measurement types from separate cells would be to have a stable reference (cell atlas) across multiple measurement types, capturing different cell states, cell populations, and organisms. Effectively, this combines the challenges and promises of the approaches +X+S, +M1C, and +M+C
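For the +S case, linking clusters across samples can be sketched as matching each cluster in one sample to the cluster with the nearest centroid in another. This is a deliberately naive stand-in that assumes the two samples are already on a comparable scale with no batch effect; the function names, cluster names, and values are hypothetical.

```python
import math

def centroid(cells):
    """Mean expression vector of a cluster (rows are cells, columns are genes)."""
    n = len(cells)
    return [sum(column) / n for column in zip(*cells)]

def link_clusters(sample_a, sample_b):
    """Map each cluster in sample_a to the sample_b cluster with the nearest centroid."""
    centroids_b = {name: centroid(cells) for name, cells in sample_b.items()}
    links = {}
    for name_a, cells in sample_a.items():
        center_a = centroid(cells)
        links[name_a] = min(
            centroids_b, key=lambda name_b: math.dist(center_a, centroids_b[name_b])
        )
    return links

# Hypothetical two-gene expression profiles, clustered separately per sample.
sample_a = {"A1": [[1.0, 0.0], [1.2, 0.1]], "A2": [[0.0, 5.0], [0.1, 5.2]]}
sample_b = {"B1": [[0.1, 4.9], [0.0, 5.1]], "B2": [[1.1, 0.0], [0.9, 0.2]]}

links = link_clusters(sample_a, sample_b)
```

Matching of this kind breaks down once experiments, measurement types, or laboratories differ, which is where the caption's stable reference systems come in.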

References

    1. Nature Methods Method of the year 2013. Nat Methods. 2014;11(1):1–1. doi: 10.1038/nmeth.2801.
    2. Anchang B, Hart TDP, Bendall SC, Qiu P, Bjornson Z, Linderman M, Nolan GP, Plevritis SK. Visualization and cellular hierarchy inference of single-cell data using SPADE. Nat Protocol. 2016;11(7):1264–79. doi: 10.1038/nprot.2016.066.
    3. Francis JM, Zhang C-Z, Maire CL, Jung J, Manzo VE, Adalsteinsson VA, Homer H, Haidar S, Blumenstiel B, Pedamallu CS, Ligon AH, Love JC, Meyerson M, Ligon KL. EGFR variant heterogeneity in glioblastoma resolved through single-nucleus sequencing. Cancer Discov. 2014;4(8):956–71. doi: 10.1158/2159-8290.CD-13-0879.
    4. Lawson DA, Kessenbrock K, Davis RT, Pervolarakis N, Werb Z. Tumour heterogeneity and metastasis at single-cell resolution. Nat Cell Biol. 2018;20(12):1349. doi: 10.1038/s41556-018-0236-7.
    5. Regev A, Teichmann SA, Lander ES, Amit I, Benoist C, Birney E, Bodenmiller B, Campbell P, Carninci P, Clatworthy M, Clevers H, Deplancke B, Dunham I, Eberwine J, Eils R, Enard W, Farmer A, Fugger L, Göttgens B, Hacohen N, Haniffa M, Hemberg M, Kim S, Klenerman P, Kriegstein A, Lein E, Linnarsson S, Lundeberg J, Majumder P, Marioni JC, Merad M, Mhlanga M, Nawijn M, Netea M, Nolan G, Pe’er D, Phillipakis A, Ponting CP, Quake S, Reik W, Rozenblatt-Rosen O, Sanes J, Satija R, Schumacher TN, Shalek A, Shapiro E, Sharma P, Shin JW, Stegle O, Stratton M, Stubbington MJT, Oudenaarden AV, Wagner A, Watt F, Weissman J, Wold B, Xavier R, Yosef N, et al. The Human Cell Atlas. 2017. doi: 10.1101/121202. Accessed 27 Mar 2019.
