Nat Commun. 2020 Jun 16;11(1):3055.
doi: 10.1038/s41467-020-16821-5.

Single-cell lineage tracing by integrating CRISPR-Cas9 mutations with transcriptomic data

Hamim Zafar et al. Nat Commun. 2020.

Abstract

Recent studies combine two novel technologies, single-cell RNA-sequencing and CRISPR-Cas9 barcode editing, to elucidate developmental lineages at the whole-organism level. While these studies have provided several insights, they face several computational challenges. First, lineages are reconstructed from noisy and often saturated random mutation data. Additionally, because the mutations are random, lineages from multiple experiments cannot be combined to reconstruct a species-invariant lineage tree. To address these issues, we developed a statistical method, LinTIMaT, which reconstructs cell lineages using a maximum-likelihood framework by integrating mutation and expression data. Our analysis shows that expression data helps resolve the ambiguities that arise when lineages are inferred from mutations alone, while also enabling the integration of different individual lineages for the reconstruction of an invariant lineage tree. LinTIMaT lineages have better cell type coherence, improve the functional significance of gene sets, and provide new insights on progenitors and differentiation pathways.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Overview of LinTIMaT.
a LinTIMaT reconstructs a cell lineage tree by integrating CRISPR-Cas9 mutations and transcriptomic data. In Step 1, LinTIMaT infers top scoring lineage trees built on barcodes using only mutation likelihood. In Step 2, for all cells carrying the same barcode, LinTIMaT reconstructs a cellular subtree based on expression likelihood. In Step 3, cellular subtrees are attached to barcode lineages to obtain cell lineage trees and the tree with the best combined likelihood is selected. Finally, LinTIMaT uses a hill-climbing search for refining the cell lineage tree by optimizing the combined likelihood (Step 4). b To reconstruct a species-invariant lineage, LinTIMaT first identifies cell clusters that are preserved in all individual lineages and then performs an iterative search that attempts to minimize the distance between individual lineage trees and the invariant tree topology. As part of the iterative process, LinTIMaT matches preserved clusters in one individual tree to preserved clusters in other individual tree(s) such that leaves in the resulting invariant tree contain cells from all individual studies. See Methods for complete details.
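The hill-climbing refinement in Step 4 can be illustrated with a minimal sketch. This is not LinTIMaT's implementation: the state here is a toy tuple of integers rather than a tree, and `toy_mutation_loglik`, `toy_expression_loglik`, and `propose` are hypothetical stand-ins for the real likelihood terms and tree-rearrangement moves described in the paper's Methods. Only the search pattern (propose a neighbouring state, keep it if the combined score improves) is what the example is meant to show.

```python
import random


def hill_climb(initial, propose, score, n_iter=1000, seed=0):
    """Greedy stochastic hill-climbing: accept a proposal only if it scores higher."""
    rng = random.Random(seed)
    best, best_score = initial, score(initial)
    for _ in range(n_iter):
        candidate = propose(best, rng)
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score


# Hypothetical toy stand-ins for the two likelihood terms (assumptions, not the paper's model).
def toy_mutation_loglik(state):
    return -sum((x - 3) ** 2 for x in state)


def toy_expression_loglik(state):
    return -sum(abs(x - 3) for x in state)


def combined_loglik(state):
    # The combined objective is taken here as a plain sum of the two terms.
    return toy_mutation_loglik(state) + toy_expression_loglik(state)


def propose(state, rng):
    # Toy "move": change one coordinate by +/- 1 (a real search would rearrange the tree).
    i = rng.randrange(len(state))
    new_state = list(state)
    new_state[i] += rng.choice((-1, 1))
    return tuple(new_state)


start = (0, 7, 5)
final, final_score = hill_climb(start, propose, combined_loglik, n_iter=500)
```

Because only improving moves are accepted, the search can stall in a local optimum; how the actual method weights the two likelihood terms and chooses moves is specified in the paper, not here.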
Fig. 2. Benchmarking on C. elegans lineage.
a 16-cell embryo lineage for Caenorhabditis elegans. scRNA-seq data for each leaf (cell) was obtained from a previously published study and included six replicates for each cell. b Comparison of LinTIMaT, Camin-Sokal Maximum Parsimony, and Neighbor-joining when varying the mutation rate. The number of possible mutational states was set to 8. A fixed mutation rate was used for each CRISPR target. Each box plot summarizes results for six replicates with varying simulated CRISPR mutation data and experimental scRNA-seq data. c Comparison of lineage reconstruction methods when the mutation rate varies between target sites. d Comparison of the accuracy of lineage reconstruction by LinTIMaT, Camin-Sokal Maximum Parsimony, and Neighbor-joining in the presence of mutation dropout. A fixed mutation rate, μ = 0.15, was used for all targets. For b–d, each box-and-whisker plot summarizes results for six replicates: the box shows the interquartile range (IQR, the range between the 25th and 75th percentiles) with the median value, whiskers indicate the maximum and minimum values within 1.5 times the IQR, and outliers are shown as black dots.
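To make the benchmark setup concrete, the following is a rough sketch of how CRISPR mutation data could be simulated on a 16-leaf binary lineage with a fixed per-target mutation rate, 8 mutational states, and optional dropout. It is an illustration under stated assumptions, not the authors' simulator: the number of targets (`n_targets=10`), the rule that an edited site is never edited again, the uniform choice among states, and the encoding of dropout as `MISSING` are all assumptions made for this example.

```python
import random

UNEDITED = 0   # assumed encoding for an intact target site
MISSING = -1   # assumed encoding for a dropped-out site


def simulate_barcodes(n_generations=4, n_targets=10, mu=0.15,
                      n_states=8, dropout=0.0, seed=0):
    """Simulate barcodes for the 2**n_generations leaves of a full binary lineage."""
    rng = random.Random(seed)
    cells = [[UNEDITED] * n_targets]  # the root cell carries an unedited barcode
    for _ in range(n_generations):
        next_cells = []
        for parent in cells:
            for _daughter in range(2):
                child = list(parent)  # daughters inherit the parent's edits
                for site in range(n_targets):
                    # Assumption: only unedited sites can mutate, and edits are permanent.
                    if child[site] == UNEDITED and rng.random() < mu:
                        child[site] = rng.randint(1, n_states)
                next_cells.append(child)
        cells = next_cells
    # Dropout is applied to the observed leaves only.
    return [[MISSING if rng.random() < dropout else s for s in cell]
            for cell in cells]


leaves = simulate_barcodes(mu=0.15)
leaves_with_dropout = simulate_barcodes(mu=0.15, dropout=0.2)
```

With `n_generations=4` the function returns 16 leaf barcodes, matching the 16-cell embryo in panel a; varying `mu` or `dropout` reproduces the kind of sweep shown in panels b and d.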
Fig. 3. Reconstructed cell lineage for a single juvenile zebrafish brain (ZF3) from scGESTALT dataset.
a Adjusted Rand Index (ARI), which measures the agreement between cell types in the tree clusters and cell types assigned by the original paper, as a function of the likelihood computed by LinTIMaT. The ARI increases as the likelihood increases, which indicates that the objective function of LinTIMaT captures biologically relevant relationships between cells. b Reconstructed cell lineage tree for ZF3 built on 376 cells. Blue nodes represent Cas9-editing events (mutations) and red nodes represent clusters inferred from transcriptomic data. Each leaf node is a cell, represented by a square, and its color represents its assigned cell type as indicated in the legend. The mutated barcode for each cell is displayed as a white bar with insertions (blue) and deletions (red). c By using transcriptomic data, LinTIMaT is able to further refine subtrees in which all cells share the same barcode, which can help overcome saturation issues. d, e Example subtrees displaying LinTIMaT’s ability to cluster cells with different barcodes together based on their cell types. In contrast, maximum parsimony puts these on distinct branches.
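For readers unfamiliar with the metric in panel a, here is a small self-contained sketch of the standard Adjusted Rand Index computed from two label vectors (for example, tree-cluster assignments and annotated cell types). The function name and the handling of degenerate inputs are choices made for this example; how the clusters themselves are extracted from the tree is not covered.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Standard ARI between two partitions of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must have the same length")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: treated as perfect agreement

    # Pairs of items placed together in both partitions, in A only, and in B only.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = together_a * together_b / total_pairs
    max_index = (together_a + together_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are a single cluster
    return (together_both - expected) / (max_index - expected)


same_up_to_renaming = adjusted_rand_index([0, 0, 1, 1], ["x", "x", "y", "y"])
fully_crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

An ARI of 1 means the two partitions are identical up to relabeling, values near 0 are what random labelings would give, and negative values indicate agreement worse than chance.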
Fig. 4. Invariant lineage tree for juvenile zebrafish brain for scGESTALT dataset.
The two-sided tree in the middle represents the invariant lineage tree generated by LinTIMaT by combining the individual trees for ZF1 and ZF3. Blue nodes here represent the clusters from individual fishes (left node: ZF1, right node: ZF3), and red nodes represent the matched invariant clusters. Each leaf node is a cell, represented by a square, and its color represents its cell type as indicated in the legend. Subtrees illustrate examples of invariant clusters preserved in the individual lineage trees.
Fig. 5. Functional analysis of cell clusters for scGESTALT datasets.
a Heat map of the distribution of cell clusters for each region of the brain (columns). Cell types were classified as belonging to the forebrain, midbrain or hindbrain, and the proportions of cells within each region were calculated for each cluster. Each row sums to 1. Region proportions were colored as shown in the key. The leftmost panel shows the heat map for the clusters in the ZF1 lineage (subsampled), the middle panel shows the heat map for the ZF3 lineage, and the rightmost panel shows the heat map for the invariant lineage. b Heat map of the p-values (−log(p-value); a higher value means more significant) for GO terms for invariant clusters. P-values for GO terms were obtained from g:Profiler; they are calculated using the hypergeometric distribution and adjusted using the g:SCS algorithm. Rows represent invariant clusters and columns represent different GO terms (Supplementary Table 8). Yellow, purple and blue columns correspond to GO terms related to neurons, blood and progenitors, respectively. The leftmost panel shows the heat map for ZF1, the middle panel for ZF3, and the rightmost panel for the invariant tree. The invariant tree combines the unique terms identified for each individual tree: it identifies neuron clusters, which are well represented in ZF3 but not in ZF1, and it also identifies progenitor clusters, which are not well represented in ZF3.
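The two quantities plotted in this figure can be sketched in a few lines. The example below shows (i) per-cluster region proportions normalised so that a row sums to 1 and (ii) an unadjusted hypergeometric enrichment p-value of the kind used for GO terms. It is only an illustration: the function names and the toy numbers are hypothetical, and the g:SCS multiple-testing adjustment applied by g:Profiler is deliberately not reproduced.

```python
from math import comb


def region_proportions(region_counts):
    """Turn one cluster's per-region cell counts into proportions that sum to 1."""
    total = sum(region_counts.values())
    return {region: count / total for region, count in region_counts.items()}


def hypergeom_enrichment_pvalue(n_background, n_annotated, n_query, n_overlap):
    """P(X >= n_overlap) when drawing n_query genes without replacement from a
    background of n_background genes, n_annotated of which carry the GO term."""
    upper = min(n_annotated, n_query)
    tail = sum(
        comb(n_annotated, i) * comb(n_background - n_annotated, n_query - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / comb(n_background, n_query)


# Toy inputs (hypothetical numbers, for illustration only).
row = region_proportions({"forebrain": 2, "midbrain": 1, "hindbrain": 1})
p_all_hits = hypergeom_enrichment_pvalue(10, 5, 5, 5)
p_no_constraint = hypergeom_enrichment_pvalue(10, 5, 5, 0)
```

A heat map like the one in panel b would then display a negative log transform of the adjusted p-values, so that stronger enrichment maps to larger values.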
