Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2019 Jul;571(7765):355-360.
doi: 10.1038/s41586-019-1367-0. Epub 2019 Jul 3.

Somatic mutations and cell identity linked by Genotyping of Transcriptomes

Affiliations

Somatic mutations and cell identity linked by Genotyping of Transcriptomes

Anna S Nam et al. Nature. 2019 Jul.

Abstract

Defining the transcriptomic identity of malignant cells is challenging in the absence of surface markers that distinguish cancer clones from one another, or from admixed non-neoplastic cells. To address this challenge, here we developed Genotyping of Transcriptomes (GoT), a method to integrate genotyping with high-throughput droplet-based single-cell RNA sequencing. We apply GoT to profile 38,290 CD34+ cells from patients with CALR-mutated myeloproliferative neoplasms to study how somatic mutations corrupt the complex process of human haematopoiesis. High-resolution mapping of malignant versus normal haematopoietic progenitors revealed an increasing fitness advantage with myeloid differentiation of cells with mutated CALR. We identified the unfolded protein response as a predominant outcome of CALR mutations, with a considerable dependency on cell identity, as well as upregulation of the NF-κB pathway specifically in uncommitted stem cells. We further extended the GoT toolkit to genotype multiple targets and loci that are distant from transcript ends. Together, these findings reveal that the transcriptional output of somatic mutations in myeloproliferative neoplasms is dependent on the native cell identity.

PubMed Disclaimer

Conflict of interest statement

COMPETING INTERESTS The authors declare no competing interests.

Figures

Extended Data Figure 1.
Extended Data Figure 1.. Comparison of variant allele frequency (VAF) between whole exome sequencing (WES) and RNA-seq and primer sequences and positions of linear and circularization GoT.
a, Pie charts show the fraction of variants that are categorized as described in the top-left-box. Distribution of mutant allele fraction is annotated as oncogene or tumor suppressor gene (definitions according to Vogelstein et al. 2013 and Bailey et al. 2018). Diagonal dashed lines indicate equal allelic fraction between WES and RNA-seq. Yellow density contours represent driver distributions. BRCA, breast invasive carcinoma; HNSC, head and neck squamous cell carcinoma; KIRC, kidney renal clear cell carcinoma; LUAD, lung adenocarcinoma; STAD, stomach adenocarcinoma. b, Schematic localization of primers for (linear) GoT and circularization GoT for 3’ and 5’ libraries. c, Primer positions and sequences of the regions targeted by GoT and circularization GoT.
Extended Data Figure 2.
Extended Data Figure 2.. Optimization of parameters in targeted amplicon sequence processing pipeline in Genotyping of Transcriptomes (IronThrone GoT).
a, Representation of amplicon reads. b, Flow chart of the GoT analysis pipeline (see Methods). c, Murine (green) vs. human (blue) genome alignment of 10x data (y-axis) with genotyping data by GoT (x-axis) with various thresholds for minimum duplicate reads (across) and maximum mismatch ratio (down). d, Results of precision, recall and F1 score analysis for combinations of minimum duplicate reads and maximum mismatch ratios. e, Measure of importance of each variable used for the calculation of splits in trees in random forest classification test. f, Ratio of cell loss and genotyping errors (Z-score in y-axis) based on mismatch ratio thresholds (x-axis); area of intersection is highlighted with gray around the mismatch ratio 0.2. g, Heatmaps showing Z-scores of number of filtered cells (left) and predicted error rates (right) from random forest classification tests for combinations of minimum duplicate reads and maximum mismatch ratio thresholds.
Extended Data Figure 3.
Extended Data Figure 3.. GoT captures genotyping information of single cells through cDNA.
a, Percentage of cells by number of UMIs with CALR mutation locus capture in standard 10x data (left) and GoT data (right) (see Extended Data Fig. 3c for cell number in each sample). b, Number of UMIs per cell of CALR transcript from standard 10x data (left, blue shade) or targeted CALR locus from standard 10x or GoT (pink shade, see Extended Data Fig. 3c for cell number in each sample). c, Summary of clinical, pathologic, and GoT data from patients with CALR-mutated myeloproliferative neoplasms. BM, bone marrow; PB, peripheral blood. d, Number of genes per cell (left) and number of UMIs per cell (right) from published standard 10x data of healthy control CD34+ cells and 10x data from 3’ v2 chemistry of CD34+ cells from patient samples that underwent concurrent GoT, after random down-sampling of the reads from each sample to 50 million reads x3 iterations, showing that extra cycle of PCR and portioning a small aliquot from the 10x cDNA library for GoT using 3’ v2 chemistry does not compromise scRNA-seq data.
Extended Data Figure 4.
Extended Data Figure 4.. Integration of ET patient samples and progenitor subset assignment.
a, t-SNE projection of CD34+ progenitor cells from samples ET01-ET05, after integration and batch-correction using the Seurat package (see Methods). b, Heatmap of top ten differentially expressed genes for clusters; lineage-specific genes from Velten et al. (2018) are highlighted (see Methods). c, Representative lineage-specific genes projected on t-SNE representation of CD34+ cells from ET patient samples. d, t-SNE projection of CD34+ cells from patient samples ET01-ET05 after applying deep generative modeling approach for the single cell analysis using the scVI package (see Methods) showing progenitor subset assignments as determined after clustering the cells using the Seurat package. e, Genotyping data from GoT are projected onto the t-SNE generated after the scVI analysis of progenitor cells from ET01-ET05. Cells without any GoT data are labeled NA (not assignable). WT, wildtype; MUT, mutant; HSPC, hematopoietic stem progenitor cells; IMP, immature myeloid progenitors; NP, neutrophil progenitors; M/D, monocyte-dendritic cell progenitors; E/B/M, eosinophil, basophil, mast cell progenitors; MEP, megakaryocytic-erythroid progenitors; MkP, megakaryocytic progenitors; EP, erythroid progenitors; PreB, precursor B-cells.
Extended Data Figure 5.
Extended Data Figure 5.. Results of GoT analysis is robust to various amplicon UMI thresholds and linear modeling.
a, Wildtype and mutant cell frequency in HSPCs vs. MkPs with variable minimum genotyping UMI thresholds (Fisher’s exact test, two-sided, see Supplementary Table 6 for sample size). b, Pseudotime comparison between wildtype and mutant cells with increasing number of thresholds for targeted genotyping UMI (t-test, two-sided, Supplementary Table 6). c, Pseudotime comparison between mutant and wildtype cells with UMI threshold of 1 (Extended Data Figure 5b) with statistical test using a generalized linear model including mutation status and total number of amplicon UMIs per cell. d, Across 100 iterations, the genotyping amplicon UMIs were downsampled to one per cell and mutant cell frequency was determined for MkPs or precursor B-cells (PreB). This frequency was then divided by the total mutant cell frequency across all progenitor subsets for each of the 100 iterations. Mean ± standard deviation (SD) after n = 100 down-sampling iterations (Wilcoxon rank-sum test, two sided). ET samples with at least 20 cells in each cluster were analyzed. e, Variant allele fraction of CALR mutation in CD34+, CD38 (left), CD34+, CD38+ (middle) and CD34+, CD10+ (right) FACS-sorted peripheral blood cells from patients with ET determined by droplet digital (dd) PCR.
Extended Data Figure 6.
Extended Data Figure 6.. Cell cycle module expression in mutant and wildtype progenitor cells.
a, S phase and G2M phase gene module expression in wildtype (wildtype) vs. mutant (mutant) cells in HSPC and MkP clusters from ET patient samples. Cell cycle module score represents sum of S phase and G2M phase gene module expression (Wilcoxon rank-sum test, two-sided, see Methods and Supplementary Table 6 for sample size). Analysis performed for clusters with at least 20 cells. b, Ratio of committed progenitor priming module expression of mutant and wildtype hematopoietic stem progenitor cells (HSPCs). One mutant and one wildtype HSPCs were randomly sampled from ET01-ET05 for each round of analysis (n = 1000 iterations, Wilcoxon-rank sum test, two-sided).
Extended Data Figure 7.
Extended Data Figure 7.. ATF6 and IRE1 branches of the unfolded protein response are activated in CALR-mutated progenitor cells.
By samples, expression of ATF6-, PERK- and XBP1-target genes in the unfolded protein response in CALR wildtype and mutant MkPs and HSPCs (Wilcoxon rank-sum test, two-sided).
Extended Data Figure 8.
Extended Data Figure 8.. CALR-mutated hematopoietic progenitor cells from myelofibrosis show upregulation of the IRE1-unfolded protein response.
a, t-SNE projection of CD34+ progenitor cells from samples MF01-MF04, after integration and batch-correction using the Seurat package (see Methods, n = 11,093). b, t-SNE projection of CD34+ progenitor cells from samples MF01-MF04 labeled with pseuodotime (left, n = 11,093). Pseudotime comparison between wildtype (n = 2221) and mutant (n = 7483) cells. P-values from likelihood ratio tests of linear mixed model (LMM) with genotype as fixed effect and individual patient samples as random effect, against the model without the genotype effect (see Methods). c, Cell cycle module score comparison between wildtype and mutant cells in MF patients (Wilcoxon rank-sum test, two-sided). d, Ratio of TGFβ signaling pathway gene expression of mutant and wildtype MkPs. One mutant and one wildtype MkPs were randomly sampled for each round of analysis (n = 100 iterations; Wilcoxon-rank sum test, two-sided). e, Differentially expressed genes between wildtype MkPs with high cell cycle expression (n = 220) vs. wildtype MkPs with low cell cycle expression (n = 110) common across patient samples MF02-MF04. P-values were combined using Fisher combine test with Benjamini-Hochberg adjustments. Weighted average of log2(fold change) based on cell number across samples is shown (see Methods).
Extended Data Figure 9.
Extended Data Figure 9.. Deciphering subclonal progenitor identities using multiplex GoT and targeting loci distant from transcript ends using circularization GoT.
a, Single-cell cloning assay of peripheral blood cells from MF05 patient (see Methods). b, Rate of targeted locus capture (%) as a function of gene expression and distance of targeted locus from the transcript ends. c, Distance of mutation locus from transcript ends for pan-cancer drivers and their frequencies based on the number of times reported in the Catalogue of Somatic Mutations in Cancer (COSMIC) database. Mutations are annotated as oncogenes, tumor suppressor genes, or passengers (as defined in Vogelstein et al. 2013 and Bailey et al. 2018). Relative density of each subclass of mutations from the closer end (i.e. 3’ or 5’) is shown in the upper panel. d, Schematic of analysis of Oxford Nanopore Technology (ONT) sequencing reads. e, Frequency of SF3B1 mutant and wildtype reads of linear GoT amplicon library sequenced with ONT. f, Analysis of SF3B1 amplicon reads sequenced by Oxford Nanopore Technology (ONT) for inter-transcript PCR recombination by mapping 50 bps at the opposite end of the targeted locus showing only 2.2% of fragments that reflect inter-transcript recombination. g, Pairwise difference of read lengths for duplicate reads (i.e. reads with the same CB+UMI barcodes) of SF3B1 amplicon library sequenced with ONT, showing consistent read length of duplicates supporting a low rate of intra-transcript PCR recombination. h, Comparison of genotype assignment for CALR in patient sample MF01 between linear GoT and circularization GoT after downsampling reads to 300K with 10 iterations (n = 320 cells). i, Comparison of CALR-mutant UMI fraction per cell in patient sample MF01 between linear GoT and circularization GoT after downsampling reads to 300K with 10 iterations (n = 320 cells, Pearson’s correlation, F-test).
Extended Data Figure 10.
Extended Data Figure 10.. Evaluation of barcode replacement in IronThrone GoT processing.
a, Fraction of reads with cell barcodes (CB) that are not perfectly matched to the whitelist CB from the species-mixing experiment. >Hamm-1, filtered reads with barcodes that are greater than 1-Hamming distance away from whitelisted barcodes (n = 139,422 reads); Not significant, filtered reads with barcodes that are 1-Hamming distance away from the whitelisted barcodes but no statistical significance (posterior probability < 0.99, n = 14830 reads); replaced, rescued reads with barcodes that have candidates of 1-Hamming distance away from the whitelisted barcodes with statistical significance (posterior probability ≥ 0.99, n = 224085 reads). b, Number of supporting reads per candidate barcode and base quality at the differing base positions and c, across base positions. Wilcoxon rank-sum tests (two-sided) were applied to compare not replaced (n = 14,830) and replaced (n = 224,085) barcodes. d, Correlation between the number of supporting reads per candidate barcode and median base quality at the differing base (two-tailed Pearson’s correlation, F-test). e, Distribution of prior and posterior probabilities from not significant (n = 14,830) and replaced (n = 224,085) barcodes. The dashed red line represents the posterior probability cutoff (0.99). f-h, To further evaluate the efficiency of barcode replacement, we generated synthetic CB by randomly changing one base in whitelist CBs (n = 100 iterations). f, Fraction of reads with CB that are not identical to the whitelist CB (n = 100 iterations). Percentage of replaced reads were 99.1% ± 0.001% (median ± absolute deviation) in 1-base changed, 1.1% ± 0.002% in 2-bases changed, and 0.7 ± 0.001% in 3-bases changed simulations. g, Determination of whether replaced CB are identical to the original CB. In 1-base change simulations, percentage of reads with replaced CB that were identical to the original CB was 100.0 ± 0.0% (median ± absolute deviation of 100 iterations). h, Estimation of prediction power for classifying CB from 1-base changed simulations (n = 100 iterations).
Figure 1.
Figure 1.. GoT provides somatic mutation genotyping for thousands of cancer cells and reveals differential fitness impact of CALR mutation in hematopoietic progenitor subsets.
a, Schematic of GoT workflow. b, Species-mixing study with mutant CALR murine cells and wildtype CALR human cells. 10x reads from singlet cells map to human or murine genome (left). Murine vs. human genome alignment of 10x data (y-axis) and GoT data (x-axis, right, n = 1,259 cells). c, FACS of CD34+ cells (left) and UMI per cell (right) for CALR transcript (blue shade) or targeted locus (pink shade) from representative ET01 (n = 6811 cells) of 10 independent experiments (Extended Data Fig. 3a,b for replicates). d, t-SNE projection of CD34+ cells from ET patients with cluster assignment and e, genotyping data. f, Normalized mutant cell frequency (Methods). Bars show aggregate analysis of ET01-ET05 with mean±SD of 100 downsampling iterations to 1 genotyping UMI; points represent mean of n = 100 downsampling iterations for each sample. g, Normalized mutant cell frequency; mean±SD of n = 100 downsampling iterations (Wilcoxon rank-sum test, two-sided). h, t-SNE projection of ET CD34+ cells with pseudotime (left) and density plot of wildtype and mutant cells (right). i, Pseudotime in wildtype vs. mutant cells. P-value from likelihood ratio test of LMM with/without mutation status (Methods). j, Bulk VAF of CALR mutation in FACS-sorted cells from ET patients by droplet digital (dd) PCR. HSPC, hematopoietic stem progenitor cells; IMP, immature myeloid progenitors; NP, neutrophil progenitors; M/D, monocyte-dendritic cell progenitors; E/B/M, eosinophil, basophil, mast cell progenitors; MEP, megakaryocytic-erythroid progenitors; MkP, megakaryocytic progenitors; EP, erythroid progenitors; PreB, precursor B-cells; cc, cell cycle; WT, wildtype. MUT, mutant; NA, not assignable. In all figures, box plots represent the median, bottom and upper quartiles, whiskers correspond to 1.5x the interquartile range; violin plots depict kernel density estimates to show the density distribution.
Figure 2.
Figure 2.. CALR mutations result in higher proliferative impact on MkPs compared to HSPCs.
a, Expression of representative genes upregulated in JAK2-mutated ET cells in CALR wildtype (n = 157) vs. mutant (n = 85) MkPs from representative ET01. b, Heatmap of -log10(P-value) from comparisons of expressions of JAK2-mutated ET genes between mutant and wildtype cells (Supplementary Table 6). c, Cell cycle module expression in HSPCs (n = 108 wildtype vs. n = 240 mutant) and MkPs (n = 25 wildtype vs. n = 276 mutant) from ET03 (Extended Data Fig. 6a). d, Platelet counts vs. difference of mean cell cycle score (± SE) between wildtype and mutant MkPs (n = 5 samples; F-test). e, Expression of MkP and HSC modules in MkPs from ET03. Pie charts of wildtype vs. mutant cell frequencies in HSCloMkPhi (n = 121 cells) and HSChiMkPlo (n = 28 cells; Fisher’s exact test, two-sided). f, Expression of TGFβ and MkP modules in HSPCs from ET01 and cell cycle score in HSPCs in MkPhiTGFβlo (n = 127 wildtype vs. n = 41 mutant) and MkPloTGFβhi populations (n = 105 wildtype vs. n = 15 mutant). P-values for a, b, c, f determined by Wilcoxon rank-sum test, two-sided.
Figure 3.
Figure 3.. CALR mutation transcriptional effects are cell identity dependent.
a, Differentially expressed genes between mutant vs. wildtype MkPs and b, between mutant vs. wildtype HSPCs across ET01-ET05 samples (Supplementary Table 6). P-values combined using Fisher combine test with Benjamini-Hochberg adjustment. Key gene set enrichments (hypergeometic test, Methods). c, Expression of genes upregulated in UPR branches in MkPs (n = 442 wildtype vs. n = 640 mutant) and HSPCs (n = 1704 wildtype vs. n = 613 mutant) from ET01-ET05. P-values from likelihood ratio tests of LMM with and without mutation status (Methods). d, Schematic of GoT applied to XBP1 splice site. e, Local regression of total and spliced XBP1 (XBP1s) expression in progenitor cells from samples ET03 and ET04 (n = 1308 wildtype, and n = 1514 mutant; shade: 95% CI, left). XBP1s to unspliced XBP1 ratio in MkPs (n = 115 wildtype vs. n = 248 mutant) and HSPCs (n = 489 wildtype vs. n = 302 mutant; right panel). f, Expression of NF-κB pathway and anti-apoptosis genes in HSPC1 (n = 116 wildtype vs. n = 27 mutant), HSPC2 (n = 365 wildtype vs. n = 53 mutant), and HSPC3 (n = 381 wildtype vs. n = 105 mutant) from ET01. P-values for panels e, f from Wilcoxon rank-sum test, two-sided.
Figure 4.
Figure 4.. CALR mutation effects on hematopoietic progenitor cells from MF patients.
a, t-SNE projection of CD34+ cells from MF patients showing cluster assignment and b, genotyping data from GoT. c, Normalized frequency of mutant cells (Methods). Bar graphs represent aggregate analysis of MF01-MF04 showing mean ± SD of 100 downsampling iterations to 1 genotyping UMI; gray points represent mean of 100 down-sampling iterations for each sample. d, t-SNE projection of the CD34+ cells showing cell cycle gene expression (left) and density plot of mutant and wildtype cells (right). Density plots of mutant vs. wildtype cells along cell cycle gene expression (inset, Wilcoxon rank-sum test, two-sided; Supplementary Table 6). e, Differentially expressed genes in mutant vs. wildtype MkPs across samples MF01-MF04 (Supplementary Table 6). P-values combined using Fisher combine test with Benjamini-Hochberg adjustment. Key gene set enrichments (hypergeometic test, Methods).
Figure 5.
Figure 5.. GoT dissects subclonal identity through multiplexing and targets loci distant from transcript ends via circularization.
a, Schematic of clonal evolution of neoplastic cells from MF05 (top-left). t-SNE projections of CD34+ cells with cluster assignments (top-right) and with with GoT data for each variant (bottom). b, Cell cycle score in subclonal MEP populations (n = 28 single mutant, 109 double mutant, 293 triple mutant cells). c, Schematic of circularization GoT. d, t-SNE projection of MF05 CD34+ cells GoT data for SF3B1 from circularization and linear GoT. e, UMI per cell of SF3B1 gene (blue shade) or targeted SF3B1 locus (pink shade) from 10x, linear GoT sequenced on Illumina, circularization GoT, and linear GoT sequenced with ONT (n = 8475 cells). f, Mixing study with human JAK2 wildtype cDNA from TF-1 and homozygous JAK2 V617F cDNA from HEL. Frequency of reads (wildtype, V617F or not assignable [NA]) assigned to TF-1 or HEL cell barcodes (CB). g, t-SNE projection of CD34+ cells from patient with JAK2 V617F ET showing cluster assignment (left) and genotyping information (right) based on GoT data. h, Normalized frequency of mutant cells within the progenitor clusters (Methods). Mean ± SD of n = 100 down-sampling iterations. i, Density plots of HSPCs along lineage priming modules (n = 17 wildtype vs. 15 mutant cells). P-values for b, h, i from Wilcoxon rank-sum test, two-sided.

Comment in

References

    1. Sperling AS, Gibson CJ & Ebert BL The genetics of myelodysplastic syndrome: from clonal haematopoiesis to secondary leukaemia. Nat Rev Cancer 17, 5–19 (2017). - PMC - PubMed
    1. Landau DA, et al. The evolutionary landscape of chronic lymphocytic leukemia treated with ibrutinib targeted therapy. Nat Commun 8, 2185 (2017). - PMC - PubMed
    1. Burger JA, et al. Clonal evolution in patients with chronic lymphocytic leukaemia developing resistance to BTK inhibition. Nat Commun 7, 11589 (2016). - PMC - PubMed
    1. Landau DA, et al. Evolution and impact of subclonal mutations in chronic lymphocytic leukemia. Cell 152, 714–726 (2013). - PMC - PubMed
    1. Landau DA, et al. Mutations driving CLL and their evolution in progression and relapse. Nature 526, 525–530 (2015). - PMC - PubMed

MeSH terms