Cancer Discov. 2025 Dec 31. doi: 10.1158/2159-8290.CD-24-0853. Online ahead of print.

Genotype-to-phenotype mapping of somatic clonal mosaicism via single-cell co-capture of DNA mutations and mRNA transcripts


Dennis J Yuan et al. Cancer Discov.

Abstract

Somatic mosaicism is pervasively observed in human aging, with clonal expansions of cells harboring mutations in recurrently mutated driver genes. Bulk sequencing of tissues captures mutation frequencies but can neither reconstruct clonal architectures nor delineate how driver mutations impact cellular phenotypes. We developed single-cell Genotype-to-Phenotype sequencing (scG2P) for high-throughput, highly multiplexed joint capture of mutation hotspot genotypes and mRNA markers. We applied scG2P to aged esophagus samples from six individuals and observed large numbers of clones with a single driver event, accompanied by rare clones with two driver mutations. NOTCH1 mutants dominate the clonal landscape and are linked to stunted epithelial differentiation, while TP53 mutants promote clonal expansion through both differentiation biases and increased cell cycling. Thus, joint, highly multiplexed single-cell capture of somatic mutations and mRNA transcripts enables high-resolution reconstruction of clonal architecture and associated phenotypes in solid tissue somatic mosaicism.


Conflict of interest statement

D.J.Y. has received travel support from Ultima Genomics and Bioskryb Genomics, outside of this work. D.A.L. serves on the Scientific Advisory Board of Mission Bio, Pangea, Alethiomics, Montage, Ultima, and Veracyte. D.A.L. has received prior research funding from 10x Genomics, Illumina, and Ultima Genomics unrelated to the current manuscript. D.D. and S.W. are employees of Mission Bio. D.D. is listed as an inventor on a granted patent (US patent 11365441) and a submitted patent (US patent application 16/839,057). S.W. is listed as an inventor on a submitted patent (US patent application 16/936,378). No other authors report competing interests.

Figures

Figure 1. Targeted capture of mutation hotspots and RNA in single cells
A) Schematic representation of the scG2P workflow. Dissociated cells or nuclei undergo cell lysis and targeted reverse transcription (RT), followed by barcode addition and targeted loci amplification in two sequential encapsulations. DNA and RNA amplicons are separated by streptavidin bead capture for separate library preparation. This workflow combines DNA mutation hotspot capture, to reconstruct clonal architecture, with exon-exon capture of RNA targets, for single-cell genotype-to-phenotype linkage (created with BioRender.com). B) Previously reported mutations per codon and tiling amplicon coverage across NOTCH1 (top) and TP53 (bottom) protein positions. The upper plots show the number of mutations per codon previously reported from bulk sequencing of the esophagus (Yokoyama et al.(8)) across protein positions; a 15-bp rolling window is used to determine the number of mutations per codon. In the middle plots, black bars indicate that an amplicon in the scG2P panel covers the locus, while grey bars indicate that no amplicon covers it. Domains for NOTCH1 and TP53 are indicated below (EGF = Epidermal Growth Factor-like repeats, LNR = LIN12/Notch Repeat, ANK = Ankyrin Repeats, TAD = transactivation domain, DBD = DNA binding domain, OLD = oligomerization domain). The lower plots show GC content (%) across 15-bp windows in black. Red lines indicate the GC content of the genotyping amplicons covering the regions captured by the panel, and grey dashed lines represent the mean GC content across all amplicons in the gene. Amplicon designs for FAT1, NOTCH2, NOTCH3, and PPM1D are provided in Supplementary Figure 1B. C) Heatmap of filtered variants detected in the cell line mixing study. Cell lines HCT116, KYSE270, and KYSE410 were mixed and processed using scG2P.
The heatmap displays the filtered variants detected in each cell, with cells clustered by cell line based on the variant allele frequencies (VAFs) of the DNA variants. Variants annotated in red were independently validated using whole-exome sequencing data from the Cancer Cell Line Encyclopedia (CCLE) database. The genotype of each variant is indicated as homozygous (HOM), heterozygous (HET), wild-type (WT), or missing. D) (Left) RNA expression-based uniform manifold approximation and projection (UMAP) of the cell line mixing experiment. HCT116 (n = 234 cells), KYSE270 (n = 403 cells), and KYSE410 (n = 355 cells) are colored by assigned cell line identity, determined by k-means clustering of the VAFs of DNA variants. (Right) Violin plots of RNA expression levels (centered log ratio) of four marker genes (KRT23, KRT5, KRT7, and EPCAM) across the three cell lines. E) Confusion matrix comparing RNA-based clustering labels (predicted) with DNA-based clustering labels (ground truth) in the cell line mixing study. The matrix displays the percentage of cells assigned to each cell line based on RNA expression profiles relative to the ground-truth DNA-based assignments. Diagonal elements represent correctly classified cells, and off-diagonal elements indicate misclassifications. The mean accuracy across all cell lines is 0.95.
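Two calculations in this legend, the GC content of 15-bp windows (panel B) and the mean accuracy of the confusion matrix (panel E), can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, the window is assumed to slide 1 bp at a time, and "mean accuracy" is assumed to be the average of the per-cell-line diagonal values of the row-normalized matrix.

```python
def gc_content_windows(seq: str, window: int = 15) -> list:
    """Percentage GC in each sliding window of `window` bp (assumed step: 1 bp)."""
    seq = seq.upper()
    if len(seq) < window:
        return []
    # Count G/C in the first window, then update the count as the window slides.
    gc = sum(base in "GC" for base in seq[:window])
    values = [100.0 * gc / window]
    for i in range(window, len(seq)):
        # Add the base entering the window, subtract the base leaving it.
        gc += (seq[i] in "GC") - (seq[i - window] in "GC")
        values.append(100.0 * gc / window)
    return values


def mean_per_class_accuracy(truth: list, predicted: list) -> float:
    """Mean of the diagonal of a row-normalized confusion matrix.

    Rows are ground-truth labels (here, DNA-based cell line calls); each
    diagonal entry is the fraction of that class predicted correctly (RNA-based).
    """
    classes = sorted(set(truth))
    recalls = []
    for c in classes:
        idx = [i for i, t in enumerate(truth) if t == c]
        correct = sum(predicted[i] == c for i in idx)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)
```

For example, `mean_per_class_accuracy(["a", "a", "b", "b"], ["a", "b", "b", "b"])` returns 0.75 (0.5 for class a, 1.0 for class b). Averaging per class rather than over all cells keeps the largest cell line from dominating the summary; the legend does not say which convention produced 0.95.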
Figure 2. Single-cell mutational landscape in aging esophagus
A) Bar plot of the number of nonsynonymous mutations detected in each driver gene across all donor samples. B) Bar plot of the contribution of each driver gene to nonsynonymous mutations in each donor sample. C) Clonal structure map and clone fractions detected in ESO-5 from single-cell genotyping data. Terminal nodes (colored circles) represent subclones defined by distinct variants across driver genes; circle size represents the cell fraction of each subclone, ordered by size. Branch lengths are scaled to reflect the acquisition of a single mutation, showing single-mutant and double-mutant (n = 2) clones. D) Fraction of cells harboring 0, 1, or 2 mutations per clone, among cells passing the 50% genotyping completeness threshold, for each donor sample. E) Fraction of cells with a mutation in the indicated driver gene, or combination of driver genes, for each donor sample.
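Panel D applies a per-cell genotyping completeness filter before counting mutations. A minimal sketch of that step, assuming a hypothetical encoding in which each cell has a call of "WT", "HET", "HOM", or `None` (missing) at every targeted variant, and that completeness is the fraction of non-missing calls; the authors' exact definition may differ.

```python
from collections import Counter
from typing import Dict, Optional

# Hypothetical genotype encoding: "WT", "HET", "HOM", or None for a missing call.
MUTANT_CALLS = {"HET", "HOM"}


def mutation_count_fractions(
    cells: Dict[str, Dict[str, Optional[str]]],
    min_completeness: float = 0.5,
) -> Dict[int, float]:
    """Fraction of retained cells carrying 0, 1, 2, ... mutant driver variants.

    `cells` maps cell ID -> {variant ID -> genotype call or None}.
    A cell is retained when at least `min_completeness` of its calls are non-missing.
    """
    counts = Counter()
    for calls in cells.values():
        n_called = sum(g is not None for g in calls.values())
        if not calls or n_called / len(calls) < min_completeness:
            continue  # cell fails the genotyping completeness threshold
        n_mut = sum(g in MUTANT_CALLS for g in calls.values())
        counts[n_mut] += 1
    total = sum(counts.values())
    return {k: counts[k] / total for k in sorted(counts)} if total else {}
```

A cell with exactly half of its loci called passes a 0.5 threshold here; whether the published filter is inclusive at 50% is an assumption.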
Figure 3. Mutant driver gene clonal architecture of the aging esophagus
A) Clone fraction of each clone detected across five donors. Circles represent the clones detected within each donor (x axis, random order within sample), colored by individual driver mutation or double mutation. The y axis represents the fraction of total cells that each clone constitutes within its sample (number of mutant cells in a clone divided by the total number of cells in the sample). B) (Top) Scatter plot of the Pearson correlation between the proportion of wild-type cells and donor age (R = 0.84). (Bottom) Scatter plot of the Pearson correlation between clonal diversity, computed with the Shannon entropy index, and donor age (R = 0.48). Each point represents a donor, and lines represent the linear fit. C) Clone fractions of cells with a single driver mutation. Box plots display the distribution of mean mutant cell fractions of clones for each mutated gene across all donors (center line, median; box, IQR; whiskers, 1.5 × IQR). Only cells with single mutations were included. D) Fraction of cells with mutations in NOTCH1 (top) and TP53 (bottom) for ESO(–5). The cell fractions (left y axis) of detected variants for each sample are plotted along the protein positions of NOTCH1 and TP53. Each variant is annotated with its coding impact (missense, synonymous, in-frame, or frameshift). The mutant cell fraction plot is overlaid with the hotspot mutation density, i.e., the number of mutations per codon (right y axis) previously reported by Yokoyama et al.(8) from bulk sequencing of esophageal microdissections. NOTCH1 and TP53 domains are indicated with annotations from UniProt (EGF = Epidermal Growth Factor-like repeats, LNR = LIN12/Notch Repeat, ANK = Ankyrin Repeats, TAD = transactivation domain, DBD = DNA binding domain, OLD = oligomerization domain).
The genotyping efficiency, i.e., the proportion of cells genotyped at each locus, is displayed on a color scale below each plot. Note that for single-mutant clones the mutant cell fraction equals the clone fraction, whereas in the 2 of 5 donors with detected double-mutant clones, the mutant fraction is the sum of the parent clone and double-mutant subclone fractions.
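The clone-level quantities in this figure can be written down compactly. The sketch below computes clone fractions (cells in a clone divided by total cells in the sample), the mutant fraction of a variant whose clone has double-mutant subclones (parent plus subclone fractions), a Shannon entropy diversity index, and a Pearson correlation. It is an illustration under stated assumptions, not the authors' code: entropy uses the natural log over whatever clone proportions are passed in, and the legend does not state whether wild-type cells count as a clone in the diversity index.

```python
import math
from typing import Dict, List


def clone_fractions(clone_sizes: Dict[str, int], total_cells: int) -> Dict[str, float]:
    """Clone fraction = number of cells in the clone / total cells in the sample."""
    return {clone: n / total_cells for clone, n in clone_sizes.items()}


def gene_mutant_fraction(parent_fraction: float, subclone_fractions: List[float]) -> float:
    """Mutant fraction of a variant: its parent clone plus any double-mutant subclones."""
    return parent_fraction + sum(subclone_fractions)


def shannon_entropy(fractions: List[float]) -> float:
    """Shannon diversity H = -sum(p * ln p), with proportions renormalized to sum to 1."""
    total = sum(fractions)
    ps = [f / total for f in fractions if f > 0]
    return -sum(p * math.log(p) for p in ps)


def pearson_r(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

Two equally sized clones give H = ln 2, and a single clone gives H = 0; per-donor H values could then be correlated with donor age via `pearson_r`.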
Figure 4. Mapping cellular phenotypes to esophageal clones
A) (Left) Uniform manifold approximation and projection (UMAP) of single cells from ESO-(–5), clustered by RNA expression and colored by annotated cell type. (Right) Dot plot of the marker genes (x axis) used for cell type annotation (y axis). Dot size represents the percentage of cells expressing the marker gene, and the color scale indicates the mean expression level (centered log ratio) within each cell type. B) Violin plots comparing cell cycling scores (left) and differentiation scores (right) across assigned cell types. Cell cycle and differentiation scores were calculated as gene-module scores from the expression of cycling and differentiation gene sets in the RNA panel (Supplementary Tables 9 and 10). C) Diffusion map of epithelial single cells merged from ESO-1, ESO-2, ESO-3, and ESO-5, annotated by cell type (left) and overlaid with trajectory scores (right). The trajectory scores represent the differentiation stage of each cell along the inferred pseudotime trajectory. Fibroblasts from A were excluded from this analysis. D) Clonal composition (fraction of the total sample made up of mutant clones) across the pseudotime diffusion map, showing the relative proportion of clones with different driver mutations across differentiation stages (early to late) in the ESO-5 sample. Clone fractions are summed by mutated driver gene, and the relative abundance is plotted against pseudotime quantiles (from C) representing the differentiation trajectory.
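Panel D summarizes clone composition along pseudotime. A minimal sketch, assuming each cell carries a pseudotime value and a driver-gene label (with a hypothetical "WT" label for wild-type cells) and that quantile bins hold equal numbers of cells. Counting cells per gene within a bin gives the same relative abundance as summing clone fractions by gene within that bin, because clone fraction is proportional to cell count.

```python
from collections import Counter
from typing import Dict, List, Tuple


def composition_by_pseudotime_quantile(
    cells: List[Tuple[float, str]], n_bins: int = 4
) -> List[Dict[str, float]]:
    """Relative abundance of each driver-gene label within pseudotime quantile bins.

    `cells` holds (pseudotime, label) pairs, e.g. (0.12, "NOTCH1") or (0.80, "WT").
    Cells are sorted by pseudotime and split into `n_bins` bins of (nearly) equal size,
    ordered from early to late differentiation.
    """
    ordered = sorted(cells, key=lambda c: c[0])
    n = len(ordered)
    result = []
    for b in range(n_bins):
        chunk = ordered[b * n // n_bins:(b + 1) * n // n_bins]
        counts = Counter(label for _, label in chunk)
        total = sum(counts.values())
        # Normalize within the bin so abundances sum to 1.
        result.append({g: k / total for g, k in counts.items()} if total else {})
    return result
```

Equal-count bins keep each quantile equally populated; fixed-width pseudotime bins would be an alternative design, but the legend's wording ("quantiles") suggests the former.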
Figure 5. Esophageal epithelium is colonized by diverse somatic mutant clones with phenotypic biases
A) Mean normalized differentiation scores (y axis) of cells assigned to clones by driver gene variant, for each donor sample (x axis, random order within sample). Scores are derived from the differentiation module score, which captures esophageal epithelial differentiation stages (Methods), and are normalized by min-max scaling within each sample to allow comparison across samples. Dot size represents the clone fraction (number of mutant cells in a clone divided by the total number of cells in the sample), and color indicates the mutant driver gene. B) Mean differentiation and cycling scores of cells assigned to clones with a single driver mutation, aggregated by mutant driver gene across all samples. Each point aggregates cells with a mutation in a specific driver gene; error bars indicate the standard error of the mean (SEM) across cells. C) Clones from the ESO-5 sample projected onto the differentiation and cell cycling score axes (mean score of cells assigned to each clone). Each dot represents a clone, with dot size proportional to the clone fraction and color indicating the mutant driver gene. The WT clone fraction is fixed at 0.1 for visualization; true WT proportions are shown in Fig. 2D. D) Diffusion map from Fig. 4C overlaid with kernel density estimates, over the same dimensions, of cells carrying the TP53:p.R135W mutation. E) Volcano plot of differentially expressed transcripts between TP53-mutant and wild-type cells. The horizontal dotted line marks FDR < 0.05; vertical dotted lines mark an average log2 fold change (log2FC) > 0.15. F) Clones from the ESO-5 sample projected by their mean expression of KLF5 and TP63. Dot size represents the clone fraction, and color indicates the mutant driver gene. WT cells are fixed at a clone fraction of 0.1 for visualization and shown for comparison with the clone scores.
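The score handling in panels A and B can be sketched as below. This is a simplified stand-in, not the authors' method: the module score here is just the mean normalized expression of the genes in a set (tools such as Seurat's `AddModuleScore` additionally subtract a control-gene background), min-max scaling is assumed to be applied to cell-level scores within one sample, and SEM uses the sample standard deviation. Gene names in the test data are placeholders.

```python
import math
from statistics import mean, stdev
from typing import Dict, List, Tuple


def module_score(expr: Dict[str, float], gene_set: List[str]) -> float:
    """Simplified module score: mean normalized expression of set genes present in `expr`.

    Raises statistics.StatisticsError if no gene from the set is present.
    """
    return mean([expr[g] for g in gene_set if g in expr])


def min_max_scale(scores: List[float]) -> List[float]:
    """Rescale one sample's scores to [0, 1]; constant scores map to 0."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def mean_and_sem(values: List[float]) -> Tuple[float, float]:
    """Mean and standard error of the mean (sample SD / sqrt(n)); needs >= 2 values."""
    return mean(values), stdev(values) / math.sqrt(len(values))
```

Scaling within each sample before averaging per clone makes clone means comparable across donors whose raw score ranges differ.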
Figure 6. Uncovering clones with NOTCH1 loss of heterozygosity
A) Clonal SNV structure map of ESO-6, generated by identifying driver gene variants from single-cell genotyping data and determining mutant cell fractions. Terminal nodes (colored circles) represent subclones defined by distinct variants across driver genes, and circle size represents the cell fraction of each subclone. Branch lengths are scaled to reflect the acquisition of a single mutation, showing single-mutant and double-mutant clones. B) Germline NOTCH1 SNPs (n = 5, x axis) are genotyped to define loss of heterozygosity (LOH) at NOTCH1 across clones (NOTCH1 SNP VAF). SNV VAF is also shown for each mutation within the indicated clone. C) Clones from the ESO-6 sample projected onto the differentiation and cell cycling score axes (mean score of cells assigned to each clone). Each dot represents a clone, with dot size proportional to the clone fraction, shape indicating NOTCH1 LOH status, and color indicating the mutant driver gene. Specific clones are highlighted with bold borders: red borders mark the NOTCH1 single-mutation clone and its subclone carrying two NOTCH1 mutations, and light brown borders mark the NOTCH1 single-mutation clone and its subclone that acquired LOH. D) Scaled differentiation score with standard error of the mean (SEM) for cells classified as wild-type (WT), cells with one NOTCH1 mutation only (NOTCH1 SNV), cells with NOTCH1 LOH, cells with both a NOTCH1 SNV and LOH, and cells with two NOTCH1 SNVs in ESO-6. LOH A = NOTCH1 loss of heterozygosity with the A allele retained; LOH B = NOTCH1 loss of heterozygosity with the B allele retained; WT = wild type.
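Panel B infers LOH from germline heterozygous SNPs, whose VAF sits near 0.5 while both NOTCH1 alleles are present and moves toward 0 or 1 when one allele is lost. A minimal sketch, assuming the five SNP VAFs have already been phased and oriented so that each reports the fraction of B-allele reads, and using hypothetical cutoffs of 0.2 and 0.8 on the mean VAF; these cutoffs are illustrative, not taken from the paper.

```python
from statistics import mean
from typing import List


def call_notch1_loh(snp_vafs: List[float], low: float = 0.2, high: float = 0.8) -> str:
    """Classify a clone's NOTCH1 allelic state from germline SNP VAFs.

    Assumes each VAF is already phased and oriented to the B allele
    (fraction of reads carrying B), so heterozygous clones sit near 0.5.
    The `low`/`high` cutoffs are hypothetical.
    """
    m = mean(snp_vafs)
    if m <= low:
        return "LOH A"  # B-allele reads nearly absent: A allele retained
    if m >= high:
        return "LOH B"  # A-allele reads nearly absent: B allele retained
    return "HET"        # both alleles still present
```

If a clone called `LOH A` also carries a NOTCH1 SNV on the retained allele, that SNV's VAF would be expected to approach 1, consistent with the SNV VAFs shown alongside the SNPs in panel B.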


References

1. Landau DA, Carter SL, Stojanov P, McKenna A, Stevenson K, Lawrence MS, et al. Evolution and impact of subclonal mutations in chronic lymphocytic leukemia. Cell. 2013;152:714–26.
2. Landau DA, Tausch E, Taylor-Weiner AN, Stewart C, Reiter JG, Bahlo J, et al. Mutations driving CLL and their evolution in progression and relapse. Nature. 2015;526:525–30.
3. Martincorena I, Fowler JC, Wabik A, Lawson ARJ, Abascal F, Hall MWJ, et al. Somatic mutant clones colonize the human esophagus with age. Science. 2018;362:911–7.
4. Lee-Six H, Øbro NF, Shepherd MS, Grossmann S, Dawson K, Belmonte M, et al. Population dynamics of normal human blood inferred from somatic mutations. Nature. 2018;561:473–8.
5. Olafsson S, McIntyre RE, Coorens T, Butler T, Jung H, Robinson PS, et al. Somatic evolution in non-neoplastic IBD-affected colon. Cell. 2020;182:672–684.e11.