[Preprint]. 2025 Oct 13:2025.10.11.681805.
doi: 10.1101/2025.10.11.681805.

Large-scale single-cell phylogenetic mapping of clonal evolution in the human aging esophagus

Tamara Prieto et al. bioRxiv.

Abstract

The human somatic genome evolves throughout our lifespan, producing mosaic individuals comprising clones harboring different mutations across tissues. While clonal expansions in the hematopoietic system have been extensively characterized and reported to be nearly ubiquitous, clonal mosaicism (CM) has more recently also been described across multiple solid tissues. However, outstanding questions remain about the parameters and processes of human somatic evolution in non-cancerous solid human tissues, including when clones arise, how they evolve over time, and what mechanisms lead to their expansion. Questions of timing and clonal dynamics can be addressed through phylogenetic reconstruction, which serves as a 'temporal microscope', while uncovering the mechanisms of expansion necessitates simultaneous phenotypic profiling. To address this gap, here we develop Single-cell Miniaturized Automated Reverse Transcription and Primary Template-directed Amplification (SMART-PTA) for joint single-cell whole-genome and whole-transcriptome sequencing for large-scale, cost-efficient interrogation of solid tissue CM. We established a workflow that generates hundreds of matched single-cell whole-genome and transcriptome libraries within a week. We profiled phenotypically normal esophagus tissue from four aged donors and used somatic variants to build high-resolution single-cell lineages from >2,700 cells with accompanying transcriptomic information, reconstructing >70 years of somatic evolution. T cell expansions identified from T cell receptor (TCR) sequences validated the clonal structure of the single-nucleotide variant (SNV)-based phylogenies, and phylogenetic cross-correlation analysis showed that epithelial cells had higher degrees of shared ancestry by spatial location compared to immune cells.
Mapping mutation signatures to the phylogenetic tree revealed the emergence of tobacco/alcohol exposure-related signatures later in life, consistent with the donors' exposure histories. We identified variants in driver genes that were previously reported in the phenotypically normal esophagus, detecting clonal expansions harboring mutations in genes including TP53 and FAT1. We mapped the evolution of clones with both monoallelic and biallelic TP53 loss, including a clone associated with high expression of cell cycling genes and higher chromosome instability. Leveraging the matched transcriptome data, we uncovered cell type biases in mutant clones, with a higher proportion of TP53- or FAT1-mutant cells in an earlier basal epithelial cell state compared to wild-type cells. We further observed copy-neutral loss of heterozygosity (CNLOH) events on chromosome 9q that spanned the NOTCH1 locus in up to ~35% of epithelial cells. Mapping CNLOH events to the phylogenetic tree revealed a striking pattern in which CNLOH was separately acquired many times, reflecting convergent evolution. Cells with CNLOH events were biased towards the earlier basal epithelial state, suggestive of a selective advantage that leads to prevalent recurrence of chr9q CNLOH. Together, we demonstrate that SMART-PTA is an efficient, scalable approach for single-cell whole-genome and whole-transcriptome profiling to build phenotypically annotated single-cell phylogenies with enough throughput and power for application to normal tissue somatic evolution. Moreover, we reconstruct the evolutionary history of the esophageal epithelium at high scale and resolution, providing a window into the dynamics and processes that shape clonal expansions in phenotypically normal tissues throughout a lifespan.

Conflict of interest statement

T.P. has received conference travel support from BioSkryb. D.J.Y. has received conference travel support from Ultima Genomics and BioSkryb. J.H. declares consultancy fees from Daiichi-Sankyo (unrelated). A.P.C. is listed as an inventor on submitted patents (US patent applications 63/237,367, 63/056,249, 63/015,095, 16/500,929 and 320376), has received consulting fees from Eurofins Viracor and has received conference travel support from Ultima Genomics, all outside of this work. D.A.L. is on the Scientific Advisory Board of Mission Bio, Veracyte, Ultima and BioSkryb and has received prior research funding from 10x Genomics, Ultima Genomics, Oxford Nanopore Technologies and Illumina. All other authors declare no competing interests.

Figures

Figure 1: Scalable joint single-cell capture of whole genomes and full-length transcriptomes.
A, In vitro evolution experiment using DLD-1 cells that were cultured and split over four months. We isolated 16 single cells (indicated with bold circles and labeled) across timepoints and performed single-cell whole-genome sequencing using mostly natural sequencing-by-synthesis chemistry from Ultima Genomics (UG; >15× coverage depth). B, Phylogenetic reconstruction based on single-cell whole-genome amplification with UG reflects the topology of the in vitro experiment in (A). Variant allele frequency (VAF) of exonic somatic variants is indicated, displayed as the proportion of reads carrying the mutated allele per cell. Bootstrap support is annotated in the internal nodes of the tree. C, Schematic for single-cell miniaturized automated reverse transcription and Primary Template-directed Amplification (SMART-PTA). First, cell populations of interest are isolated with flow cytometry using cell surface markers to provide level 1 phenotyping. Single cells are sorted into 384-well plates, enabling miniaturized reaction volumes for increased throughput and scalability. Reagents are dispensed in picoliter volumes using robotics to allow for consistency and homogeneous reactions. Library preparation uses automated plate washing to scale production capacity. Ultima Genomics provides cost-efficient single-cell whole-genome sequencing, and Illumina single-cell RNA-seq provides level 2 phenotyping to assign cell states. Finally, phylogenetic reconstruction and phenotypic characterization analysis recapitulate the somatic evolution history of the analyzed cells. D, Quality metrics for whole-genome amplification libraries for standard-PTA versus SMART-PTA calculated for 50 cells processed with each protocol, comparing sequencing depth, genomic coverage (percentage), genomic coverage uniformity [1/median absolute deviation (MAD)] and percentages of mapped reads. P values from two-sided Wilcoxon test.
Box plots represent the median, bottom and upper quartiles; whiskers correspond to 1.5 times the interquartile range. E, Quality metrics for transcriptome libraries for standard versus SMART-PTA calculated for 50 cells processed with each protocol, comparing detected transcripts in cells with >1 million reads, percentage of mitochondrial reads and percentage of ribosomal RNA reads. P values from two-sided Wilcoxon test. Box plots represent the median, bottom and upper quartiles; whiskers correspond to 1.5 times the interquartile range. F, Comparison of cost (for sequencing, reagents and consumables) and time expenditures (labor) required to profile 1,000 cells using standard-PTA+RNA versus SMART-PTA. Cost estimates reflect pre-2025 prices.
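The coverage uniformity metric in panel D is given only as 1/MAD. The following is a minimal sketch of how such a score could be computed for one cell, assuming per-bin read depths that are normalized by their median before the MAD is taken. The function name `coverage_uniformity`, the normalization step and the toy depth values are illustrative assumptions, not the authors' pipeline.

```python
from statistics import median


def coverage_uniformity(bin_depths):
    """Return 1/MAD of median-normalized bin depths (higher = more uniform).

    Assumption: depths are divided by their median before computing the
    median absolute deviation; the preprint's exact definition may differ.
    """
    med = median(bin_depths)
    if med <= 0:
        raise ValueError("median depth must be positive")
    normalized = [d / med for d in bin_depths]
    center = median(normalized)
    # Median absolute deviation of the normalized depths around their median.
    mad = median(abs(x - center) for x in normalized)
    if mad == 0:
        return float("inf")  # perfectly flat coverage
    return 1.0 / mad


# Hypothetical per-bin depths for an evenly and an unevenly amplified cell.
even_cell = [10, 12, 8, 10, 11, 9]
uneven_cell = [10, 20, 2, 10, 15, 5]
print(coverage_uniformity(even_cell), coverage_uniformity(uneven_cell))
```

Under this definition a cell with flatter coverage receives a higher score, which matches the direction of the metric as labeled in the caption.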
Figure 2: Large-scale single-cell genome and transcriptome sequencing of normal human esophagus.
A, Schematic of the sampling strategy used to obtain tissue biopsy of the esophagus across four regions for three donors (eso02-eso04) processed and sequenced with SMART-PTA. B, Cohort description with donor, sequencing and risk exposure information (n = 2,783 cells). Detailed donor and sampling information can be found in Supplementary Table 1. C, Illustration of the cellular makeup of the esophagus, including epithelial and immune cell types, along with illustration of the stratification of the epithelial cells in esophageal layers. D, Uniform manifold approximation projection (UMAP) of 2,491 single cells with matched DNA and RNA libraries (~10% of the total 2,783 cells were filtered out due to low RNA coverage; Supplementary Fig. 2A). Cells are colored by annotated cell type, including immune cells and epithelial cells, recapitulating the cells constituting the esophagus as illustrated in panel (C). E, Expression of marker genes underlying cell type assignment across single cells for immune cells (left) and epithelial cells (right). Percentage of mitochondrial (MT) reads, percentage of ribosomal (ribo) RNA reads and matched DNA sequencing reads are indicated. CD4 = CD4+ T cells, CD8 = CD8+ T cells. F, Mutation burden (estimated number of SNVs) derived from high-quality single-cell whole-genome sequencing of esophagus samples (this analysis summarizes 2,395/2,783 cells, excluding cells with potential contamination evidenced by >25% of cell-unique SNVs overlapping with dbSNP and those with low genome breadth and long external branch lengths; Methods) from four donors (eso01, n = 52 epithelial cells, n = 3 immune cells; eso02, n = 560 epithelial cells, n = 48 immune cells; eso03, n = 666 epithelial cells, n = 92 immune cells; eso04, n = 886 epithelial cells, n = 88 immune cells). Two-sided Wilcoxon test. 
To address variation in genome coverage, single-cell amplification and sequencing allelic imbalance was calculated using heterozygous germline SNPs and leveraged to correct the number of unique single-cell SNVs, correcting mutation burden estimates accordingly (Methods).
Figure 3: High-resolution single-cell phylogeny from esophagus tissue captures and quantifies human somatic evolution.
A, Single-cell phylogeny of n = 758 total cells built from SNVs obtained from SMART-PTA application to esophagus samples from donor eso03. Each leaf of the tree represents a single cell, annotated as epithelial (EPCAM+) or immune (CD45+) cell based on flow cytometry data, as well as according to cell type determined from the RNA profiles. Regions corresponding to the sampling scheme in Fig. 2A are indicated and annotated for each cell. Branch lengths represent time in years. Bootstrapping values are indicated by green circles. Phylogenies for donors eso01, eso02 and eso04 are in Supplementary Fig. 5A–C. B, Phylogenetic correlation Z score heatmap of CD45+ (immune), EPCAM+ (epithelial) or negative control (percentage of mapped reads) cells for donor eso03. Positive correlation indicates that the cell type phenotype is shared by highly related cells. Negative correlation indicates that the cell type phenotype is absent from highly related cells. No correlation indicates that the distribution of a cell phenotype across highly related cells is random. *** One-sided P < 0.05. ns, not significant. Phylogenetic correlations were calculated using node distance to avoid circularity, as patristic distances were estimated using epithelial-immune divergence assumptions. Raw phylogenetic correlations are transformed into Z scores representing permutation tests using the PATH framework. C, Mean phylogenetic correlation of esophageal regions for the EPCAM+ cells and CD45+ cells for donor eso03, indicating the degree to which phylogenetically related cells (immune versus epithelial cells) are dispersed in different regions of the esophagus. Phylogenetic correlation Z score of the distance of the biopsy from the incisors is calculated for EPCAM+ or CD45+ cells, respectively. Raw phylogenetic correlations are transformed into Z scores representing permutation tests using the PATH framework. Error bars represent the standard deviation of 5 subtree replicates (Methods). 
D, Generalized additive model (GAM) fit of the cumulative proportion of lineages through time (years) across non-overlapping clones of exclusively epithelial and immune cells at birth across donors eso02, eso03 and eso04. The shaded area represents the 95% confidence interval. E, Analysis of immune cell clones (with ≥ 2 cells per clone) in donor eso03. The phylogeny is annotated with immune cell type assignments and anatomical region from which the cells were sampled (top). Expression of marker genes of tissue residency and cytotoxicity (circle size indicates expression level) is shown (middle). Complementarity-Determining Region 3 (CDR3) amino acid sequences from α or β TCR chains are shown (bottom). Only TCR sequences present in at least two sister cells are shown. CD4+ and CD8+ T cell types were manually assigned based on CD4, CD8A, and CD8B gene expression levels. A CD4+ clone expressing cytotoxicity and exhaustion markers is highlighted in pink. F, Cumulative number of lineages over donor age (lineage through time, LTT) for two clades highlighted on the phylogeny in A comprising epithelial cells with distinct topology on the tree. Top, clone A, n = 52 cells, yellow. Bottom, clone B, n = 58 cells, purple.
Figure 4: Driver mutations and signatures in aging human esophagus from donors with high-risk versus low-risk exposures.
A, Single-cell phylogeny of n = 608 cells built from SNVs obtained from SMART-PTA application to esophagus samples from a 79-year-old donor eso02 with history of smoking and alcohol use (high-risk exposure). Each leaf of the tree represents a single cell, with cell type annotations displayed. Clones harboring nonsynonymous mutations in any of 83 genes (Supplementary Table 5) previously analyzed in aging esophagus tissue, are indicated in different coloring of the branches. Branches with two mapped mutations are further shaded with the gene color of the second mutation. Any subsequent mutation is further indicated on the relevant subclonal branch of the double mutant clone with the corresponding gene color. B, Single-cell phylogeny of n = 974 cells built from SNVs obtained from SMART-PTA application to esophagus samples from a 77-year-old donor eso04 with no history of smoking or alcohol use (low-risk exposure). Annotations are the same as for (A). C, Analysis of contributions of trinucleotide mutation signatures across age for eso02 (left) and eso04 (right) across lifespan determined from phylogenetic branches (Early – including branches whose end node is older than 40 years, covering a time frame from embryonic development to adulthood; Middle – including branches where the end node is less than 40 years old but terminates in an internal node; Late - terminal branches whose end node is the age of the donor; Methods; Supplementary Fig. 6E). Single-base substitution (SBS) signatures were obtained by analyzing the single-cell whole genome amplification (scWGA) data and aligning to COSMIC signatures. Trinucleotide spectrum and inset with percentage of contribution of four signatures (SBS1, SBS5, SBS16 and SBS40a) are shown for each donor across the three time points. Trinucleotide mutation signature analysis for donors eso01 and eso03 are presented in Supplementary Fig. 6C–D. 
D, Contribution of de novo signatures identified from whole-genome sequencing of oral epithelium samples from donors with a history of heavy smoking/heavy drinking and donors without smoking/drinking history in eso02 and eso04 at the same time periods as in (C). The A signature shares similarity with SBS1/SBS5 (clock-like) and the B signature shares similarity with SBS16 (aldehyde exposure related to tobacco use/alcohol consumption). E, Proportion of mutant cells in eso02 (high-risk exposure) and eso04 (low-risk exposure) donors with somatic variants in 28 esophageal driver genes (Supplementary Table 5) previously reported to show positive selection in esophagus tissue through dN/dS analysis, including single-gene mutants, multi-gene mutants or wild-type cells. Two-sided Fisher’s exact test. Proportions of mutant cells across all donors are presented in Supplementary Fig. 6J. F, dN/dS maximum likelihood estimation (MLE) analysis for TTN, TP53 and FAT1 genes in donors eso02, eso03 and eso04. The ratio of nonsynonymous to synonymous somatic variants was determined at the gene level (Methods). Error bars represent the 95% confidence intervals. Grey dashed line represents dN/dS of 1, indicating no selection. Values >1 indicate positive selection. *** P value < 0.001; ns, not significant, dNdScv Likelihood-Ratio Test per donor. P values from left to right: TTN: 9.6x10−2, 3.3x10−1, 6.8x10−1; TP53: 2.2x10−9, 1.1x10−9, 1.2x10−4; FAT1: 2.3x10−3, 1.01x10−3, 1.9x10−1. We randomly downsampled 608 cells from each donor to avoid dN/dS biases due to clonal frequency differences. The low number of mutations in eso01 (55 cells) prevented calculation of dN/dS in this donor. G, Mean fraction of cells constituting each epithelial cell type (basal, proliferative basal and suprabasal) for cells without any detected mutation in TP53 (grey) and TP53-mutant cells (purple) across donors.
Error bars represent 95% Clopper-Pearson confidence intervals and P values were calculated using a one-sided χ2 test. H, Same as (G) for FAT1 wild-type (grey) and FAT1-mutant (green) cells.
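The error bars in panels G and H are described as 95% Clopper-Pearson confidence intervals. As a rough illustration of what that interval is for a single proportion, the sketch below inverts the exact binomial tail probabilities by bisection using only the standard library. The function names and the example counts are hypothetical, and the authors' averaging across donors is not reproduced here.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _root_of_decreasing(f, lo=0.0, hi=1.0, iters=100):
    """Bisection for a function that decreases from positive to negative."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact (Clopper-Pearson) interval for k successes out of n trials."""
    if not 0 <= k <= n or n == 0:
        raise ValueError("need 0 <= k <= n and n > 0")
    # Lower bound: the p at which P(X >= k) equals alpha/2.
    lower = 0.0 if k == 0 else _root_of_decreasing(
        lambda p: alpha / 2 - (1 - binom_cdf(k - 1, n, p)))
    # Upper bound: the p at which P(X <= k) equals alpha/2.
    upper = 1.0 if k == n else _root_of_decreasing(
        lambda p: binom_cdf(k, n, p) - alpha / 2)
    return lower, upper


# Hypothetical example: 5 basal cells among 10 mutant cells.
print(clopper_pearson(5, 10))
```

The interval is exact in the sense that it is derived from the binomial distribution itself, which makes it conservative for the small per-category cell counts that can occur when cells are split by genotype and cell type.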
Figure 5: Copy-number variations and aneuploidy events in the aging human esophagus.
A, Genome-wide copy-number profiles for epithelial and immune cells across all donors. Copy-number state is displayed for all chromosomes along ~500 kb non-overlapping windows. B, Median X chromosome copy number in proportions of epithelial and immune cells for each of the four donors. C, Top, mean phased alternative allele frequency for the indicated chromosome regions for four clones identified with CNV events across 3 donors, with matching copy-number state (bottom). D, Example zoomed-in trees of clones bearing both driver SNVs as well as CNV events. Dashed lines added at 10-year intervals. Driver variants and CNV events are labeled, and cell types are indicated, for two clones in donor eso02 and one clone each for donors eso03 and eso04. Shaded colors correspond to the clone labels (clone A-clone D) in (C). We note that in the initial CNV analysis (panel A), we only included 1,587/2,783 cells with MAD score lower than 0.20, and detected 18 epithelial cells with CNVs supported by at least two sister cells. Remapping these cells onto the tree allowed for more careful assessment of CNVs in neighboring cells with lower stringency on the coverage quality. This analysis revealed 25 cells that we could annotate on the phylogeny.
Figure 6: TP53-mutant clonal landscape of 79-year-old esophagus includes cells with monoallelic and biallelic inactivation of TP53.
A, Tree displaying TP53-mutant clones from the donor eso02 phylogeny shown in Fig. 4A annotated with TP53 SNVs (left) and chromosome 17p copy-number profile (right). Mean phased alternate allele frequency at the TP53 locus, median copy number state at chromosome 17p and 3q, cell type assignments and esophagus region (biopsy distance from mouth; region B: 25 cm; region C: 31 cm) are displayed for each cell. The TP53 locus and CNLOH breakpoints are indicated. Chromosome 17 bands are displayed with the centromere indicated in red. B, Combined Annotation Dependent Depletion (CADD) PHRED scores for the indicated TP53 mutations identified in donor eso02. CADD score ≥ 20 indicates that the SNV is predicted to be in the top 1% of most deleterious out of all possible reference genome SNVs. Colors correspond to the variants on the phylogeny in (A). C, Estimated age of mutation event for the indicated eso02 TP53-mutant clones at the time of sampling. The latency from TP53 point mutation to copy number event is indicated in red for the TP53.pP151A mutant clone, which acquired a later chromosome 17p CNLOH event, and in pink for the TP53.pM246I mutant clone, which acquired a later chromosome 3q amplification. Mutation age (in years) is estimated from the age of the ancestral node from which all descendant cells carry the mutation. Error bars represent the 95% confidence intervals. MRCA, most recent common ancestor. D, Phylogenetic correlation Z score of TP53-mutant cells with S phase score and G2M score (cell cycle score). *** = P value < 0.001; * = P value < 0.05. One-sided standard normal distribution. E, Copy-number variation heatmap of chromosome 3 (top) and chromosome 9 (bottom) for TP53-mutant cells from donor eso02. Six CNV events, one identified in a TP53.pM246I mutant subclone and five identified in the TP53.pQ136X mutant clone, are labeled A-F. Locations of the PIK3CA, TP63, CKS2 and NOTCH1 genes are indicated. Centromeres are indicated in red on each chromosome.
Figure 7: Large, independent copy-neutral loss-of-heterozygosity events affecting chromosome 9q occur frequently in the aging esophagus.
A, CNLOH map of chromosome 9 for all immune (CD45+) and epithelial (EPCAM+) cells from donor eso02. Immune and epithelial cells are phylogenetically sorted within each group. The phylogeny was built after excluding SNVs detected in chromosome 9 to minimize the impact of residual germline variation (Methods). NOTCH1 is located towards the telomeric end of the q arm of chromosome 9 (centromere in red on the chromosome, NOTCH1 and PTCH1 locations are indicated in green to the right of the chromosome). Single-cell haplotypes are assigned as AB (both parental alleles) or as having CNLOH, with AA or BB haplotypes. Median copy number across ~50-kb non-overlapping windows is displayed in the bottom bar. Cells with LOH show median CNV for the affected region while wild-type cells show median for the whole chromosome arm. Median CNV values for cells with a coverage MAD higher than 0.2 (Supplementary Fig. 8D) are masked (grey). The asterisk indicates the TP53.pQ136X clone that had a deletion (non copy-neutral) event at the NOTCH1 locus (as shown in Fig. 6E). Plots for the three other donors are in Supplementary Fig. 9A, 9B, 9C. B, Single-cell phylogeny of n = 608 cells from donor (eso02), with age estimated from branch lengths labeled by decade and indicated with dashed grey lines. Each leaf of the tree represents a single cell. Allelic status at the NOTCH1 locus on chromosome 9q is annotated on the phylogeny as wild-type (AB) or having CNLOH, showing the retained parental haplotype (assigned as A or B). Numbers on the phylogeny indicate bootstrap values. Phylogenies for the other three donors are in Supplementary Fig. 9D, 9E, 9F. C, Schematic of somatic CNLOH at chromosome 9q, illustrating how after DNA damage, the homologous undamaged chromosome is used as a template for repair. A and B refer to parental haplotype at the terminal end of chromosome 9q covering the NOTCH1 locus. D, Percentage of epithelial cells carrying a chr9q CNLOH event across four donors. 
Error bars indicate standard deviation. Donors are classified according to exposure risk: high, medium or low. E, Estimated timings of CNLOH events, ordered from most recent to oldest in donor’s life, with impacted parental allele, A or B, indicated. CNLOH age is estimated as the age of the MRCA of the cells carrying the CNLOH. F, Sankey plot showing the number of cells according to nonsynonymous (ns) SNVs filtered for CADD score (≥ 20) in driver genes and chr9q CNLOH status. Wild-type cells have no detected nsSNVs, single mutant nsSNV cells have mutations in one driver gene, and multiple nsSNV cells have mutations in more than one driver gene. AB, no chr9q CNLOH; AA or BB, chr9q CNLOH. G, Mean fraction of cells without chr9q CNLOH (AB) or with chr9q CNLOH (AA or BB) across three epithelial cell types (basal, proliferative basal and suprabasal). Error bars represent 95% Clopper-Pearson confidence intervals, and P values were calculated using a χ2 test. H, Differentiation scores of single cells with chr9q LOH + NOTCH1 mutation (n = 334 cells), double NOTCH1 point mutation (n = 39 cells) or wild-type cells (n = 1,258) obtained from aged esophagus samples and sequenced with targeted single-cell DNA-seq and matched RNA panel. Error bars represent standard error of the mean (SEM), two-sided Student’s t-test.
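Panel E dates each CNLOH event by the age of the most recent common ancestor (MRCA) of the cells that carry it. A toy sketch of that lookup on a time-scaled tree follows, assuming the tree is stored as a child-to-parent map with an age in years for each node. The tree, node names and ages below are invented for illustration and are not taken from the study.

```python
def path_to_root(parent, node):
    """List of nodes from `node` up to the root, inclusive."""
    path = [node]
    while parent.get(node) is not None:
        node = parent[node]
        path.append(node)
    return path


def mrca(parent, leaves):
    """Most recent common ancestor of `leaves` in a rooted tree."""
    leaves = list(leaves)
    if not leaves:
        raise ValueError("need at least one leaf")
    common = set(path_to_root(parent, leaves[0]))
    for leaf in leaves[1:]:
        common &= set(path_to_root(parent, leaf))
    # Walking upward from a leaf, the first shared node is the MRCA.
    for node in path_to_root(parent, leaves[0]):
        if node in common:
            return node


# Hypothetical time-scaled tree: ages are donor age (years) at each node.
parent = {"n1": "root", "n2": "root", "n3": "n2",
          "c1": "n1", "c2": "n1", "c3": "n3", "c4": "n3", "c5": "n2"}
node_age = {"root": 0, "n1": 20, "n2": 35, "n3": 50,
            "c1": 79, "c2": 79, "c3": 79, "c4": 79, "c5": 79}

cnloh_cells = ["c3", "c4"]  # cells assumed to share one CNLOH event
print(node_age[mrca(parent, cnloh_cells)])
```

Because the event must have occurred on the branch leading to the MRCA, the MRCA age serves as a point estimate that marks the latest time by which the event had occurred.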
