Genome Res. 2018 Dec;28(12):1901-1918.
doi: 10.1101/gr.238543.118. Epub 2018 Nov 20.

Quantification of somatic mutation flow across individual cell division events by lineage sequencing

Yehuda Brody et al. Genome Res. 2018 Dec.

Abstract

Mutation data reveal the dynamic equilibrium between DNA damage and repair processes in cells and are indispensable to the understanding of age-related diseases, tumor evolution, and the acquisition of drug resistance. However, available genome-wide methods have a limited ability to resolve rare somatic variants and the relationships between these variants. Here, we present lineage sequencing, a new genome sequencing approach that enables somatic event reconstruction by providing quality somatic mutation call sets with resolution as high as the single-cell level in subject lineages. Lineage sequencing entails sampling single cells from a population and sequencing subclonal sample sets derived from these cells such that knowledge of relationships among the cells can be used to jointly call variants across the sample set. This approach integrates data from multiple sequence libraries to support each variant and precisely assigns mutations to lineage segments. We applied lineage sequencing to a human colon cancer cell line with a DNA polymerase epsilon (POLE) proofreading deficiency (HT115) and a human retinal epithelial cell line immortalized by constitutive telomerase expression (RPE1). Cells were cultured under continuous observation to link observed single-cell phenotypes with single-cell mutation data. The high sensitivity, specificity, and resolution of the data provide a unique opportunity for quantitative analysis of variation in mutation rate, spectrum, and correlations among variants. Our data show that mutations arrive with nonuniform probability across sublineages and that DNA lesion dynamics may cause strong correlations between certain mutations.

Figures

Figure 1.
Lineage sequencing concept and implementation. Overview of the lineage sequencing concept. Numbering indicates key conceptual and implementation steps. Single cells are sampled from a clonal population and sequenced (steps 1–6; in this study, subclonal culture was used to produce enough genomic DNA for PCR-free shotgun sequence library construction). Crucially, a prior estimate of the population lineage structure (step 7; either from single-cell tracking by time-lapse imaging or from raw SNV calls) was used to identify novel somatic variants in a joint analysis of the sequence libraries (step 8). This use of the lineage information enables all the sequence libraries to provide statistical support for somatic “branch variants,” enhancing the sensitivity and specificity of somatic variant identification (for example, the schematically indicated “red variant” is supported by presence of an SNV in four sequence data sets from one sublineage and the absence of this SNV in four additional sequence data sets from the other sublineage). Where the coverage by sampled sublineages (via subclones in this study) is high, the mutations that appeared during the lineage experiment can be mapped with single-cell resolution onto the lineage (step 9; e.g., blue, red, tan segments in the dendrogram at bottom). We term mutations occurring in the last round of cell division events “leaf variants,” which by definition can be supported only by a single sequence data set (e.g., green segment in the dendrogram at bottom). Leaf variants can also be analyzed but do not benefit from the enhanced statistical power that supports branch variants and thus cannot be reliably assigned to specific cellular events.
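The distinction between branch and leaf variants in the Figure 1 caption can be illustrated with a minimal sketch. It assumes a lineage written as nested tuples of subclone indices and a variant described only by the set of subclones that carry it. The encoding, the function names `clades` and `classify`, and the category labels are hypothetical and are not taken from the paper's pipeline.

```python
from typing import FrozenSet, Iterable, List, Tuple, Union

# Hypothetical lineage encoding: a leaf is a subclone index (int),
# an internal node is a tuple of child subtrees.
Tree = Union[int, tuple]


def clades(tree: Tree) -> Tuple[FrozenSet[int], List[FrozenSet[int]]]:
    """Return the leaf set of `tree` and the leaf sets of all its subtrees."""
    if isinstance(tree, int):
        leaf = frozenset([tree])
        return leaf, [leaf]
    leaves: FrozenSet[int] = frozenset()
    found: List[FrozenSet[int]] = []
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves = leaves | child_leaves
        found.extend(child_clades)
    found.append(leaves)
    return leaves, found


def classify(carriers: Iterable[int], tree: Tree) -> str:
    """Label a variant by how its carrier set fits the lineage."""
    all_leaves, all_clades = clades(tree)
    carried = frozenset(carriers)
    if not carried:
        return "absent"
    if carried == all_leaves:
        return "shared by all"   # predates the lineage experiment
    if len(carried) == 1:
        return "leaf"            # supported by a single data set
    if carried in all_clades:
        return "branch"          # carriers form exactly one sublineage
    return "inconsistent"        # carriers do not match any sublineage


# Eight subclones, as in the schematic "red variant" example.
lineage = (((1, 2), (3, 4)), ((5, 6), (7, 8)))
print(classify({1, 2, 3, 4}, lineage))  # branch
print(classify({1, 3}, lineage))        # inconsistent
```

A variant carried by subclones 1 to 4 and absent from 5 to 8 matches one sublineage exactly, so all eight data sets contribute evidence for it. A variant seen only in subclone 1 has no such cross-support.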
Figure 2.
HT115 and RPE1 lineage sequencing experiments by the “optical tracking → lineage → called variants” approach. (A) Scheme of the analysis pipeline for identifying branch variant SNVs by the “optical tracking → lineage → called variants” approach. “Branch variants” are SNVs that occur at the same locus in two or more (but not all) subclones and are consistent with prior lineage information. Variant counts at different stages of the informatics filtering steps used to identify high-quality lineage structure concordant branch variants are shown for the HT115 and RPE1 lineage sequencing experiments. Detail on an example HT115 GG → GT branch variant is shown. Allele counts from sequence reads at Chromosome 1 diploid locus 111370246 are shown. Four subclone sequence libraries (subclone indices 49, 34, 63, and 44, marked in red) show about half the reads indicating a variant T allele, where all the other subclones support only the reference G alleles at this locus. This G → T SNV is scored as one of 404 branch variants that appeared within the two cell cycles represented by the pink segment on the right-hand side of the dendrogram representing the HT115 lineage experiment in B. (B,C) Dendrograms representing the HT115 and RPE1 lineage experiments; red circles mark time points where cells died during the lineage development and were not available for recovery from the device. The green triangles in the bottom of the dendrogram represent cells that were recovered, subcloned, and sequenced. Dendrograms are annotated with the count of “branch variants” for resolved lineage segments (some segments are resolved to individual cell cycles). Every sequenced subclone is annotated with its index number and the count of “leaf variants” for each sequenced subclone (at bottom). “Leaf variants” are SNVs that are supported by only one subclone and likely represent variants that arose during or after the last generation of the lineage experiment. 
The x-axis of the dendrogram only relates to linkage of the subclones. The y-axis of the dendrogram represents the culture time course, with each cell division event observed by time-lapse imaging marked by a branch point in the dendrogram. Single cells were recovered for subculture from the HT115 lineage after 141 h, while cells were recovered from the RPE1 lineage after 168 h. (D) HT115 branch variants are clonal. Histogram of allele fraction for detected variants. Comparison between branch variants (mutations occurring during lineage formation up to the last cell division) and leaf variants (mutations occurring within or subsequent to the last cell division event in the lineage). Branch variant SNVs show a bimodal allele fraction distribution peaked at 0.5 and 1.0 as expected for the measured ploidy (copy number variation [CNV] analysis) at variant loci in this mostly diploid cell line. In contrast, subclonal mutations appear in the leaf variant group and show an allele fraction distribution peaked well below 0.5 as the variant caller attempts to balance sensitivity for low allele fraction variants with false-positive detections without the enhanced performance available for branch variants. (E) Left panel: scatter plot of variants; average read depth versus allele fraction; branch variants (blue) and leaf variants (green). The branch variant read depth is tightly correlated with the variant allele fraction in accordance with clonal mutations. The leaf variants include many subclonal variants that blend with technical noise at low variant allele fractions. Right panel: normalized histogram of read coverage depth for HT115 lineage; whole-genome (red), called branch and leaf variants (blue and green).
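The allele-fraction reasoning in the Figure 2 caption can be sketched as follows. The read counts are made up for illustration and are not the published data. The thresholds `min_fraction` and `min_depth` are assumptions, as are the two non-carrier subclone indices (12 and 7). Only indices 49, 34, 63, and 44 come from the caption.

```python
from typing import Dict, List, Tuple


def allele_fraction(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads at a locus that support the variant allele."""
    depth = ref_reads + alt_reads
    if depth == 0:
        raise ValueError("no reads cover this locus")
    return alt_reads / depth


def carriers(
    counts: Dict[int, Tuple[int, int]],
    min_fraction: float = 0.25,  # assumed threshold, not from the paper
    min_depth: int = 10,         # assumed threshold, not from the paper
) -> List[int]:
    """Return the subclones whose reads support the variant allele."""
    called = []
    for subclone, (ref_reads, alt_reads) in counts.items():
        if ref_reads + alt_reads < min_depth:
            continue  # too few reads to judge this subclone
        if allele_fraction(ref_reads, alt_reads) >= min_fraction:
            called.append(subclone)
    return sorted(called)


# Illustrative (ref, alt) read counts per subclone at one diploid locus.
example = {
    49: (16, 15), 34: (14, 17), 63: (18, 13), 44: (15, 15),  # about half alt
    12: (30, 0), 7: (28, 0),                                  # reference only
}
print(carriers(example))  # [34, 44, 49, 63]
```

A heterozygous clonal variant at a diploid locus is expected near an allele fraction of 0.5, which is why the four carrier subclones stand out clearly from the reference-only ones in this toy input.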
Figure 3.
Analysis of mutation patterns in human colon carcinoma epithelial cell line HT115. HT115 shows POLE proofreading deficiency that matches previously published bulk POLE mutant colon tumor sample data. (A) Heat map showing the cosine similarity scores of comparisons of whole-genome HT115 variant SNV mutation spectra with whole-genome spectra from RPE1 samples and published data sets from POLE mutant tumor samples (Cancer Genome Atlas (TCGA), dbGAP: phs000178.v1.p1; sample annotations in Supplemental Table S1). The blue rectangle denotes the most similar tumor sample (COAD-CA-6717). (B) Comparison of detailed mutation spectra of all base substitutions observed in HT115 and RPE1 cell line branch variants and in the COAD-CA-6717 TCGA sample. HT115 and COAD-CA-6717 show highly similar spectra that differ from the RPE1 spectrum. (C) Distribution of DNA replication timing for all genomic positions and the somatic SNV branch variants and leaf variants (blue and green, respectively) from the HT115 cell line. Both the branch and leaf variant sets show the expected enrichment in late-replicating regions and depletion in early-replicating regions versus the background distribution of replication timing at all genomic loci (red). (D) Quantification of the enrichment and depletion of SNVs in the indicated categories. SNVs are enriched in the late-replicating regions while SNVs are depleted in RefSeq genic regions and further depleted in RefSeq exons (P < 0.01) in both the branch and leaf variant SNV sets.
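The cosine similarity score used in panel A compares two mutation spectra as vectors of counts per substitution class. A minimal sketch follows, assuming each spectrum is a mapping from a class label to a count. The labels and counts below are placeholders, and the caption does not specify the binning used in the paper.

```python
import math
from typing import Dict


def cosine_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine of the angle between two spectra; 1.0 means identical shape."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0.0) * b.get(k, 0.0) for k in keys)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a spectrum with no mutations has no direction")
    return dot / (norm_a * norm_b)


# Placeholder spectra: same proportions at different total counts.
sample_1 = {"C>A": 3, "C>T": 4}
sample_2 = {"C>A": 6, "C>T": 8}
print(cosine_similarity(sample_1, sample_2))
```

Because the score depends only on the direction of the vectors, two samples with very different mutation totals can still score close to 1 when their spectra have the same shape.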
Figure 4.
Accuracy and sensitivity of lineage sequencing without microscopic tracking. (A) Scheme of the analysis pipeline for identifying branch variants by the “raw variants → lineage → called variants” approach. Variant counts at different stages of the informatics filtering steps used to identify high-quality lineage structure concordant branch variants are shown for the HT115 and RPE1 lineage sequencing experiments. The pipeline is similar to the “optical tracking → lineage → called variants” pipeline (Fig. 2A) except that lineage information is incorporated later, separately from SNV coincidence, and the source of the prior lineage estimate is analysis of raw SNVs (see B and C) rather than time-lapse imaging. (B,C) Histogram of the number of high-quality coincident SNVs for each set of subclones in which such variants occurred for the HT115 (B) and the RPE1 (C) data sets. At bottom, each cluster is marked as consistent (+) or inconsistent (−) with the lineage structure indicated by the time-lapse imaging. For each cell line, the group of subclone sets with high frequencies of these SNVs is both internally self-consistent and consistent with the independent time-lapse imaging data. (D,E) Comparison between dendrograms representing lineages based on genomic distance among subclone pairs and the time-lapse imaging data (only subclones that were cultured and sequenced are represented); HT115 (D) and RPE1 (E). The dendrograms based on genomic distance and time-lapse imaging indicate the same connectivity between subclones, the information relevant to joint variant calling in lineage sequencing, but have different branch lengths and are missing several internal cell divisions. The blue dots in the time-lapse imaging dendrogram represent cell division events that are not independently available from the sequence data. The dendrograms based on time-lapse imaging have a y-axis with units of minutes.
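The histogram in panels B and C, and the notion of subclone sets being internally self-consistent, can be sketched as below. The sketch assumes that each raw SNV is summarized by the set of subclones carrying it, and that a collection of sets is self-consistent when every pair is either nested or disjoint, as the clades of a single tree must be. The function names, the variant labels, and the toy data are hypothetical, and the paper's actual criteria may differ.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set


def coincidence_histogram(variant_carriers: Dict[str, Set[int]]) -> Counter:
    """Count SNVs per exact set of carrier subclones (two or more carriers)."""
    return Counter(
        frozenset(c) for c in variant_carriers.values() if len(c) >= 2
    )


def is_tree_compatible(subclone_sets: Iterable[FrozenSet[int]]) -> bool:
    """True if every pair of sets is nested or disjoint."""
    return all(
        a <= b or b <= a or not (a & b)
        for a, b in combinations(list(subclone_sets), 2)
    )


# Toy input: variant label -> subclones carrying it (not real data).
raw = {
    "v1": {1, 2}, "v2": {1, 2}, "v3": {1, 2, 3, 4},
    "v4": {5}, "v5": {2, 3},
}
hist = coincidence_histogram(raw)
frequent = [s for s, n in hist.items() if n >= 2]  # assumed frequency cutoff
print(hist[frozenset({1, 2})], is_tree_compatible(frequent))
```

In this toy input the set {2, 3} overlaps {1, 2} without nesting, so including it would make the collection incompatible with any single tree. Keeping only the frequently observed sets restores compatibility.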
Figure 5.
Intra-lineage heterogeneity in mutation rate and multiple mutation events. (A) Measured P-values from observed mutation counts are plotted vs. calculated theoretical (Poisson) P-values for the branch variant set for each sublineage to form quantile-quantile (QQ) plots (P-values for theoretical Poisson-distributed lineage-wise mutation count data versus observed data) for both HT115 (left) and RPE1 (right). The plotted points deviate strongly from the expected distribution (which would follow an x = y relationship) at both ends of the distribution, showing that the sublineages present in each data set cannot be plausibly modeled by a Poisson distribution based on independent mutations. (B) Schematic showing persistent lesion hypothesis for correlated same-site mutation. DNA lesions (marked as G*) that are not repaired during S-phase compel the DNA polymerase to replicate opposite lesion bases with a high probability of mutation. If the lesion has not been repaired before the next S-phase in the daughter cell carrying the lesion, an additional mutation at the identical genomic locus is likely to result. If the second mutation is different from the first, this process can be readily detected from lineage sequencing data. The example scheme represents the CC > CT and CC > CA mutations we detected at the Chromosome 2 locus 128889581 (marked with purple circle). (C) Seven multiple mutation events were found in the HT115 lineage. Read counts for each example are presented and marked with a colored symbol. The lineage segment where each example occurred is shown (with corresponding colors). None of these events overlap the most probable mutation types found in the POLE signature.
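The QQ comparison in panel A can be approximated with a short sketch. It assumes that every sublineage shares one Poisson mean, estimated here as the average observed count, and that the P-value for a sublineage is the upper-tail probability of its count. The caption does not say how the published P-values or means were computed, so both choices, along with the function names and the example counts, are assumptions.

```python
import math
from typing import List, Tuple


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson variable with mean lam > 0."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def poisson_upper_tail(k: int, lam: float) -> float:
    """P(X >= k), clamped at 0 to absorb rounding error."""
    return max(0.0, 1.0 - sum(poisson_pmf(i, lam) for i in range(k)))


def qq_points(counts: List[int]) -> List[Tuple[float, float]]:
    """Pair expected uniform quantiles with sorted observed P-values."""
    lam = sum(counts) / len(counts)  # assumed common mutation rate
    observed = sorted(poisson_upper_tail(k, lam) for k in counts)
    n = len(observed)
    expected = [(i + 0.5) / n for i in range(n)]
    return list(zip(expected, observed))


# Made-up per-sublineage mutation counts, for illustration only.
for expected, observed in qq_points([3, 5, 4, 40, 0, 6]):
    print(f"{expected:.3f}\t{observed:.3g}")
```

If the counts were independent Poisson draws with a common mean, the points would lie near the x = y line. A sublineage with far more or far fewer mutations than the mean pulls the ends of the curve away from that line.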
