Sci Rep. 2025 Jun 11;15(1):18397.
doi: 10.1038/s41598-025-01690-z.

Monitoring the rate and variability of somatic genomic alterations using long-read sequencing

Xingyao Chen et al. Sci Rep. 2025.

Abstract

Cancer initiation occurs when a cell acquires and accumulates mutations in genes involved in the regulation of cell processes. Each cell division throughout a person's life introduces novel mutations in the cell's DNA and, under normal circumstances, the body is primed to prevent those from leading to cancer. Occasionally, a subset of those mutations escapes these safeguards and might eventually result in the emergence of the disease. To understand the dynamics of somatic mutation accumulation, we performed longitudinal whole genome sequencing of DNA obtained from whole blood from healthy individuals and cancer patients using Oxford Nanopore Technologies' long-read sequencing. Here we show that the number of somatic single nucleotide variants detected increases with participant age and that, for specific mutational processes, changes can be detected within months. We computed aggregated metrics for unique participants at each timepoint across types of variants (based on single base substitution molecular signatures) and identified patterns of change both over an individual's lifespan (age) and over the sampling period (months). This study showcases the suitability of long-read sequencing of blood DNA for detecting coarse-grained differences over time and enables future development of "state of the system" personalized prevention programs.

Conflict of interest statement

Declarations. Competing interests: All authors except Hag.L., D.A. and Han.L. are employees of the Ellison Institute, LLC. D.B.A. is a scientific advisor with equity interests in Oxford Nanopore Technologies. D.B.A. and N.M. are inventors on a provisional patent (US Provisional Patent Application no.: 63/581,553) related to this work. Oxford Nanopore Technologies contributed equipment, materials, reagents and technical support. The remaining authors declare no competing interests.

Figures

Fig. 1
Exogenous DNA from a well-characterized breast cancer cell line (BT-474) spiked into a known genomic background at increasing concentrations can be recovered by our approach (A), preserving the mutational signatures present in the original cell line (B, horizontal lines representing the signature proportion in). The cell-line-specific signatures also correlate linearly with the amount of exogenous cell line DNA (C, D). Ubiquitous mutational signatures that are not specific to the breast cancer cell line do not show a linear correlation with the amount of exogenous DNA (E, F).
Fig. 2
Density plots of SNV allele frequencies before filtration (top) and after aggressive filtration (bottom) to enrich for somatic variants. The large majority of all SNVs (99.95%) are removed by the filtering step. The unfiltered SNVs show a typical distribution of frequencies associated with germline variants (broad peak around 50% frequency associated with heterozygous variants and a narrow peak around 100% comprising homozygous variants and sequencing errors), whereas the filtered SNVs are enriched for low-intermediate frequencies (10–40%).
Fig. 3
Heatmap of TensorSignature associated with sequencing errors (HG002 false positives, in red) versus study subjects after filtration (in salmon) shows distinct signatures between the two groups, indicating that the somatic SNVs observed in the subjects are not caused by sequencing errors and artifacts. The color scale corresponds to the proportion of that signature present across all extracted signatures in that particular sample.
Fig. 4
Median sSNV allele frequency across different functional parts of the genome and mutational effects shows the highest frequency for the 5′ untranslated region (UTR), followed by silent sSNVs and by sSNVs found in intergenic regions, which is compatible with the overall lack of functional constraints. Conversely, sSNVs found in the mRNA coding region of the genome (defined here as the aggregate of 5′ and 3′ UTRs and protein coding sequences) show the lowest median allele frequencies, compatible with the strongest functional constraints. UTRs, silent mutations and missense mutations show the broadest spread of frequencies due to the smaller overall number of sSNVs falling in those classes. This is especially pronounced for 5′ UTRs due to the short average length of those regions in humans. Each point represents the class median for the sample. Note that UTRs, silent and missense variants are also included in the calculations for the mRNA Encoding class. Center line: median; box limits: 1st and 3rd quartiles; whiskers: 1.5× interquartile range; outliers: filled black points.
Fig. 5
Distribution of COSMIC SBS signatures identified in our subject cohort. Each point represents one sample. The red line indicates the average across all samples. A few signatures are nearly ubiquitous (SBS1, SBS5 and SBS90), whereas most signatures are found only in subsets of samples.
Fig. 6
Relationship between the number of observed somatic SNVs and the subject age at first draw. A significant linear correlation can be determined, corresponding to approximately 27 sSNVs accumulated per decade of life.
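The per-decade figure in this caption is the slope of a straight-line fit, rescaled from years to decades. The following is a minimal sketch of that arithmetic, not the authors' pipeline: it assumes an ordinary least-squares fit, and the ages and counts are hypothetical values constructed to lie exactly on a line so that the conversion is easy to follow.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two or more paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical illustration data, NOT taken from the study:
# age at first draw (years) and number of somatic SNVs detected.
ages = [30, 40, 50, 60, 70]
ssnv_counts = [131, 158, 185, 212, 239]

slope_per_year, intercept = fit_line(ages, ssnv_counts)
slope_per_decade = slope_per_year * 10  # rescale years to decades

print(f"sSNVs per year:   {slope_per_year:.2f}")
print(f"sSNVs per decade: {slope_per_decade:.1f}")
```

With real data the points scatter around the line, so the slope would normally be reported together with a significance test or confidence interval.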
Fig. 7
Detection of changes in specific mutational signatures over multiple draws. To identify whether any signature showed a tendency to increase over the span of our sampling (3 draws over 9–18 months), we fitted a linear model of the activity of each signature versus the number of months after first draw. Under a model of no accumulation, the distribution of the slopes of those models would be expected to be symmetrical and centered around zero, which was observed for the majority of signatures (A). However, for SBS1 and SBS5 a significant bias towards positive slopes was observed, indicating an accumulation of sSNVs associated with those mutational processes over the period of the study (B, C; individual subject slopes in grey, average across all subjects in blue. SBS1: mean slope = 0.73, p value = 0.011, t = 2.62, 1-sample Student's t-test; SBS5: mean slope = 10.11, p value = 1.19E−4, t = 4.07, 1-sample Student's t-test).
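The two-step test described in this caption (one slope per subject, then a one-sample t-test of those slopes against zero) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the subject identifiers, draw times and activity values are hypothetical, and only the t statistic is computed, since the p value requires the t distribution with n − 1 degrees of freedom (available, for example, through `scipy.stats.ttest_1samp`).

```python
import math


def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def one_sample_t(values, mu0=0.0):
    """Return (mean, t statistic, degrees of freedom) for H0: mean == mu0."""
    n = len(values)
    if n < 2:
        raise ValueError("need two or more values")
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)  # sample variance
    se = math.sqrt(var / n)  # standard error of the mean
    if se == 0:
        raise ValueError("values have zero variance")
    return mean, (mean - mu0) / se, n - 1


# Hypothetical data, NOT from the study: for each subject,
# months since first draw and activity of one signature at each draw.
subjects = {
    "subject_a": ([0, 6, 12], [100, 104, 109]),
    "subject_b": ([0, 5, 9], [80, 79, 84]),
    "subject_c": ([0, 9, 18], [120, 131, 137]),
    "subject_d": ([0, 6, 13], [95, 94, 96]),
}

# Step 1: one slope (activity change per month) per subject.
slopes = [ols_slope(months, activity) for months, activity in subjects.values()]

# Step 2: test whether the slopes are centered on zero (no accumulation).
mean_slope, t_stat, df = one_sample_t(slopes)
print(f"mean slope = {mean_slope:.3f}, t = {t_stat:.2f}, df = {df}")
```

Testing the per-subject slopes, rather than pooling all draws into one regression, keeps each subject as a single independent observation.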
