Nat Commun. 2025 Jun 5;16(1):5236.
doi: 10.1038/s41467-025-60391-3.

Direct detection of 8-oxo-dG using nanopore sequencing

Marc Pagès-Gallego et al. Nat Commun. 2025.

Abstract

Genomic DNA is under constant oxidative damage, with 8-oxo-7,8-dihydro-2'-deoxyguanosine (8-oxo-dG) being the prominent lesion linked to mutagenesis, epigenetics, and gene regulation. Existing methods to detect 8-oxo-dG rely on indirect approaches, while nanopore sequencing enables direct detection of base modifications. A model for 8-oxo-dG detection is currently missing due to the lack of training data. Here, we develop a strategy using synthetic oligos to generate long, 8-oxo-dG context-variable DNA molecules for deep learning and nanopore sequencing. Our training approach addresses the rarity of 8-oxo-dG relative to guanine, enabling specific detection. Applied to a tissue culture model of oxidative damage, our method reveals an uneven genomic 8-oxo-dG distribution, a context pattern dissimilar to that of C>A mutations, and local 5-mC depletion. This dual measurement of 5-mC and 8-oxo-dG at single-molecule resolution uncovers new insights into their interplay. Our approach also provides a general framework for detecting other rare DNA modifications using synthetic DNA and nanopore sequencing.

Conflict of interest statement

Competing interests: J.d.R. and A.M. are co-founders and directors of Cyclomics, a genomics company; they declare no competing interests. M.P-G., D.M.K.v.S., N.J.M.B., R.S., J.P.K., C.V., M.J.v.R., R.v.B., B.M.T.B., and T.B.D. declare no competing interests.

Figures

Fig. 1. 8-oxo-dG has a detectable effect on the nanopore raw signal.
a Schematic of the design of the 8-oxo-dG-containing oligos. b Error rate per oligo base across all sequenced 8-oxo-dG-containing repeats. Random bases are excluded from the analysis because their true reference is unknown. The horizontal bar represents the median, the lower and upper bounds of each box represent the 25th and 75th percentiles, respectively, and whiskers extend to 1.5 times the interquartile range. Data derived from ten thousand randomly sampled oligos. c Example of 8-oxo-dG (red) and G (purple) signal in the TG(8-oxo-dG/G)CTG context. The dashed black line indicates the expected signal value based on the G-containing sequence. d Average measured normalized G signal and 8-oxo-dG signal per measured 5-mer, as segmented using Tombo. The identity line is shown as a dashed gray line. Source data are provided as a Source Data file.
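The boxplot convention described in panel b (median, 25th and 75th percentiles, whiskers at 1.5 times the interquartile range) can be sketched as follows. This is only an illustration: the function name `boxplot_stats`, the choice of the "inclusive" quantile method, and the Tukey-style rule that whiskers stop at the most extreme observation inside the fences are assumptions, not details taken from the paper.

```python
from statistics import quantiles


def boxplot_stats(values):
    """Summarize values the way a standard boxplot draws them.

    Assumption: whiskers end at the most extreme observations that lie
    within 1.5 * IQR of the box, and points beyond are treated as outliers.
    """
    data = sorted(values)
    # Quartiles: 25th percentile, median, 75th percentile.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "whisker_low": min(v for v in data if v >= low_fence),
        "whisker_high": max(v for v in data if v <= high_fence),
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Hypothetical per-base error counts, for illustration only.
stats = boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
```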
Fig. 2. Fine-tuning of a Bonito model to basecall 8-oxo-dG.
a Schematic of a two-step approach to call modified bases in nanopore sequencing. First, the raw signal is basecalled to a DNA sequence. When the (potentially modified) base of interest is basecalled, a small window of raw signal corresponding to that base is cut and, together with a portion of the basecalled sequence, given as input to a second model that predicts whether the base is modified. b Confusion matrix of the fine-tuned Bonito basecaller. Values indicate the fraction of outcomes for each ground-truth base. c Match, mismatch, deletion and insertion rates of the pre-trained model and of models fine-tuned on different datasets, evaluated on the T2T human reference genome nanopore data. The horizontal bar represents the median, the lower and upper bounds of each box represent the 25th and 75th percentiles, respectively, and whiskers extend to 1.5 times the interquartile range. Data derived from the 32 thousand reads in the test set.
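The second step of panel a, cutting a small signal window around each base of interest and pairing it with nearby basecalled sequence, can be sketched as below. This is a hedged illustration and not the Remora or Bonito API: the names `cut_chunk` and `candidate_chunks`, the `base_starts` signal-to-base mapping, the window sizes, and the zero/"N" padding at read ends are all assumptions made for the example.

```python
def cut_chunk(signal, base_starts, sequence, base_idx, signal_window=6, context=2):
    """Cut a fixed-size signal window centered on one basecalled base.

    Assumption: base_starts[i] is the first signal sample assigned to base i,
    and base_starts has len(sequence) + 1 entries.
    """
    start, end = base_starts[base_idx], base_starts[base_idx + 1]
    center = (start + end) // 2
    lo = center - signal_window // 2
    # Pad with zeros where the window runs past either end of the read.
    chunk = [
        float(signal[i]) if 0 <= i < len(signal) else 0.0
        for i in range(lo, lo + signal_window)
    ]
    # Sequence context around the base, padded with "N" at the read ends.
    kmer = "".join(
        sequence[i] if 0 <= i < len(sequence) else "N"
        for i in range(base_idx - context, base_idx + context + 1)
    )
    return chunk, kmer


def candidate_chunks(signal, base_starts, sequence, target="G", **kwargs):
    """Collect (position, signal chunk, k-mer) for every basecalled target base."""
    return [
        (i, *cut_chunk(signal, base_starts, sequence, i, **kwargs))
        for i, base in enumerate(sequence)
        if base == target
    ]
```

Each returned tuple would be the input to a second, hypothetical classifier that scores the base as modified or unmodified.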
Fig. 3. Performance of a Remora model.
a Receiver operating characteristic (ROC) curves for three Remora models with different positive class weights (100%, 10% and 1%); values indicate the area under the curve (AUC). The straight-line shape of the ROC curve for the 1% weight model is due to a small number of negative samples with a high 8-oxo-dG score, which reduces granularity in the ROC curve thresholds (Supplementary Fig. 10). b True positive rate (TPR) and Q-Score specificity evaluated on the test fold for two different thresholds (0.5 and 0.95) on the three Remora models with different positive class weights (100%, 10% and 1%). c TPR and Q-Score specificity evaluated on the test fold for the experiment in which additional features were added sequentially. Metrics are calculated using a 0.95 threshold. Models include additional features in a cumulative manner, from left to right: basecalls, expected signal, difference between expected and measured signal, and basecall Phred quality scores. Red bars include features from the fine-tuned model, purple bars also include features from the Bonito pre-trained model, and blue bars include the difference between the features of the Bonito fine-tuned and pre-trained models. d Performance of the Remora model with expected signal as a feature, per 5-mer, at the >0.95 score threshold. Red dots indicate 5-mers for which there were no false positive calls; for these 5-mers the Q-Score was annotated as if a single false positive call had been made. Source data are provided as a Source Data file.
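The two metrics in panels b to d can be sketched as follows. The legend does not define "Q-Score specificity"; this sketch assumes a Phred-style transform of the false positive rate, Q = -10 log10(FPR), which is an assumption rather than the authors' stated formula. The function name `tpr_and_qscore` is likewise hypothetical. The handling of zero false positives follows the legend of panel d, which counts a single false positive in that case.

```python
import math


def tpr_and_qscore(scores, labels, threshold=0.95):
    """Return (TPR, Q-score specificity) for calls with score > threshold.

    Assumption: Q = -10 * log10(false positive rate), a Phred-style scale.
    labels are 1 for 8-oxo-dG and 0 for unmodified G.
    """
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s <= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s > threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= threshold)
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one positive and one negative sample")
    tpr = tp / (tp + fn)
    # With zero false positives, score as if one had been made so the
    # Q-score stays finite (as described for the red dots in panel d).
    fpr = max(fp, 1) / (fp + tn)
    return tpr, -10 * math.log10(fpr)
```

Under this assumed definition, one false positive per 1000 unmodified guanines corresponds to Q30, which is why a rare modification needs a high Q-score to be called specifically.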
Fig. 4. 8-oxo-dG distribution across the genome.
a Overall 8-oxo-dG molecules per 1 million G molecules in L-Alanine and D-Alanine treated cells. Error bars indicate minimum and maximum calculated values. Dots indicate the values per sequenced condition (n = 3). b 8-oxo-dG levels across different GC content (%) bins. Blue and red lines indicate values for L-Alanine and D-Alanine treated cells, respectively. The gray dashed line indicates the distribution of measured GC content bins. c 8-oxo-dG levels per chromosome, chromosome arm and DNA strand. Bars indicate average values. Dots indicate the values for all conditions (n = 6). Asterisks indicate a significant p value (<0.05) derived from a two-sided t-test between the values of the forward and reverse strands. Exact p values can be found in the source data. d Distribution of 8-oxo-dG levels per genomic region type across all conditions and chromosomes (n = 23). The dashed gray horizontal line indicates the overall 8-oxo-dG level across all conditions, irrespective of genomic region. The horizontal bar represents the median, the lower and upper bounds of each box represent the 25th and 75th percentiles, respectively, and whiskers extend to 1.5 times the interquartile range. Black dots indicate the underlying data. Source data are provided as a Source Data file.
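The quantity plotted in panel a, 8-oxo-dG per 1 million G with minimum and maximum error bars across replicates, can be sketched as below. The function names and the replicate counts are hypothetical, and whether the denominator includes the 8-oxo-dG calls themselves is not stated in the legend; this sketch simply divides by the number of guanines evaluated.

```python
def per_million_g(n_oxo_calls, n_g_total):
    """8-oxo-dG calls per one million guanines evaluated."""
    return 1e6 * n_oxo_calls / n_g_total


def summarize_replicates(replicates):
    """Mean, minimum and maximum rate over (n_oxo_calls, n_g_total) pairs."""
    rates = [per_million_g(calls, total) for calls, total in replicates]
    return {"mean": sum(rates) / len(rates), "min": min(rates), "max": max(rates)}


# Hypothetical counts for three replicates of one treatment; not data from the paper.
summary = summarize_replicates([(5, 1_000_000), (15, 1_000_000), (10, 1_000_000)])
```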
Fig. 5. Mutational and 8-oxo-dG signatures.
a Combined mutational signature (C>A or G>T) from all the cell lines, derived from Illumina sequencing. b 8-oxo-dG normalized abundance profile for each 3-mer. Note that the 3-mers are annotated as the reverse complement of the strand containing 8-oxo-dG (e.g. ACT would be equivalent to AXT, where X denotes 8-oxo-dG). c Mutation enrichment of each 3-mer normalized to the abundance of each 3-mer in the human genome. d 8-oxo-dG enrichment of each 3-mer normalized to the abundance of each 3-mer in the human genome. Source data are provided as a Source Data file.
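The 3-mer annotation and the normalization to genome abundance in panels b to d can be sketched as follows. This is an illustrative sketch: the function names are hypothetical, the counts are invented, and expressing enrichment as a plain ratio of fractions (rather than, say, a log ratio) is an assumption not confirmed by the legend.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def pyrimidine_centered(kmer):
    """Report a G-centered 3-mer as its C-centered reverse complement.

    This mirrors the legend's example: 8-oxo-dG in AXT is annotated as ACT.
    """
    return reverse_complement(kmer) if kmer[1] == "G" else kmer


def enrichment(observed_counts, genome_counts):
    """Fraction of observed events per 3-mer divided by its genome fraction."""
    obs_total = sum(observed_counts.values())
    bg_total = sum(genome_counts.values())
    return {
        kmer: (count / obs_total) / (genome_counts[kmer] / bg_total)
        for kmer, count in observed_counts.items()
        if genome_counts.get(kmer, 0) > 0
    }


# Hypothetical counts, for illustration only.
ratios = enrichment({"ACT": 30, "TCA": 10}, {"ACT": 100, "TCA": 100})
```

A ratio above 1 would mean the 3-mer carries more events than its genome abundance alone predicts, and a ratio below 1 fewer.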
Fig. 6. Genomic de-methylation surrounds 8-oxo-dG.
a Methylation levels for 8-oxo-dG (red) and guanine (black) containing reads. Reads are centered (position zero) on the 8-oxo-dG or guanine. Methylation levels are obtained from the same molecule. The transparent red line indicates the underlying data; the dark gray line is the result of an 11-base average convolution. Data from all experimental conditions are included; see Supplementary Fig. 35 for the per-condition analysis. b Correlation between 8-oxo-dG enrichment, as in Fig. 5d, and the methylation difference at position zero. c Similar to a, but data have been split based on the 3-mer (reverse complement) surrounding the guanine. Source data are provided as a Source Data file.
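The 11-base average convolution mentioned in panel a amounts to a centered moving average over per-position methylation levels. A minimal sketch follows; the function name `moving_average` and the edge handling (the window shrinks near the ends instead of padding) are assumptions, since the legend does not say how edges were treated.

```python
def moving_average(values, window=11):
    """Centered moving average with an odd window.

    Assumption: near the ends, only the positions that exist are averaged.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        segment = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(segment) / len(segment))
    return smoothed
```

With an 11-base window, each smoothed value at an interior position averages that position with the five bases on either side.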
