Review
Nucleic Acids Res. 2023 Aug 11;51(14):7143-7162.
doi: 10.1093/nar/gkad519.

A critical spotlight on the paradigms of FFPE-DNA sequencing

Tim A Steiert et al. Nucleic Acids Res. 2023.

Abstract

In the late 19th century, formalin fixation with paraffin-embedding (FFPE) of tissues was developed as a fixation and conservation method and is still used to this day in routine clinical and pathological practice. The implementation of state-of-the-art nucleic acid sequencing technologies has sparked much interest in using historical FFPE samples stored in biobanks, as they hold promise for extracting new information from these valuable samples. However, formalin fixation chemically modifies DNA, which potentially leads to incorrect sequences or misinterpretations in downstream processing and data analysis. Many publications have concentrated on one type of DNA damage, but few have addressed the complete spectrum of FFPE-DNA damage. Here, we review mitigation strategies in (I) pre-analytical sample quality control, (II) DNA repair treatments, (III) analytical sample preparation and (IV) bioinformatic analysis of FFPE-DNA. We then provide recommendations that are tested and illustrated with DNA from 13-year-old liver specimens, one FFPE preserved and one fresh frozen, applying target-enriched sequencing. Thus, we show how DNA damage can be compensated, even when using low quantities (50 ng) of fragmented FFPE-DNA (DNA integrity number 2.0) that cannot be amplified well (Q129 bp/Q41 bp = 5%). Finally, we provide a checklist called 'ERROR-FFPE-DNA' that summarises recommendations for the minimal information in publications required for assessing fitness-for-purpose and inter-study comparison when using FFPE samples.

Figures

Graphical Abstract
Figure 1.
Summary of DNA modifications typically observed in FFPE samples. DNA instability is initiated by double strand denaturation and base unstacking, especially in AT-rich regions (far left). Modifications influencing base pairing then induce further local double strand denaturation and accelerate base modifications, leading to local hot spots of alterations. (A) Base modification caused by the nucleophilic attack of a base's amino group on the electrophilic carbon of formaldehyde. The resulting hydroxymethyl can condense to form an imine (altering base pairing) or further react to a dihydroxymethyl species. (B) Methylene bridges can form a covalent crosslink with another nucleophilic group of, e.g. a base or a protein, both leading to DNA polymerase blockage. (C) Base excision by hydrolysis of the N-glycosylic bond leaves a 2-deoxy-D-ribose AP site in the phosphate backbone. A transition state can form as an intermediate containing a highly reactive cyclic oxocarbenium ion that reacts with water. (D) Formaldehyde conservation also promotes the slow hydrolysis of phosphodiester bonds that breaks the phosphate backbone and fractures the DNA. (E) As glycosylase repair enzymes are inactivated by the fixation, spontaneous cytosine deamination converting cytosine to uracil is no longer corrected. In the case of 5-methylcytosine this conversion results in thymine. Either way, the base will now pair with adenine, replacing the original C/G base pair at that location.
Figure 2.
Characterisation of differences in NGS of FFPE-DNA and FF-DNA. FF-DNA was taken from the same tissue sample as FFPE-DNA. Experimental details are described in the online methods section. (A) Proportion of each artefact type in a set of five different FFPE samples of varying qualities and preparation workflows. (B) Fold increase in artefact number in FFPE-DNA compared to FF-DNA sequences. FFPE and FF read files were appropriately down-sampled before comparison. (C) Allelic frequency of artefacts in a typical FFPE sample of low quality. (D) Sequence duplicate ratios for low-quality FFPE-DNA and matching FF-DNA samples. (E) Insert sizes for the sample pairs used in (D). (F) Systematic coverage bias typical for targeted sequencing of FFPE samples. The plot shows the rolling mean coverage over the target region of a hybridisation capture bait panel. The reads were randomly down-sampled so that the mean unique coverage over the target bases was identical in all four libraries.
Figure 3.
FFPE-DNA fragment size (left) and DIN (right) correlate with NGS coverage uniformity (Fold 80 base penalty). DNA fragment size and DIN were determined on a gel electrophoresis system. Fold 80 base penalty was determined bioinformatically after sequence alignment. This correlation is based on 53 identically prepared whole exome sequencing libraries. Perfect coverage uniformity is defined by a Fold 80 base penalty value of 1.
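The Fold 80 base penalty in this caption can be illustrated with a minimal sketch. It assumes the commonly used definition (mean target coverage divided by the coverage reached by at least 80% of target bases); the function name and the toy coverage values are hypothetical and are not taken from the article.

```python
def fold_80_base_penalty(per_base_coverage):
    """Mean coverage divided by the coverage that at least 80% of bases reach.

    Assumed definition, for illustration only; a value of 1 means
    perfectly uniform coverage.
    """
    cov = sorted(per_base_coverage, reverse=True)
    n = len(cov)
    if n == 0:
        raise ValueError("no target bases given")
    k = -(-4 * n // 5)          # ceil(0.8 * n) using integer arithmetic
    threshold = cov[k - 1]      # coverage reached by at least 80% of bases
    if threshold == 0:
        raise ValueError("more than 20% of bases have zero coverage")
    return (sum(cov) / n) / threshold


uniform = [30] * 100                 # every base covered 30x
skewed = [10] * 50 + [50] * 50       # half the bases under-covered
print(fold_80_base_penalty(uniform))
print(fold_80_base_penalty(skewed))
```

Under this definition, the skewed example would need more sequencing than the uniform one to bring 80% of bases up to the mean coverage.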
Figure 4.
Principles of enzymatic FFPE-DNA repair treatments. The grey panel shows template DNA extracted from FFPE tissue containing oxidised, deaminated and mismatched bases. The original, unaltered sequence is represented as the top strand. (A) Altered base species can be excised by DNA glycosylases leaving an AP site or, in the case of bifunctional glycosylases, producing a 5′-phosphate and a 4-hydroxy-5-phospho-2-pentenal on the 3′-end. AP lyase activity of the respective enzymes excises the pentenal species, leaving a 5′-phosphate and a 3′-hydroxy end. (B) In the next repair step, these ends are processed by DNA polynucleotide kinase (PNK) that phosphorylates all 5′-ends and dephosphorylates any 3′-ends. (C) Next, DNA polymerase fills in complementary nucleotides into the double strand gaps. In this step different polymerases can be used that have a higher tolerance for altered base species or that generate blunt ends. (D) Finally, DNA ligase seals the double strand nicks. The blue frame indicates a BER-based approach, the orange frame simple glycosylase treatment, and the green frame simple polymerase treatment.
Figure 5.
Effect of FFPE-DNA repair on the on-target sequence coverage and artefacts. Untreated DNA (grey) is compared to DNA treated with BER-mixes, NEBrepair (green) and IQBErepair (magenta), and FF-DNA as a negative control, in two centres (C1, C2). (A) In the coverage curves, the y-axis shows the percentage of target region with coverage of at least x reads. For FFPE-DNA, the magenta and grey curves represent the most and least uniform coverage, respectively. The FF-DNA curves are concordant. The number of replicates is shown in the inset legend with D: duplicate, Q: quadruplicate. (B) Coverage uniformity metric F80BP for FFPE-DNA and FF GIAB DNA. F80BP of FFPE-DNA is improved by repair treatments, especially by IQBErepair. The number of libraries (N) is given in the lower region of the bar chart. (C) Artefact allele frequencies of FFPE-DNA and FF GIAB control DNA. Improved coverage (cf. panels A–C) and reduced artefact occurrence (cf. panel E) lower the median AAF, generally leading to significant differences for repaired FFPE-DNA, regardless of artefact type. (D) Sequence duplication ratios. The restoration of damaged genomic fragments lowers the duplicate ratios for repaired FFPE-DNA. (E) Normalised relative artefact frequency, i.e. the number of artefacts per sequenced base in the repaired DNA, normalised by the untreated DNA. The frequency of deamination C>T/G>A artefacts is considerably reduced by DNA repair, while oxidation C>A/G>T artefacts are only mitigated by IQBErepair.
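The normalised relative artefact frequency of panel (E) can be written out as a short calculation. The sketch below follows the wording of the caption (artefacts per sequenced base in repaired DNA, divided by the same rate in untreated DNA); the function names and all counts are hypothetical illustration values, not data from the article.

```python
def artefacts_per_base(n_artefacts, n_sequenced_bases):
    """Artefact rate of one library: artefact count per sequenced base."""
    if n_sequenced_bases <= 0:
        raise ValueError("number of sequenced bases must be positive")
    return n_artefacts / n_sequenced_bases


def normalised_relative_frequency(repaired, untreated):
    """Repaired artefact rate divided by the untreated artefact rate.

    Each argument is a (n_artefacts, n_sequenced_bases) tuple.
    Values below 1 indicate fewer artefacts after repair.
    """
    untreated_rate = artefacts_per_base(*untreated)
    if untreated_rate == 0:
        raise ValueError("untreated library has no artefacts to normalise by")
    return artefacts_per_base(*repaired) / untreated_rate


# Hypothetical counts of C>T/G>A artefacts in two libraries.
untreated_ct = (100, 1_000_000)
repaired_ct = (50, 1_000_000)
print(normalised_relative_frequency(repaired_ct, untreated_ct))
```

Normalising per sequenced base keeps the comparison fair when the repaired and untreated libraries differ in sequencing depth.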
Figure 6.
Permutation analysis to identify the top library replicate strategies. Artefacts were bioinformatically filtered by their presence in library replicates of untreated and repaired FFPE-DNA. The choice of library combination in a multi-library approach can lead to a different number of remaining artefacts. Here, all possible permutations of libraries were bioinformatically tested. The top permutations for artefact removal are depicted in the graph and the tables for FFPE-DNA replicates processed in two sequencing centres. In addition, all permutations of untreated libraries are included. Untreated FFPE-DNA (U, grey), NEBrepaired FFPE-DNA (N, green), and IQBErepaired FFPE-DNA (Q, magenta) libraries were used. For this combined analysis a 1% VAF detection threshold was applied and artefacts that did not pass this VAF filter in all libraries of the doublets or triplets, respectively, were removed.
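The replicate-based filtering described in this caption can be sketched as follows. The example assumes that a call survives only if it reaches the 1% VAF threshold in every library of the chosen doublet or triplet, and it enumerates unordered combinations, since library order does not change the result. Library names, variant keys and VAF values are hypothetical, and every call is assumed to be a known artefact.

```python
from itertools import combinations


def surviving_calls(libraries, names, vaf_threshold=0.01):
    """Calls whose VAF reaches the threshold in every named library."""
    passing = [
        {call for call, vaf in libraries[name].items() if vaf >= vaf_threshold}
        for name in names
    ]
    return set.intersection(*passing)


def rank_combinations(libraries, size, vaf_threshold=0.01):
    """All library combinations of a given size, fewest remaining calls first."""
    results = []
    for names in combinations(sorted(libraries), size):
        remaining = surviving_calls(libraries, names, vaf_threshold)
        results.append((len(remaining), names))
    return sorted(results)


# Hypothetical artefact calls (variant -> VAF) in three libraries:
# U1 untreated, N1 and Q1 repaired with two different mixes.
libraries = {
    "U1": {"chr1:100C>T": 0.02, "chr1:200G>A": 0.03},
    "N1": {"chr1:100C>T": 0.02, "chr2:50C>A": 0.02},
    "Q1": {"chr1:200G>A": 0.004},
}
for n_remaining, names in rank_combinations(libraries, size=2):
    print(names, n_remaining)
```

In this toy data the doublet of N1 and U1 still shares one artefact, while any doublet that includes Q1 removes all of them.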
Figure 7.
Analytical use of dual UMIs in the context of FFPE-DNA sequencing. In the laboratory (top part), extraction of FFPE-DNA from formalin-impaired tissue results in a low diversity of functional molecules. In general, true variants (green diamond) occur in both strands whereas FFPE modifications (red asterisk) are theoretically restricted to one strand. During library preparation, adapters containing UMI sequences are ligated to both strands. The product is amplified by PCR. During PCR and sequencing, additional errors occur (yellow triangles). Only a fraction of the library's diversity is analysed during sequencing. Overrepresentation of molecules that are preferentially amplified affects the read diversity. Bioinformatic processing (bottom part) of raw reads can group reads belonging to a read family to build the molecular consensus (MolCon) using a statistical model with error removal. The duplex consensus (DupCon) combines both molecular consensuses of the Watson and Crick strands. DupCon allows single-stranded FFPE modifications to be detected and removed, as the molecular consensuses of the single strands (MolCon) are contradictory. However, true variants get suppressed (red raw read group) if the complementary molecule is not sequenced. In the right column, deduplication using UMI (UMI dedup) randomly picks a read from each family. Compared to UMI dedup, consensus approaches reduce errors, although they also result in lower coverage.
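The MolCon and DupCon steps in this caption can be illustrated with a toy example. This is only a sketch: it replaces the statistical model mentioned in the caption with a simple per-position majority vote, assumes equal-length reads already oriented to the same strand, and uses hypothetical UMIs, strand labels ("W" for Watson, "C" for Crick) and sequences.

```python
from collections import Counter, defaultdict


def molecular_consensus(seqs):
    """Majority base per position across the reads of one read family."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*seqs))


def duplex_consensus(watson, crick):
    """Keep bases on which both strand consensuses agree; mask the rest as N."""
    return "".join(a if a == b else "N" for a, b in zip(watson, crick))


def collapse(reads):
    """Group (umi, strand, sequence) reads, then build MolCon and DupCon."""
    families = defaultdict(list)
    for umi, strand, seq in reads:
        families[(umi, strand)].append(seq)
    molcon = {key: molecular_consensus(seqs) for key, seqs in families.items()}
    dupcon = {}
    for (umi, strand), consensus in molcon.items():
        # A duplex consensus needs both strands of the original molecule.
        if strand == "W" and (umi, "C") in molcon:
            dupcon[umi] = duplex_consensus(consensus, molcon[(umi, "C")])
    return molcon, dupcon


reads = [
    ("AAT", "W", "ACGT"),
    ("AAT", "W", "ACGT"),
    ("AAT", "W", "ACTT"),   # isolated PCR or sequencing error
    ("AAT", "C", "ATGT"),   # single-strand change, e.g. deamination
    ("AAT", "C", "ATGT"),
    ("GGC", "W", "ACGT"),   # complementary strand never sequenced
]
molcon, dupcon = collapse(reads)
print(molcon)
print(dupcon)
```

The majority vote removes the isolated error within the Watson family, the strand disagreement is masked in the duplex consensus, and the molecule without a sequenced partner strand yields no duplex consensus at all, which mirrors the coverage loss noted in the caption.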
Figure 8.
Probabilistic bioinformatic filters consistently reduce artefacts. This figure shows the number of false-positive variant calls (y-axis) in four untreated FFPE-DNA replicates processed in two different sequencing centres (C1, C2). ‘No filter’ refers to the total number of false-positive variant calls prior to filtering. The number of false positives was reduced using the probabilistic filter FMC (FilterMutectCalls) of GATK Mutect2 variant calling alone, or in combination with VAF-based filtering (VAF threshold 5% or 10%). All variant calls in this figure are false positives resulting from FFPE-DNA damage or other causes (e.g. sequencing error). Over 100 false positives remain even after combined FMC and 10% VAF-filtering.
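The combined filtering in this caption can be sketched as a two-condition test: a call is kept only if the probabilistic filter passed it and its VAF reaches the chosen threshold. The record layout below (a `filter` field holding `PASS` or a failure reason, and a `vaf` field) and all values are hypothetical stand-ins for parsed variant records, not the output format of any specific tool.

```python
def filter_calls(calls, min_vaf=0.05):
    """Keep calls that passed the probabilistic filter and reach min_vaf."""
    return [
        call for call in calls
        if call["filter"] == "PASS" and call["vaf"] >= min_vaf
    ]


# Hypothetical variant calls from one untreated FFPE-DNA library.
calls = [
    {"variant": "chr1:100C>T", "filter": "PASS", "vaf": 0.02},
    {"variant": "chr1:200G>A", "filter": "PASS", "vaf": 0.07},
    {"variant": "chr2:50C>A", "filter": "PASS", "vaf": 0.12},
    {"variant": "chr3:10C>T", "filter": "failed", "vaf": 0.30},
]
print(len(filter_calls(calls, min_vaf=0.05)))
print(len(filter_calls(calls, min_vaf=0.10)))
```

Raising the threshold removes more low-frequency artefacts but would equally remove true low-frequency variants, so the threshold is a trade-off rather than a fixed rule.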
Figure 9.
Effect of four bioinformatic read filtering methods on library sequences with dual UMIs. Data are shown for the 13-year-old FFPE and FF sample pair, with library preparation in replicates for each input amount of 50 and 200 ng (light and dark colours, respectively) of FFPE-DNA (blue) and FF-DNA (orange). The eight libraries were target-enriched and deep sequenced. The sequence data were bioinformatically down-sampled to the identical number of 1.3E8 raw sequencing reads per library and aligned to the human reference genome, with aligned reads referred to as on-target (OT) or off-target reads. Four different bioinformatic filtering approaches are shown: standard deduplication by the read start-stop positions (dedup), deduplication by additionally using the UMI information (UMI), molecular consensus (MolCon) error correction by collapsing single read families, and duplex consensus (DupCon) where error correction was done by collapsing combined read families. (A) Number of reads per experiment. Note the y-axis scale break. (B) Percentage loss of reads following the recommended data processing compared to the raw OT data. (C) Differences in median insert size for the FF and FFPE libraries. (D) Artefact allele frequencies for the different approaches used. (E) Number of artefacts observed per 10 000 bases in the final alignment file.
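The difference between the first two approaches in this caption, deduplication by start-stop position and deduplication that also uses the UMI, can be shown with a small sketch. Read records, field names and coordinates are hypothetical, and the sketch keeps the first read of each group rather than a randomly chosen one.

```python
def dedup_by_position(reads):
    """Keep one read per (chromosome, start, end) position."""
    kept = {}
    for read in reads:
        kept.setdefault((read["chrom"], read["start"], read["end"]), read)
    return list(kept.values())


def dedup_by_position_and_umi(reads):
    """Keep one read per position and UMI, so distinct molecules survive."""
    kept = {}
    for read in reads:
        key = (read["chrom"], read["start"], read["end"], read["umi"])
        kept.setdefault(key, read)
    return list(kept.values())


# Four reads share the same coordinates but stem from two molecules.
reads = [
    {"chrom": "chr1", "start": 100, "end": 250, "umi": "AAT"},
    {"chrom": "chr1", "start": 100, "end": 250, "umi": "AAT"},
    {"chrom": "chr1", "start": 100, "end": 250, "umi": "GGC"},
    {"chrom": "chr1", "start": 100, "end": 250, "umi": "GGC"},
]
print(len(dedup_by_position(reads)))
print(len(dedup_by_position_and_umi(reads)))
```

Position-only deduplication merges independent molecules that happen to share coordinates, which is more likely with short, fragmented DNA and deep targeted sequencing; adding the UMI to the key keeps them apart.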
