PLoS One. 2014 Jan 29;9(1):e86511.
doi: 10.1371/journal.pone.0086511. eCollection 2014.

Accurate data processing improves the reliability of Affymetrix gene expression profiles from FFPE samples

Maurizio Callari et al. PLoS One. 2014.

Erratum in

  • PLoS One. 2014;9(4):e95814

Abstract

Formalin-fixed paraffin-embedded (FFPE) tumor specimens are the conventionally archived material in clinical practice and represent an invaluable tissue source for biomarker development, validation and routine implementation. For many prospective clinical trials this material has been collected, allowing a prospective-retrospective study design, which is a successful strategy for defining the clinical utility of candidate markers. Gene expression data can be obtained even from FFPE specimens with the widely used Affymetrix HG-U133 Plus 2.0 microarray platform. Nevertheless, major discrepancies remain between expression data obtained from FFPE and from fresh-frozen samples, prompting the need for appropriate data processing that could help to obtain more consistent results in downstream analyses. In a publicly available dataset of matched frozen and FFPE expression data, the performance of different normalization methods and of specifically designed Chip Description Files (CDFs) was compared. The use of alternative CDFs together with fRMA normalization significantly improved frozen-FFPE sample correlations, frozen-FFPE probeset correlations and the agreement of differential analysis between different tumor subtypes. The relevance of our optimized data processing was assessed and validated using two independent datasets. In this study we demonstrated that appropriate data processing can significantly improve the reliability of gene expression data derived from FFPE tissues using the standard Affymetrix platform. Tools for the implementation of our data processing algorithm are made publicly available at http://www.biocut.unito.it/cdf-ffpe/.
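
The two correlation measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the abstract does not say which correlation coefficient was used, so Pearson correlation is an assumption here, and the function names and the dictionary layout (probeset ID mapped to a list of log-scale values, with matched samples in the same order) are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def sample_correlations(frozen, ffpe):
    """For each matched sample, correlate its frozen and FFPE profiles
    across the probesets shared by both matrices (assumed layout:
    probeset ID -> list of values, one per sample, same sample order)."""
    probesets = sorted(set(frozen) & set(ffpe))
    n_samples = len(frozen[probesets[0]])
    return [
        pearson([frozen[p][j] for p in probesets],
                [ffpe[p][j] for p in probesets])
        for j in range(n_samples)
    ]


def probeset_correlations(frozen, ffpe):
    """For each shared probeset, correlate its frozen and FFPE values
    across the matched samples."""
    shared = sorted(set(frozen) & set(ffpe))
    return {p: pearson(frozen[p], ffpe[p]) for p in shared}
```

Under these assumptions, a processing pipeline that raises both the per-sample and the per-probeset values would correspond to the improvement the abstract describes. A rank-based coefficient could be substituted for `pearson` without changing the rest of the sketch.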

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Correlation between frozen- (FFN) and FFPE-derived fold changes as a function of the processing procedure.
Fold changes between ABC and GCB subgroups were computed in the Williams dataset for three representative processing pipelines, separately for frozen- and FFPE-derived data. Probesets differentially expressed (DE) in both data types are in dark yellow, probesets DE only in frozen data are in blue, and those DE only in FFPE data are in dark red.

Figure 2. Sensitivity of FFPE data after applying different processing pipelines in two breast cancer datasets.
(A) Flow chart of the analysis. (B) Distribution of p-values in the INT FFPE dataset for ER-related probesets identified in the GSE5460 frozen dataset. The analysis was performed on data processed using MAS5 and the standard CDF (left), fRMA and the standard CDF (center) or fRMA and the RefSeq_all CDF (right).

Figure 3. Evaluation of the positive predictive value of FFPE data after applying different processing pipelines in two breast cancer datasets.
(A) Flow chart of the analysis. (B) Distribution of p-values in the GSE5460 frozen dataset for probesets DE in the INT FFPE dataset between ER+ and ER− tumors. The analysis was performed on data processed using MAS5 and the standard CDF (left), fRMA and the standard CDF (center) or fRMA and the RefSeq_all CDF (right).

Figure 4. Immune gene set enrichment analysis results from the comparison of samples with and without lymphocytic infiltration in the INT FFPE dataset.
(A) Number of genes in each gene set found in the standard CDF compared with the number found in the RefSeq_all CDF. (B) Heatmap representing positive enrichment significance for the immune gene sets after processing the data with three different pipelines.
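
The grouping of probesets described for Figure 1 (DE in both data types, DE only in frozen data, DE only in FFPE data) can be sketched as follows. This is an illustration, not the authors' method: the captions do not state how DE was called, so a cutoff on the absolute log2 fold change is used purely as a stand-in, and the function names, the `threshold` parameter and the data layout are hypothetical.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def log_fold_changes(expr, group_a, group_b):
    """Difference of group means per probeset.

    expr maps probeset ID -> list of log2 expression values (assumed layout);
    group_a and group_b are lists of sample indices (e.g. ABC and GCB).
    """
    return {
        p: mean([v[i] for i in group_a]) - mean([v[i] for i in group_b])
        for p, v in expr.items()
    }


def classify_de(fc_frozen, fc_ffpe, threshold=1.0):
    """Label each shared probeset by where it passes the cutoff.

    ASSUMPTION: a probeset counts as DE when |log2 fold change| >= threshold;
    a statistical test could be used in place of this rule.
    """
    labels = {}
    for p in sorted(set(fc_frozen) & set(fc_ffpe)):
        in_frozen = abs(fc_frozen[p]) >= threshold
        in_ffpe = abs(fc_ffpe[p]) >= threshold
        if in_frozen and in_ffpe:
            labels[p] = "both"
        elif in_frozen:
            labels[p] = "frozen_only"
        elif in_ffpe:
            labels[p] = "ffpe_only"
        else:
            labels[p] = "neither"
    return labels
```

Running `classify_de` once per processing pipeline and counting the labels would give a rough numeric counterpart to the three color groups in the figure.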
