J Comput Biol. 2016 Dec;23(12):957-968.
doi: 10.1089/cmb.2016.0042. Epub 2016 Aug 5.

Impact of Microarray Preprocessing Techniques in Unraveling Biological Pathways

Enrique J Deandrés-Galiana et al. J Comput Biol. 2016 Dec.

Abstract

To better understand how microarray preprocessing and normalization techniques affect the analysis of biological pathways in the prediction of chronic fatigue (CF) following radiation therapy, this study compared the lists of predictive genes found using Robust Multiarray Averaging (RMA) and the Affymetrix MAS5 method with the list obtained from raw data (without any preprocessing). First, we modeled a spiked-in data set in which the differentially expressed genes were known and spiked in at different known concentrations, showing that the precision achieved by different gene ranking methods was higher with preprocessed data than with raw data. The results obtained from the spiked-in experiment were extrapolated to the CF data set to run learning and blind validation. RMA and MAS5 provided different sets of discriminatory genes that had higher predictive accuracy in the learning phase but lower predictive accuracy in the blind validation phase, suggesting that the genetic signatures generated with either preprocessing technique are not generalizable. The pathways found using the raw data set better described what is known a priori about CF. In addition, RMA produced more reliable pathways than MAS5. Understanding the strengths of these two preprocessing techniques in phenotype prediction is critical for precision medicine. In particular, this article concludes that biological pathways might be better unraveled by working with raw expression data, and that the predictive gene profiles generated by RMA and MAS5 should be interpreted with caution. This conclusion has a high translational impact and should be confirmed in other disease data sets.
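The gene ranking step evaluated on the spiked-in data can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so the definitions below are assumptions: fold change (FC) is taken as the absolute log2 ratio of the class means, and Fisher's ratio (FR) as the squared difference of class means divided by the sum of class variances. The function names and the synthetic data are hypothetical and only stand in for a real spiked-in experiment.

```python
import math
import random
from statistics import mean, variance


def fold_change(a, b):
    """Absolute log2 ratio of the two class means for one gene (assumed definition)."""
    return abs(math.log2(mean(a) / mean(b)))


def fisher_ratio(a, b):
    """Squared mean difference over the sum of class variances (assumed definition).

    Requires at least two samples per class and a nonzero total variance.
    """
    return (mean(a) - mean(b)) ** 2 / (variance(a) + variance(b))


def rank_genes(class_a, class_b, score):
    """Return gene indices sorted from most to least discriminatory."""
    scores = [score(a, b) for a, b in zip(class_a, class_b)]
    return sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)


def precision_at_k(ranking, true_genes, k):
    """Fraction of the top-k ranked genes that are truly differential."""
    return sum(g in true_genes for g in ranking[:k]) / k


# Synthetic stand-in for a spiked-in experiment (not real data):
# each gene is a list of expression values, and genes 0-4 are "spiked"
# at four times the baseline concentration in the second class.
rng = random.Random(0)
n_genes, n_samples, spiked = 200, 10, set(range(5))
class_a = [[rng.lognormvariate(5.0, 0.2) for _ in range(n_samples)]
           for _ in range(n_genes)]
class_b = [[rng.lognormvariate(5.0, 0.2) * (4.0 if g in spiked else 1.0)
            for _ in range(n_samples)]
           for g in range(n_genes)]

fc_ranking = rank_genes(class_a, class_b, fold_change)
fr_ranking = rank_genes(class_a, class_b, fisher_ratio)
print("FC precision@5:", precision_at_k(fc_ranking, spiked, 5))
print("FR precision@5:", precision_at_k(fr_ranking, spiked, 5))
```

Because the truly differential genes are known in a spiked-in design, precision at k gives a direct way to compare rankings obtained from raw and preprocessed versions of the same arrays.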

Keywords: DNA arrays; cancer genomics; gene expression; gene networks.

Conflict of interest statement

Enrique J. de Andrés was supported by the Ministerio de Economia y Competitividad (grant TIN2011-23558).

Figures

FIG. 1. Flow chart of the methodology.
FIG. 2. Empirical CDF of the positions of the differentially expressed genes ranked by the FC/FR methods for each comparison and different types of data. CDF, cumulative distribution function; FC, fold change; FR, Fisher's ratio.
FIG. 3. Pearson correlation coefficient minimum spanning tree of the first 50 selected probes using raw data.
FIG. 4. Pearson correlation coefficient minimum spanning tree of the first 50 selected probes using data preprocessed with RMA.
FIG. 5. Pearson correlation coefficient minimum spanning tree of the first 50 selected probes using data preprocessed with MAS5.
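A Pearson correlation minimum spanning tree like those in Figs. 3 to 5 could be built roughly as follows. This is a sketch under stated assumptions, not the authors' implementation: the edge weight 1 - |r| is an assumed distance (the captions do not say how correlation was converted to a distance), the tree is found with Prim's algorithm, and the function names and random probe profiles are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_mst(profiles):
    """Prim's algorithm on the complete graph with edge weight 1 - |r|.

    Returns a list of (parent, child, weight) edges, starting from probe 0.
    """
    n = len(profiles)
    dist = [[0.0 if i == j else 1.0 - abs(pearson(profiles[i], profiles[j]))
             for j in range(n)] for i in range(n)]
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge into each probe
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the probe outside the tree with the cheapest connecting edge.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, dist[parent[u]][u]))
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    return edges


# Hypothetical data: 50 probes measured on 20 samples.
rng = random.Random(1)
probes = [[rng.gauss(0.0, 1.0) for _ in range(20)] for _ in range(50)]
tree = correlation_mst(probes)
print(len(tree), "edges link", len(probes), "probes")
```

In such a tree, strongly correlated probes end up adjacent, which is one way to compare how the top-ranked probes group together under raw, RMA, and MAS5 data.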
