Comparative Study
BMC Bioinformatics. 2013;14 Suppl 9(Suppl 9):S1.
doi: 10.1186/1471-2105-14-S9-S1. Epub 2013 Jun 28.

Parallel comparison of Illumina RNA-Seq and Affymetrix microarray platforms on transcriptomic profiles generated from 5-aza-deoxy-cytidine treated HT-29 colon cancer cells and simulated datasets

Xiao Xu et al. BMC Bioinformatics. 2013.

Abstract

Background: High-throughput parallel sequencing of RNA (RNA-Seq) has recently emerged as an appealing alternative to microarrays for identifying differentially expressed genes (DEGs) between biological groups. However, considerable discrepancies remain between the two platforms in both gene expression measurements and DEG results. The objective of this study was to compare parallel paired-end RNA-Seq and microarray data generated from 5-aza-deoxy-cytidine (5-Aza) treated HT-29 colon cancer cells, supplemented by a simulation study.

Methods: We first performed general correlation analysis comparing gene expression profiles on both platforms. An Errors-In-Variables (EIV) regression model was subsequently applied to assess proportional and fixed biases between the two technologies. Then several existing algorithms, designed for DEG identification in RNA-Seq and microarray data, were applied to compare the cross-platform overlaps with respect to DEG lists, which were further validated using qRT-PCR assays on selected genes. Functional analyses were subsequently conducted using Ingenuity Pathway Analysis (IPA).

Results: Pearson and Spearman correlation coefficients between the RNA-Seq and microarray data each exceeded 0.80, with 66% to 68% overlap in the genes detected on the two platforms. The EIV regression model indicated the existence of both fixed and proportional biases between the two platforms. The DESeq and baySeq algorithms (RNA-Seq) and the SAM and eBayes algorithms (microarray) achieved the highest cross-platform overlap rates in DEG results from both the experimental and the simulated datasets. On the simulated dataset, DESeq controlled the false discovery rate better than baySeq, although it was slightly less sensitive than baySeq. RNA-Seq and qRT-PCR, but not microarray data, confirmed the expected reversal of SPARC gene suppression after treating HT-29 cells with 5-Aza. Thirty-three IPA canonical pathways were identified by both microarray and RNA-Seq data, 152 pathways by RNA-Seq data only, and none by microarray data only.

Conclusions: These results suggest that RNA-Seq has advantages over microarray in the identification of DEGs, with the most consistent results generated by the DESeq and SAM methods. The EIV regression model reveals both fixed and proportional biases between RNA-Seq and microarray. These biases may partly explain why the cross-platform overlap is lower for DEG lists than for detectable genes.
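The general correlation analysis described in the Methods can be illustrated with a minimal sketch. It is not the authors' code: it assumes two equal-length lists of per-gene expression values (one per platform), and the function names and toy values are hypothetical. Spearman correlation is computed here as the Pearson correlation of average ranks.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical toy values standing in for per-gene measurements.
array_log2 = [1.0, 2.0, 3.0, 4.0, 5.0]
rnaseq_log2 = [1.0, 4.0, 9.0, 16.0, 25.0]
pcc = pearson(array_log2, rnaseq_log2)
scc = spearman(array_log2, rnaseq_log2)
```

In this toy case the relationship is monotone but not linear, so the Spearman coefficient is 1 while the Pearson coefficient is slightly below 1, which is one reason to report both.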

Figures

Figure 1
Expressional consistency between RNA-Seq and microarray data. A. Detectable genes reported by each technology based on a common filter procedure (see Methods). Venn diagrams of detectable genes are shown for each of the 3 experimental conditions (0 µM, 5 µM and 10 µM), and overlap rates are calculated by dividing the number of commonly detectable genes by the size of the union. B. By-group scatter plot depicting the expression profiles of all genes. Log2 transformed FPKM values from RNA-Seq and log2 scaled microarray gene intensities (normalized) are used in the scatter plot. We added 1 to each FPKM value before log2 transformation to facilitate calculation. Commonly detected genes are shown in red, while platform-exclusive genes are shown in black. Both Pearson correlation coefficients (PCC) and Spearman correlation coefficients (SCC) were calculated based on all gene entries (except for those not having probe names on the array or in the RNA-Seq reference genome). C. By-group expression density histogram for both commonly detectable genes and RNA-Seq specific ones. The x-axis denotes the distribution of RNA-Seq FPKM values (log2 scale) and the y-axis shows the frequency of genes within each category. Commonly detectable genes are depicted in black, while RNA-Seq exclusive genes are shown in grey.
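The two calculations named in this caption, the overlap rate (common genes divided by the union) and the log2(FPKM + 1) transform, can be sketched as follows. The function names and the small gene sets are hypothetical and serve only as an illustration.

```python
import math

def overlap_rate(genes_a, genes_b):
    """Number of commonly detectable genes divided by the size of the union."""
    a, b = set(genes_a), set(genes_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def log2_fpkm(fpkm):
    """log2(FPKM + 1); adding 1 keeps genes with FPKM = 0 finite (they map to 0)."""
    return math.log2(fpkm + 1.0)

# Hypothetical detectable-gene sets for one experimental condition.
rnaseq_detected = {"SPARC", "GAPDH", "ACTB", "TP53"}
array_detected = {"GAPDH", "ACTB", "TP53", "MYC"}
rate = overlap_rate(rnaseq_detected, array_detected)  # 3 shared / 5 in union
```

Dividing by the union rather than by either platform's own count gives a symmetric measure that does not favor the platform with the shorter list.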
Figure 2
EIV Regression Model Comparing Microarray and RNA-Seq Gene Profiles. An EIV regression model is constructed with microarray normalized gene intensities (log2, unit-free scale) as the independent variable and RNA-Seq FPKM values (log2, unit-free scale) as the dependent variable, for each of the experimental groups (5 µM, 10 µM and 0 µM) of HT-29 samples. Log2 scaled unit-free normalized gene intensities are shown as grey circles in the scatter plot and the EIV regression line is drawn in bold black. In each plot, a dashed reference line of Y = X (corresponding to perfect platform agreement) is also included to indicate the deviation of the fitted regression line from the reference. The estimated regression equation is shown in the lower-right section of each plot. The 95% bootstrap confidence intervals for the regression intercept and slope (α and β) are shown at the top of each plot.
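As an illustration of the kind of fit this caption describes, the sketch below uses Deming regression, one common errors-in-variables estimator, together with a percentile bootstrap. This is an assumption: the abstract does not specify which EIV estimator or error-variance ratio the authors used, and `deming_fit`, `bootstrap_ci`, `delta` and the toy data are hypothetical names and values.

```python
import math
import random

def deming_fit(x, y, delta=1.0):
    """Deming regression y = alpha + beta * x.

    delta is the assumed ratio of y-error variance to x-error variance
    (delta = 1 gives orthogonal regression).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / (n - 1)
    syy = sum((b - my) ** 2 for b in y) / (n - 1)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    if sxy == 0:
        raise ValueError("zero covariance: slope is undefined")
    d = syy - delta * sxx
    beta = (d + math.sqrt(d * d + 4 * delta * sxy * sxy)) / (2 * sxy)
    alpha = my - beta * mx
    return alpha, beta

def bootstrap_ci(x, y, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap intervals for (alpha, beta), resampling gene pairs."""
    rng = random.Random(seed)
    n = len(x)
    alphas, betas = [], []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            a, b = deming_fit([x[i] for i in idx], [y[i] for i in idx])
        except ValueError:
            continue  # skip degenerate resamples
        alphas.append(a)
        betas.append(b)

    def interval(values):
        values = sorted(values)
        m = len(values)
        lo = values[int((1 - level) / 2 * m)]
        hi = values[min(m - 1, int((1 + level) / 2 * m))]
        return lo, hi

    return interval(alphas), interval(betas)

# Hypothetical noise-free data lying exactly on y = 1 + 2x.
xs = [float(i) for i in range(10)]
ys = [1.0 + 2.0 * v for v in xs]
alpha, beta = deming_fit(xs, ys)
alpha_ci, beta_ci = bootstrap_ci(xs, ys, n_boot=200)
```

Under the usual method-comparison reading, an intercept interval that excludes 0 points to a fixed bias and a slope interval that excludes 1 points to a proportional bias, which corresponds to a fitted line departing from the Y = X reference.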
Figure 3
Sensitivity and False Discovery Rate (FDR) curve plots for simulated data using each DEG method. Sensitivity (A) and FDR (B) are calculated for 4 RNA-Seq DEG methods (SAMSeq, baySeq, DESeq, and NOISeq) and 3 microarray DEG algorithms (T-test, SAM, eBayes). Method curves are shown in different colors (see figure legends) at each 95% minimum fold change level for pre-determined DEGs. Each fold change on the x-axis (in log2 scale) corresponds to the lower 5% fold change of the normally distributed DEGs predefined in the simulation process (see Methods).
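For a simulated dataset where the true DEGs are known, sensitivity and FDR for a single method reduce to simple set arithmetic. The sketch below uses the standard definitions (sensitivity = true positives / all true DEGs; FDR = false positives / all called DEGs); the function name and gene labels are hypothetical, and the simulation itself is not reproduced.

```python
def sensitivity_and_fdr(called, truth):
    """Return (sensitivity, FDR) for a called DEG list against known true DEGs."""
    called, truth = set(called), set(truth)
    true_pos = len(called & truth)
    false_pos = len(called - truth)
    sensitivity = true_pos / len(truth) if truth else 0.0
    fdr = false_pos / len(called) if called else 0.0
    return sensitivity, fdr

# Hypothetical example: 4 predefined DEGs, 4 genes called by some method.
true_degs = {"g1", "g2", "g3", "g4"}
called_degs = {"g1", "g2", "g3", "g5"}
sens, fdr = sensitivity_and_fdr(called_degs, true_degs)
```

Repeating this calculation at each minimum fold change level would yield one point per level on curves like those in panels A and B.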
