Comparative Study
PLoS One. 2013 Aug 20;8(8):e71462.
doi: 10.1371/journal.pone.0071462. eCollection 2013.

Large scale comparison of gene expression levels by microarrays and RNAseq using TCGA data

Yan Guo et al. PLoS One. 2013.

Abstract

RNAseq and microarray methods are frequently used to measure gene expression levels. While similar in purpose, the two technologies differ fundamentally. Here, we present the largest comparative study between microarray and RNAseq methods to date, using The Cancer Genome Atlas (TCGA) data. We found high correlations between expression data obtained from the Affymetrix one-channel microarray and RNAseq (Spearman correlation coefficients of ∼0.8). We also observed that low abundance genes had poorer correlations between microarray and RNAseq data than high abundance genes. As expected, due to measurement and normalization differences, Agilent two-channel microarray and RNAseq data were poorly correlated (Spearman correlation coefficients of only ∼0.2). By examining the differentially expressed genes between tumor and normal samples, we observed reasonable concordance in directionality between Agilent two-channel microarray and RNAseq data, although a small group of genes had expression changes reported in opposite directions by the two technologies. Overall, RNAseq produces comparable results to microarray technologies in terms of expression profiling. The RNAseq normalization methods RPKM and RSEM produce similar results at the gene level and reasonably concordant results at the exon level. Longer exons tended to have better concordance between the two normalization methods than shorter exons.
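The per-sample Spearman comparison described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the data layout (a dict of samples, each mapping gene names to values), the function names, and the toy numbers are all assumptions made for illustration. Because Spearman correlation works on ranks, it is unchanged by monotone transforms such as taking logarithms, which is one reason it suits comparisons across platforms with different scales.

```python
from statistics import mean

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def per_sample_spearman(platform_a, platform_b):
    """One coefficient per shared sample, using only genes measured on both platforms."""
    result = {}
    for sample in sorted(set(platform_a) & set(platform_b)):
        genes = sorted(set(platform_a[sample]) & set(platform_b[sample]))
        a = [platform_a[sample][g] for g in genes]
        b = [platform_b[sample][g] for g in genes]
        result[sample] = spearman(a, b)
    return result

# Hypothetical toy values, not taken from TCGA.
array_log2 = {"S1": {"G1": 5.0, "G2": 7.5, "G3": 9.1, "G4": 11.0}}
rnaseq_rpkm = {"S1": {"G1": 0.4, "G2": 3.0, "G3": 2.5, "G4": 80.0}}
rho = per_sample_spearman(array_log2, rnaseq_rpkm)
```

In this toy sample only two of the four genes swap rank order between platforms, so the coefficient stays high even though the raw scales are very different.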

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Expression value distributions of different quantification methods for the same 258 samples.
For each method, each gene's expression value was represented by the median value across the 258 samples. a) Affymetrix microarray analysis followed by the RMA normalization method. b) Agilent microarray analysis followed by the RMA normalization method. c) RNAseq analysis followed by the RPKM normalization method; the last bar represents genes with RPKM over 100. d) RNAseq analysis followed by the RSEM normalization method; the last bar represents genes with RSEM over 3000.
Figure 2. Spearman correlation coefficient analysis between different quantification methods.
For each comparison, the samples from the tumor dataset that were analyzed by both corresponding methods were extracted. For each sample, the Spearman correlation coefficient between the expression values from those methods was calculated. a) The comparison between the RPKM method and the RSEM method. The Spearman correlation coefficients were approximately 0.94. b) The comparison between the Affymetrix method and the RPKM/RSEM method. The Spearman correlation coefficients were around 0.8. c) The comparison between the Agilent method and the RPKM/RSEM method. Because the Agilent method generated a ratio value for each gene whereas the RNAseq methods generated an absolute expression value, the Spearman correlation coefficients between the Agilent method and the RNAseq methods were as low as ∼0.2. d) The comparison between the Agilent method and the Affymetrix method. Because the Affymetrix method also generated an absolute expression value for each gene, the Spearman correlation coefficients were likewise as low as ∼0.2.
Figure 3. Differentially expressed gene concordance analysis using 53 paired tumor-normal breast cancer samples.
a) The Spearman correlation coefficients of tumor/normal ratios between the Agilent method, the RPKM method and the RSEM method. b) Venn diagram summarizing the overlap between genes called as significantly differentially expressed (adjusted FDR less than 0.01 and fold-change larger than 2). The differentially expressed genes in Figure 3b were computed using the genes measured by both microarray and RNAseq. c) Scatter plot of fold-change per gene as measured by the Agilent method and the RNAseq RPKM method. Genes identified as differentially expressed by both methods with consistent fold-change direction are plotted in green. Genes identified as differentially expressed by both methods with inconsistent fold-change direction are plotted in red. Genes identified as differentially expressed by only the RNAseq method or only the Agilent method are plotted in blue and yellow, respectively. Genes not identified as differentially expressed by either method are plotted in black. Only 1.2% of the genes identified as differentially expressed by both methods were inconsistent in fold-change direction (red data).
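The five gene categories in the Figure 3c caption can be sketched as a small classifier. This is an illustrative sketch only: the thresholds follow the caption (adjusted FDR below 0.01, fold-change above 2), but treating the fold-change cutoff as two-sided on a log2 scale is an assumption, and the function names and toy gene values are hypothetical.

```python
from collections import Counter

FDR_CUTOFF = 0.01
MIN_ABS_LOG2_FC = 1.0  # fold-change > 2 in either direction (assumed two-sided)

def is_de(log2_fc, fdr):
    """Call a gene differentially expressed under the caption's thresholds."""
    return fdr < FDR_CUTOFF and abs(log2_fc) > MIN_ABS_LOG2_FC

def classify(array, rnaseq):
    """array, rnaseq: (log2 tumor/normal fold-change, adjusted FDR) for one gene."""
    de_array, de_rnaseq = is_de(*array), is_de(*rnaseq)
    if de_array and de_rnaseq:
        same_direction = (array[0] > 0) == (rnaseq[0] > 0)
        return "consistent" if same_direction else "inconsistent"  # green / red
    if de_rnaseq:
        return "rnaseq_only"  # blue
    if de_array:
        return "array_only"   # yellow
    return "neither"          # black

# Hypothetical toy genes: (Agilent call, RNAseq call).
genes = {
    "G1": ((2.1, 0.001), (1.8, 0.0005)),
    "G2": ((-1.5, 0.002), (1.3, 0.004)),
    "G3": ((0.4, 0.2), (1.6, 0.003)),
    "G4": ((1.2, 0.008), (0.9, 0.001)),
    "G5": ((0.1, 0.6), (-0.2, 0.7)),
}
labels = {g: classify(a, r) for g, (a, r) in genes.items()}
counts = Counter(labels.values())

# Share of genes called by both methods whose direction disagrees.
called_by_both = counts["consistent"] + counts["inconsistent"]
inconsistent_fraction = counts["inconsistent"] / called_by_both
```

The final ratio is the toy analogue of the 1.2% figure quoted in the caption: inconsistent genes divided by all genes called differentially expressed by both methods.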
Figure 4. Fold-change consistency between the Agilent method and the RPKM method from 53 paired tumor-normal breast cancer samples.
The common genes were divided into four groups based on their RNAseq expression value, and linear regression was performed to evaluate the fold-change consistency for each group. The regressions indicate that fold-changes for genes with higher RNAseq expression were more concordant with the microarray fold-changes than those for genes with lower RNAseq expression.
Figure 5. Exon expression consistency between the RPKM and RSEM normalization methods for RNAseq data.
a) Exon length distribution from RNAseq data. Exons were divided into 23 groups based on the log10 value of exon length. b) The length distribution of exons; blue indicates exons detected by RSEM but not by RPKM, and red indicates exons detected by RPKM but not by RSEM. c) The $R^2$ of linear regression between the RPKM and RSEM values in sub-groups defined by exon length. The group intervals equal those in Figure 5a, except that the first five and the last five groups were each merged because of the small exon counts in those groups. Only the exons detected by both RPKM and RSEM methods were used. d-f) Detailed scatter plots of exon expression consistency in three groups divided by exon length: 1∼20, 21∼50, and >50 base pairs. Only the exons detected by both RPKM and RSEM methods were used. Panels c-f indicate that exon expression consistency increases significantly with exon length until exon length exceeds about 50 base pairs.
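The length-stratified comparison in Figure 5 can be sketched as follows, assuming the regression statistic is the coefficient of determination of a simple least-squares line (which equals the squared Pearson correlation). The three length bins follow panels d-f; the function names and the toy exon values are hypothetical, and the input is assumed to be already restricted to exons detected by both methods.

```python
from statistics import mean

def r_squared(x, y):
    """R^2 of a simple least-squares line y ~ x (equals squared Pearson r)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

def length_group(length_bp):
    """Assign an exon to one of the three length bins used in panels d-f."""
    if length_bp <= 20:
        return "1-20"
    if length_bp <= 50:
        return "21-50"
    return ">50"

def r_squared_by_length(exons):
    """exons: iterable of (length_bp, rpkm, rsem) tuples; returns R^2 per length bin."""
    groups = {}
    for length_bp, rpkm, rsem in exons:
        xs, ys = groups.setdefault(length_group(length_bp), ([], []))
        xs.append(rpkm)
        ys.append(rsem)
    # Skip bins with fewer than three exons, where a fit is not informative.
    return {name: r_squared(xs, ys) for name, (xs, ys) in groups.items() if len(xs) >= 3}

# Hypothetical toy exons: (length in bp, RPKM value, RSEM value).
toy_exons = [
    (10, 1.0, 5.0), (15, 2.0, 1.0), (18, 3.0, 4.0),
    (30, 1.0, 1.0), (40, 2.0, 3.0), (45, 3.0, 2.5),
    (80, 3.0, 6.0), (120, 1.0, 2.0), (300, 2.0, 4.0),
]
r2 = r_squared_by_length(toy_exons)
```

The toy values were chosen so that agreement improves from the shortest bin to the longest, mirroring the trend the caption describes; they carry no information about the real data.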