Batch correction of microarray data substantially improves the identification of genes differentially expressed in rheumatoid arthritis and osteoarthritis

Peter Kupfer et al. BMC Med Genomics. 2012 Jun 8;5:23.
doi: 10.1186/1755-8794-5-23.

Abstract

Background: Batch effects due to sample preparation or array variation (type, charge, and/or platform) may influence the results of microarray experiments and thus mask and/or confound true biological differences. Of the published approaches for batch correction, the algorithm "Combating Batch Effects When Combining Batches of Gene Expression Microarray Data" (ComBat) appears to be most suitable for small sample sizes and multiple batches.

Methods: Synovial fibroblasts (SFB; purity > 98%) were obtained from rheumatoid arthritis (RA) and osteoarthritis (OA) patients (n = 6 each) and stimulated with TNF-α or TGF-β1 for 0, 1, 2, 4, or 12 hours. Gene expression was analyzed using Affymetrix Human Genome U133 Plus 2.0 chips, an alternative chip definition file, and normalization by Robust Multi-Array Analysis (RMA). Data were batch-corrected for different acquisition dates using ComBat, and the efficacy of the correction was validated using hierarchical clustering.

Results: In contrast to the hierarchical clustering dendrogram before batch correction, in which RA and OA patients clustered randomly, batch correction led to a clear separation of RA and OA. Strikingly, this applied not only to the 0 hour time point (i.e., before stimulation with TNF-α/TGF-β1), but also to all time points following stimulation except for the late 12 hour time point. Batch-corrected data then allowed the identification of differentially expressed genes discriminating between RA and OA. Batch correction only marginally modified the original data, as demonstrated by preservation of the main Gene Ontology (GO) categories of interest, and by minimally changed mean expression levels (maximal change 4.087%) or variances for all genes of interest. Eight genes from the GO category "extracellular matrix structural constituent" (5 different collagens, biglycan, and tubulointerstitial nephritis antigen-like 1) were differentially expressed between RA and OA (RA > OA), both constitutively at time point 0, and at all time points following stimulation with either TNF-α or TGF-β1.

Conclusion: Batch correction appears to be an extremely valuable tool to eliminate non-biological batch effects, and allows the identification of genes discriminating between different joint diseases. RA-SFB show an upregulated expression of extracellular matrix components, both constitutively following isolation from the synovial membrane and upon stimulation with disease-relevant cytokines or growth factors, suggesting an "imprinted" alteration of their phenotype.
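As a rough illustration of the workflow described in the Methods (correct for the array acquisition date, then check the result by hierarchical clustering), the following Python sketch uses synthetic values, simple per-batch mean-centering, and a small average-linkage clustering. It is not ComBat, which estimates batch location and scale parameters with empirical Bayes methods. All names (`center_batches`, `two_clusters`, `toy_sample`) and all numbers are hypothetical. Mean-centering is only defensible here because each toy batch contains both diseases.

```python
from statistics import mean

def center_batches(expr, batches):
    """Shift each gene within each batch so the batch mean equals the overall gene mean.
    Simplified stand-in for batch correction; NOT the ComBat algorithm."""
    n_genes = len(expr[0])
    grand = [mean(row[g] for row in expr) for g in range(n_genes)]
    corrected = [list(row) for row in expr]
    for b in sorted(set(batches)):
        idx = [i for i, label in enumerate(batches) if label == b]
        for g in range(n_genes):
            shift = mean(expr[i][g] for i in idx) - grand[g]
            for i in idx:
                corrected[i][g] = expr[i][g] - shift
    return corrected

def two_clusters(expr):
    """Average-linkage agglomerative clustering (Euclidean distance), cut at two clusters."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    clusters = [[i] for i in range(len(expr))]
    while len(clusters) > 2:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = mean(dist(expr[i], expr[j])
                         for i in clusters[p] for j in clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return [sorted(c) for c in clusters]

# Hypothetical balanced design: 8 samples, 2 batches, both diseases in each batch.
disease = ["RA", "RA", "OA", "OA", "RA", "RA", "OA", "OA"]
batch = ["A", "A", "A", "A", "B", "B", "B", "B"]

def toy_sample(i):
    """Synthetic expression for 4 genes: disease effect on genes 0-1, batch effect on all."""
    row = []
    for g in range(4):
        value = 5.0
        value += 1.0 if disease[i] == "RA" and g < 2 else 0.0   # assumed disease effect
        value += 3.0 if batch[i] == "B" else 0.0                 # assumed batch effect
        value += 0.05 * ((i + 2 * g) % 3 - 1)                    # small deterministic jitter
        row.append(value)
    return row

raw = [toy_sample(i) for i in range(8)]
corrected = center_batches(raw, batch)
clusters_raw = two_clusters(raw)
clusters_corrected = two_clusters(corrected)
print("uncorrected:", clusters_raw)
print("corrected:  ", clusters_corrected)
```

In this toy setting the uncorrected samples split by batch, whereas the centered samples split by disease label, which mirrors the kind of check the authors describe.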


Figures

Figure 1
Hierarchical clustering of uncorrected and batch-corrected data from time point 0 hours: a) The uncorrected data form clusters reflecting the 7 different acquisition dates (red shades for arrays generated in 2006; blue shades for those generated in 2009; for precise definition of the individual acquisition dates see Table 2). In contrast, RA and OA are not grouped. b) The ComBat-corrected data (7 batches) form clusters reflecting the diseases (RA and OA) instead of the acquisition dates.
Figure 2
Venn diagram of genes differentially expressed between RA and OA at time point 0 hours. Batch correction (BC) results in a doubling of differentially expressed genes (87 → 181). A total of 57 genes (65.51%) were represented in the intersection of the two gene lists (DEG_over).
Figure 3
Means and variances of 181 differentially expressed genes from the DEG_wBC set (time point 0 hours) in RA (a; n = 12; i.e. 6 patients with two replicates each) and OA (b; n = 12; i.e. 6 patients with two replicates each) with (red dots) or without BC (blue dots). There are generally only marginal changes of the means, but moderate to substantial reductions of the variances, as indicated by an exclusively horizontal shift.
Figure 4
Means and variances of differentially expressed genes from the DEG_lost set (a, b; for definition see Figure 2) and the DEG_interest set (c, d; for definition see Figure 2) in RA (a, c; n = 12) and OA (b, d; n = 12) patients with (red dots) or without BC (blue dots). There are generally only marginal changes of the means, but moderate to substantial reductions of the variances.
Figure 5
Time courses of genes of interest (DEG_interest; see Table 4b) in synovial fibroblasts from RA patients (blue and purple) or OA patients (red and green) stimulated with TNF-α (red and blue) or TGF-β1 (green and purple). There were only marginal differences in the gene expression values with or without BC (see also Additional file 6: Table S1). As expected, the expression of 6 of 8 genes (BGN, COL1A1, COL27A1, COL5A2, COL1A2 and COL3A1; for definition of the abbreviations see Table 4) was regulated in clearly different ways following stimulation with either TNF-α or TGF-β1; this differential regulation was common to SFB from RA and OA patients (see Additional file 6: Table S1). However, 2 of 8 genes (COL11A1 and TINAGL) were regulated in a similar fashion by TNF-α and TGF-β1 in both RA and OA patients. Strikingly, significant differences between RA and OA patients were observed for all genes of interest already at time point 0 hours (see Additional file 8: Table S4a). These differences were unaffected by stimulation with either TNF-α or TGF-β1.
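The comparison reported in Figures 3 and 4 (means nearly unchanged, variances reduced after BC) can be sketched for a single gene as follows. This is a minimal illustration on hypothetical log2 values with a plain per-batch mean shift standing in for ComBat; the helper name `summarize` and all numbers are assumptions, not data from the study.

```python
from statistics import mean, variance

def summarize(before, after):
    """Relative change of the mean (in percent) and ratio of sample variances."""
    m0, m1 = mean(before), mean(after)
    return {
        "mean_change_pct": 100.0 * abs(m1 - m0) / abs(m0),
        "variance_ratio": variance(after) / variance(before),
    }

# Hypothetical expression of one gene in one disease group, measured in two batches
# that differ by a constant offset (the assumed batch effect).
batch_a = [7.9, 8.0, 8.1]
batch_b = [8.9, 9.0, 9.1]
before = batch_a + batch_b

# Shift each batch so its mean matches the overall mean (simplified, not ComBat).
grand = mean(before)
after = ([x - (mean(batch_a) - grand) for x in batch_a]
         + [x - (mean(batch_b) - grand) for x in batch_b])

stats = summarize(before, after)
print(stats)
```

Because each batch is shifted toward the overall mean, the pooled mean is essentially preserved while the spread caused by the batch offset disappears, which corresponds to the horizontal shift described for Figure 3.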

