PLoS One. 2020 Apr 9;15(4):e0231446.
doi: 10.1371/journal.pone.0231446. eCollection 2020.

Blind estimation and correction of microarray batch effect

Sudhir Varma. PLoS One. 2020.

Abstract

Microarray batch effect (BE) has been the primary bottleneck for large-scale integration of data from multiple experiments. Current BE correction methods either need known batch identities (ComBat) or have the potential to overcorrect by removing true but unknown biological differences (Surrogate Variable Analysis, SVA). It is well known that experimental conditions such as array or reagent batches, PCR amplification or ozone levels can affect the measured expression levels; often the direction of perturbation of the measured expression is the same in different datasets. However, there are no BE correction algorithms that attempt to estimate the individual effects of technical differences and use them to correct expression data. In this manuscript, we show that a set of batch effect signatures (BES), each of which is a vector the length of the number of probes, calculated on a reference set of microarray samples can predict much of the batch effect in other validation sets. We present a rationale for selecting a reference set of samples designed to estimate technical differences without removing biological differences. Putting both together, we introduce the Batch Effect Signature Correction (BESC) algorithm, which uses the BES calculated on the reference set to efficiently predict and remove BE. Using two independent validation sets, we show that BESC is capable of removing batch effect without removing unknown but true biological differences. Much of the variation due to batch effect is shared between different microarray datasets. That shared information can be used to predict signatures (i.e., directions of perturbation) due to batch effect in new datasets. The correction can be precomputed without using the samples to be corrected (blind), is applied to each sample individually (single sample), and removes only known technical effects without removing known or unknown biological differences (conservative).
Those three characteristics make it ideal for high-throughput correction of samples for a microarray data repository. We also compare the performance of BESC to three other batch correction methods: SVA, Removing Unwanted Variation (RUV) and Hidden Covariates with Prior (HCP). An R package, besc, implementing the algorithm is available from http://explainbio.com.
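
The abstract describes the correction only at a high level, so the following is a minimal sketch of the general idea rather than the BESC algorithm itself. It assumes that the signatures are fixed direction vectors (one value per probe) estimated beforehand on a reference set, and that correcting a sample means subtracting the part of that sample that lies along those directions. The function names, the Gram-Schmidt step and the toy numbers are all hypothetical and are not taken from the besc package.

```python
from math import sqrt


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def orthonormalize(signatures, tol=1e-12):
    """Gram-Schmidt: turn signature vectors into an orthonormal basis.

    Assumption: the signatures were estimated on a separate reference set,
    so this step never sees the samples to be corrected ("blind").
    """
    basis = []
    for sig in signatures:
        residual = list(sig)
        for b in basis:
            coeff = dot(residual, b)
            residual = [r - coeff * bi for r, bi in zip(residual, b)]
        norm = sqrt(dot(residual, residual))
        if norm > tol:  # skip signatures that are linearly dependent
            basis.append([r / norm for r in residual])
    return basis


def correct_sample(sample, basis):
    """Subtract the component of ONE sample along each signature direction.

    Only this sample and the precomputed basis are used ("single sample").
    """
    corrected = list(sample)
    for b in basis:
        coeff = dot(sample, b)
        corrected = [c - coeff * bi for c, bi in zip(corrected, b)]
    return corrected


# Toy data: 4 probes and 2 hypothetical signatures (illustrative values only).
signatures = [[1.0, 1.0, 0.0, 0.0], [0.0, 1.0, 1.0, 0.0]]
basis = orthonormalize(signatures)

sample = [2.0, 3.0, 1.0, 5.0]
corrected = correct_sample(sample, basis)
```

In this toy setup the corrected sample has no remaining component along either signature, and the fourth probe, which neither signature touches, is left unchanged. That is the sense in which such a correction is conservative: variation outside the span of the known signatures is not modified. Real expression data would typically be centred or normalized first, and how the signatures themselves are estimated is not covered by this sketch.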

Conflict of interest statement

The author (SV) does contract statistical analysis under the business name of “HiThru Analytics LLC”. Currently he is working as a contractor, part time with the National Institutes of Health (Bethesda MD) and part time with Tridiuum Inc. (Philadelphia PA). This does not alter our adherence to PLOS ONE policies on sharing data and materials.

Figures

Fig 1. Cross-validated performance on reference set.
The cross-validated Distance Ratio Score (DRS) for the reference set vs. the number of Batch Effect Signatures (BES) used for the correction. A higher DRS indicates a lower level of batch effect. The DRS reaches a maximum at 30 BES.
Fig 2. Performance on validation set 1.
a) DRS for validation set 1 using BESC, SVA, RUV and HCP, and the permuted null BES. b) Contribution of variance due to organ type, calculated using PVCA. c) Number of genes differentially expressed between male and female samples at various levels of correction.
Fig 3. Performance on validation set 2.
a) DRS for validation set 2 using BESC, SVA, RUV and HCP, and the permuted null BES. b) Contribution of variance due to disease status, calculated using PVCA. c) Number of genes differentially expressed between MSI and MSS samples at various levels of correction by BESC, SVA, RUV and HCP.

