Randomized Controlled Trial
PeerJ. 2023 Feb 3;11:e14786.
doi: 10.7717/peerj.14786. eCollection 2023.

Data processing choices can affect findings in differential methylation analyses: an investigation using data from the LIMIT RCT

Jennie Louise et al. PeerJ. 2023.

Abstract

Objective: A wide array of methods exists for processing and analysing DNA methylation (DNAm) data. We aimed to systematically compare the behaviour of these methods, using cord blood DNAm data from the LIMIT RCT, in detecting hypothesised effects of interest (intervention and pre-pregnancy maternal BMI), effects known to be spurious, and effects known to be present.

Methods: DNAm data from 645 cord blood samples, analysed using Illumina 450K BeadChip arrays, were normalised using three different methods (with probe filtering undertaken pre- or post-normalisation). Batch effects were handled with a supervised algorithm, an unsupervised algorithm, or adjustment in the analysis model. Analysis was undertaken with and without adjustment for estimated cell type proportions. The effects estimated included intervention and BMI (effects of interest in the original study), infant sex, and randomly assigned groups. Data processing and analysis methods were compared in relation to the number and identity of differentially methylated probes, rankings of probes by p value and log-fold-change, and distributions of p values and log-fold-change estimates.
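
One way to make the comparison of probe identity across pipelines concrete is to measure how much the top-ranked probes overlap between any two of them. The following is a minimal sketch, not the authors' code: the function names, the pipeline names, the probe IDs and the p values are all hypothetical, and the Jaccard index is an assumed choice of agreement measure that the abstract does not name.

```python
from itertools import combinations


def top_k(p_values, k):
    """Return the set of the k probe IDs with the smallest p values."""
    # Ties are broken by probe ID so the result is deterministic.
    ranked = sorted(p_values, key=lambda probe: (p_values[probe], probe))
    return set(ranked[:k])


def jaccard(a, b):
    """Jaccard index of two sets (1.0 = identical, 0.0 = disjoint)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_top_k_agreement(results, k):
    """Map each pair of pipeline names to the Jaccard index of their top-k sets."""
    tops = {name: top_k(p, k) for name, p in results.items()}
    return {
        (a, b): jaccard(tops[a], tops[b])
        for a, b in combinations(sorted(tops), 2)
    }


# Toy p values, made up for illustration only (not taken from the study).
results = {
    "pipeline_A": {"probe1": 0.001, "probe2": 0.020, "probe3": 0.300, "probe4": 0.040},
    "pipeline_B": {"probe1": 0.002, "probe2": 0.500, "probe3": 0.010, "probe4": 0.030},
}
agreement = pairwise_top_k_agreement(results, k=2)
```

Here the two toy pipelines share one of their top two probes, so the agreement is one shared probe out of three distinct ones. The same function could be applied to rankings by log-fold-change by passing negated absolute estimates in place of p values.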

Results: There were differences corresponding to each of the processing and analysis choices. Importantly, some combinations of data processing choices resulted in a substantial number of spurious 'significant' findings. We recommend greater emphasis on replication and greater use of sensitivity analyses.
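
The logic of using randomly assigned groups as a negative control can be sketched as follows: because random labels are unrelated to the data by construction, roughly a fraction alpha of unadjusted p values should fall below alpha, and a markedly larger share points to spurious findings. This is an illustrative simulation under stated assumptions, not the study's analysis: the data are simulated Gaussian noise, the test is a normal approximation to a two-group comparison rather than the models used in the paper, and all names and sizes are hypothetical.

```python
import random
from statistics import NormalDist, mean, variance


def two_group_p_value(values, labels):
    """Two-sided p value for a difference in group means.

    Uses a normal approximation to a Welch-type statistic, which is
    reasonable here only because both groups are fairly large.
    """
    g0 = [v for v, g in zip(values, labels) if g == 0]
    g1 = [v for v, g in zip(values, labels) if g == 1]
    se = (variance(g0) / len(g0) + variance(g1) / len(g1)) ** 0.5
    z = (mean(g1) - mean(g0)) / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def fraction_significant(p_values, alpha=0.05):
    """Share of p values below alpha (no multiple-testing adjustment)."""
    return sum(p < alpha for p in p_values) / len(p_values)


rng = random.Random(0)
n_samples, n_probes = 100, 2000  # hypothetical sizes for the sketch

# Randomly assigned groups: by construction unrelated to the data.
labels = [i % 2 for i in range(n_samples)]
rng.shuffle(labels)

# Simulated noise standing in for per-probe methylation values.
p_values = []
for _ in range(n_probes):
    values = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    p_values.append(two_group_p_value(values, labels))

null_rate = fraction_significant(p_values)
```

In this simulation `null_rate` should sit close to 0.05. Running the same check on each processed dataset, with real values in place of the simulated noise, is one way to flag a processing combination that inflates the rate of 'significant' results.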

Keywords: Bioinformatics; DNA methylation; Differential methylation; Reproducibility.

Conflict of interest statement

The authors declare there are no competing interests.

Figures

Figure 1. Flowchart of data processing and analysis.
Combinations of data-processing and analysis choices, consisting of six normalised datasets (SQN, BMIQ or SWAN, with probe filtering before or afterwards), use or non-use of ComBat processing (supervised or unsupervised), and analysis with either an unadjusted model, a model adjusted for batch, or a model adjusted for batch and cell type proportion.
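
The choices named in this caption can be laid out as a grid. The sketch below enumerates the full Cartesian product of the options, which is an assumption: the study may not have run every combination (for example, a ComBat-corrected dataset together with explicit batch adjustment in the model), so the count is an upper bound rather than the number of analyses reported. The option labels are paraphrased from the caption and the variable names are hypothetical.

```python
from itertools import product

# Options as described in the caption; labels are paraphrased.
normalisations = ["SQN", "BMIQ", "SWAN"]
filtering = ["pre-normalisation", "post-normalisation"]
combat = ["none", "supervised", "unsupervised"]
models = ["unadjusted", "batch", "batch + cell type"]

# Full Cartesian product (an upper bound; not every cell need be analysed).
grid = [
    {"normalisation": n, "filtering": f, "combat": c, "model": m}
    for n, f, c, m in product(normalisations, filtering, combat, models)
]

# The normalised datasets are the normalisation x filtering pairs.
datasets = {(cfg["normalisation"], cfg["filtering"]) for cfg in grid}

for cfg in grid[:3]:
    print(cfg)
```

The three normalisation methods crossed with the two filtering orders give the six normalised datasets mentioned in the caption.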
Figure 2. Probes ranked in top 10 by p-value in batch+cell adjusted model, for (A) infant sex, (B) BMI in standard care, (C) short-haired in Tabby.
For each probe the rank is given by pre- vs post-filtering, normalisation method, and batch-handling method. The model is one adjusting for batch (either explicitly in the model or via batch-correction algorithm) and cell type proportion. Adjust = adjusted for batch in the model; SCB = Supervised ComBat; UCB = Unsupervised ComBat.
Figure 3. Top 10 probes by LogFC: infant sex.
Largest LogFC for Infant Sex (female), by normalisation and batch correction method.
Figure 4. Top 10 probes by LogFC: BMI in standard care.
Largest LogFC for effect of BMI in standard care group, by normalisation and batch correction method.
Figure 5. Top 10 probes by LogFC: ‘short-haired’ in ‘Tabby’.
Largest LogFC for effect of ‘Short-Haired’ in ‘Tabby’ group, by normalisation and batch correction method.
Figure 6. Distribution of unadjusted P values by normalisation and batch correction method, for batch and cell adjusted models.
Only models from data where probe filtering was performed post-normalisation are included. The model is one adjusting for batch (either explicitly in the model or via batch-correction algorithm) and cell type proportion.
Figure 7. Distribution of log-fold-change estimates by normalisation and batch correction method, for batch and cell adjusted models.
Only models from data where probe filtering was performed post-normalisation are included. The model is one adjusting for batch (either explicitly in the model or via batch-correction algorithm) and cell type proportion.
