Review
Clin Epigenetics. 2021 Dec 4;13(1):214.
doi: 10.1186/s13148-021-01200-8.

Epigenome-wide association studies: current knowledge, strategies and recommendations

Maria Pia Campagna et al. Clin Epigenetics.

Abstract

The aetiology and pathophysiology of complex diseases are driven by the interaction between genetic and environmental factors. The variability in risk and outcomes in these diseases is incompletely explained by genetic or environmental risk factors individually. Therefore, researchers are now exploring the epigenome, a biological interface at which genetics and the environment can interact. There is a growing body of evidence supporting the role of epigenetic mechanisms in complex disease pathophysiology. Epigenome-wide association studies (EWASes) investigate the association between a phenotype and epigenetic variants, most commonly DNA methylation. The decreasing cost of measuring epigenome-wide methylation and the increasing accessibility of bioinformatic pipelines have contributed to the rise in EWASes published in recent years. Here, we review the current literature on these EWASes and provide further recommendations and strategies for successfully conducting them. We have constrained our review to studies using methylation data, as this is the most studied epigenetic mechanism; microarray-based data, as whole-genome bisulphite sequencing remains prohibitively expensive for most laboratories; and blood-based studies, due to the non-invasiveness of peripheral blood collection and availability of archived DNA, as well as the accessibility of publicly available blood-cell-based methylation data. Further, we address multiple novel areas of EWAS analysis that have not been covered in previous reviews: (1) longitudinal study designs, (2) the chip analysis methylation pipeline (ChAMP), (3) differentially methylated region (DMR) identification paradigms, (4) methylation quantitative trait loci (methQTL) analysis, (5) methylation age analysis and (6) identifying cell-specific differential methylation from mixed cell data using statistical deconvolution.

Keywords: Bioinformatics; ChAMP; Complex diseases; EWAS; Epigenetics; Methylation.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Popularity of methylation microarrays. The proportion of EWASes deposited on GEO (NCBI) each year, by array type. Abbreviations: EWAS = Epigenome-wide association study, GEO = Gene Expression Omnibus, NCBI = National Center for Biotechnology Information
Fig. 2
Popularity of EWAS pipelines. The proportion of PubMed citations by year for methylation data analysis using different Illumina arrays. Abbreviations: EWAS = Epigenome-wide association study
Fig. 3
A standard EWAS workflow using Minfi or ChAMP packages. Analyses are either common to both packages, specific to one package, or completed with other unspecified packages. Abbreviations: DMP = differentially methylated position, DMR = differentially methylated region; DMB = differentially methylated block; CNV = copy number variation; methQTL = methylation quantitative trait loci; MRS = methylation risk score
Fig. 4
Steps and tools for primary EWAS analysis. Listed tools include ChAMP (https://bioconductor.org/packages/release/bioc/html/ChAMP.html), Minfi (https://bioconductor.org/packages/release/bioc/html/minfi.html) and missMethyl (http://bioconductor.org/packages/release/bioc/html/missMethyl.html). Abbreviations: EWAS = epigenome-wide association study, ChAMP = chip analysis methylation pipeline, HPC = high-performance computer, RAM = random access memory, SNP = single nucleotide polymorphism, BMIQ = beta mixture quantile, SVD = singular value decomposition, QC = quality control, DMR = differentially methylated region, ADB = absolute delta beta
Fig. 5
Popularity of normalisation methods. The proportion of PubMed citations by year for methylation data normalisation algorithms.
Fig. 6
Overall distance separating adjacent probes on the 450K and EPIC microarrays. Probes were clustered based on either gene feature (TSS = transcription start site, UTR = untranslated region, IGR = intergenic region) or methylation pattern. Data were extracted from the probe.features.epic (EPIC) and probe.features (450K) objects provided by the ChAMP R package
Fig. 7
Popularity of DMR identification tools. The proportion of PubMed publications by year using common DMR identification tools. Abbreviations: DMR = differentially methylated regions
Fig. 8
DMR identification and prioritisation paradigm. DMRs are defined as consecutive DMPs with the same direction of effect. Available bioinformatic algorithms allow researchers to select the threshold, minimum number and distance between DMPs. We recommend an FDR < 0.05, at least 2–5 consecutive DMPs, and 500–2000 bp between consecutive DMPs. After DMRs have been identified, researchers can prioritise biologically relevant DMRs by ranking them by mean or maximum absolute delta beta (ADB), filtering out DMRs with ADB < 0.02 and identifying major DMRs as those with ADB > 0.05. Major DMRs should be used for downstream functional analyses, while all DMRs with ADB > 0.02 should be used in gene ontology analysis. Abbreviations: DMR = differentially methylated regions, DMP = differentially methylated position, ADB = absolute delta beta
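The paradigm in the Fig. 8 caption can be sketched in a few lines of Python. This is a minimal illustration, not the algorithm of ChAMP or of any DMR tool: the `Probe` class, the function names `call_dmrs` and `prioritise`, and the default values (3 DMPs, 1000 bp, chosen from within the recommended ranges) are all hypothetical. It also assumes that "consecutive" means adjacent among significant probes, so non-significant probes lying between two DMPs are ignored.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Probe:
    chrom: str
    pos: int            # genomic coordinate in bp
    delta_beta: float   # mean beta in cases minus mean beta in controls
    fdr: float          # FDR-adjusted p-value for this probe


def call_dmrs(probes, fdr_max=0.05, min_dmps=3, max_gap=1000):
    """Group DMPs (FDR < fdr_max) into runs that share a chromosome,
    share a direction of effect, and lie at most max_gap bp apart."""
    dmps = sorted((p for p in probes if p.fdr < fdr_max),
                  key=lambda p: (p.chrom, p.pos))
    runs, current = [], []
    for p in dmps:
        if current:
            prev = current[-1]
            same_run = (p.chrom == prev.chrom
                        and p.pos - prev.pos <= max_gap
                        and (p.delta_beta > 0) == (prev.delta_beta > 0))
            if not same_run:
                runs.append(current)
                current = []
        current.append(p)
    if current:
        runs.append(current)
    # Keep only runs with enough DMPs to count as a region.
    return [run for run in runs if len(run) >= min_dmps]


def prioritise(dmrs, min_adb=0.02, major_adb=0.05):
    """Rank DMRs by mean absolute delta beta (ADB), drop those below
    min_adb, and flag those above major_adb as major DMRs."""
    ranked = []
    for run in dmrs:
        adb = mean([abs(p.delta_beta) for p in run])
        if adb < min_adb:
            continue
        ranked.append({
            "chrom": run[0].chrom,
            "start": run[0].pos,
            "end": run[-1].pos,
            "n_dmps": len(run),
            "mean_adb": adb,
            "major": adb > major_adb,
        })
    return sorted(ranked, key=lambda d: d["mean_adb"], reverse=True)
```

Under these assumptions, `prioritise(call_dmrs(probes))` returns the regions ordered by mean ADB. The entries flagged `major` would go forward to functional follow-up, and the full returned list to gene ontology analysis. Ranking by maximum rather than mean ADB only requires swapping `mean` for `max`.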
Fig. 9
methQTL at polymorphism rs9271155 based on sample group. Genotype at rs9271155 affects methylation level at CpG site cg17416722, whereby individuals with AA genotype at rs9271155 have low methylation levels at cg17416722, and individuals with BB genotype have high methylation levels. Individuals with a heterozygous genotype (AB) have intermediate methylation levels. From [unpublished data]. Abbreviations: methQTL = methylation quantitative trait locus
Fig. 10
Popularity of methylation age indices. The proportion of PubMed publications by year for DNA methylation age indices
Fig. 11
Steps and tools for downstream EWAS analyses. Recommended tools include epiDISH (https://www.bioconductor.org/packages/release/bioc/html/EpiDISH.html), ENmix (https://bioconductor.org/packages/release/bioc/html/ENmix.html) and GEM (https://bioconductor.org/packages/release/bioc/html/GEM.html). Abbreviations: EWAS = epigenome-wide association study, methQTL = methylation quantitative trait loci, MRS = methylation risk score, RPC = robust partial correlation, CBS = CIBERSORT, CP = constrained projection
Fig. 12
EWAS databases containing deposited, integrated and/or associated datasets. Deposition databases: GEO, ArrayExpress. Integrated databases: ENCODE, IHEC, MethBank, DiseaseMeth, EWAS Datahub. Association databases: EWAS Atlas, EWASdb, EWAS Catalog. Site URLs are listed in the "Packages and databases" section. Abbreviations: EWAS = epigenome-wide association study, GEO = Gene Expression Omnibus, IHEC = International Human Epigenome Consortium
