Epigenetics Chromatin. 2017 Jan 3;10:1.
doi: 10.1186/s13072-016-0108-y. eCollection 2017.

Genome-wide methylation data mirror ancestry information


Elior Rahmani et al. Epigenetics Chromatin.

Abstract

Background: Genetic data are known to harbor information about human demographics, and genotyping data are commonly used for capturing ancestry information by leveraging genome-wide differences between populations. In contrast, it is not clear to what extent population structure is captured by whole-genome DNA methylation data.

Results: We demonstrate, using three large-cohort 450K methylation array data sets, that the ancestry signal is mirrored in genome-wide DNA methylation data and that it can be isolated more effectively by leveraging the correlation structure of CpGs with cis-located SNPs. Based on these insights, we propose a method, EPISTRUCTURE, for inferring ancestry from methylation data without the need for genotype data.
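The Results paragraph says the ancestry signal can be isolated more effectively by exploiting how CpGs correlate with cis-located SNPs. One way to picture that idea is a filter that keeps only CpGs whose methylation level is well predicted by nearby genotypes in a reference sample that has both data types. The following is a minimal NumPy sketch under that assumption, not the EPISTRUCTURE or GLINT implementation: the function name `select_cis_explained_cpgs`, the 50 kb cis window and the R² cutoff of 0.5 are hypothetical choices made for illustration.

```python
import numpy as np


def select_cis_explained_cpgs(meth, cpg_chrom, cpg_pos, geno, snp_chrom, snp_pos,
                              window=50_000, r2_threshold=0.5):
    """Return column indices of CpGs whose methylation is well explained by cis SNPs.

    meth: (n_samples, n_cpgs) methylation levels, e.g. beta values.
    geno: (n_samples, n_snps) genotype dosages coded 0/1/2.
    window and r2_threshold are hypothetical defaults for illustration only.
    """
    n = meth.shape[0]
    selected = []
    for j in range(meth.shape[1]):
        # cis SNPs: same chromosome and within +/- `window` bp of the CpG
        cis = np.flatnonzero((snp_chrom == cpg_chrom[j])
                             & (np.abs(snp_pos - cpg_pos[j]) <= window))
        # skip CpGs with no cis SNPs, or with too many to leave residual degrees of freedom
        if cis.size == 0 or cis.size >= n - 1:
            continue
        X = np.column_stack([np.ones(n), geno[:, cis]])  # intercept + cis genotypes
        y = meth[:, j]
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        ss_res = np.sum((y - X @ beta) ** 2)
        ss_tot = np.sum((y - y.mean()) ** 2)
        if ss_tot > 0 and 1.0 - ss_res / ss_tot >= r2_threshold:
            selected.append(j)
    return np.array(selected, dtype=int)


# Synthetic example: CpG 0 is driven by a nearby SNP, CpG 1 has no SNP within the
# window, and CpG 2 has a nearby SNP but is pure noise.
rng = np.random.default_rng(0)
n = 200
geno = rng.integers(0, 3, size=(n, 2)).astype(float)
meth = np.column_stack([
    0.2 + 0.1 * geno[:, 0] + rng.normal(0, 0.01, n),
    rng.normal(0.5, 0.05, n),
    rng.normal(0.5, 0.05, n),
])
idx = select_cis_explained_cpgs(
    meth,
    cpg_chrom=np.array([1, 1, 2]), cpg_pos=np.array([1_000, 10_000_000, 500]),
    geno=geno,
    snp_chrom=np.array([1, 2]), snp_pos=np.array([1_500, 600]),
)
print(idx)  # expected: [0]
```

Because genotypes are needed only in the reference sample, a CpG list selected this way could then be applied to methylation-only data sets. In-sample R² grows with the number of SNPs fitted, so a real pipeline would likely want held-out data or an adjusted R² for the selection step.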

Conclusions: EPISTRUCTURE can be used to infer the ancestry of individuals from their methylation data in the absence of corresponding genetic data. Although genetic data are often collected in epigenetic studies of large cohorts, they are typically not made publicly available, which makes EPISTRUCTURE especially useful for researchers working with public methylation data. An implementation of EPISTRUCTURE is available in GLINT, our recently released toolset for DNA methylation analysis, at http://glint-epigenetics.readthedocs.io.

Keywords: Ancestry; DNA methylation; Epigenetics; Epigenome-wide association study (EWAS); Illumina 450K; Population structure.


Figures

Fig. 1
Fraction of variance explained in the first two genotype-based PCs of the GALA II data using several methods. Shown are linear predictors using an increasing number of EPISTRUCTURE PCs (blue), methylation-based PCs (red) and methylation-based PCs after feature selection based on a previous study [21] (yellow) for capturing (a) the first genotype-based PC and (b) the second genotype-based PC.
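Fig. 1 reports the fraction of variance in a genotype-based PC explained by linear predictors built from an increasing number of methylation-derived PCs. A minimal sketch of that computation, assuming ordinary least squares with an intercept and in-sample R² (the caption does not say how the predictors were fit or evaluated), could look like this; `variance_explained_curve` is a hypothetical helper name and the data are synthetic.

```python
import numpy as np


def variance_explained_curve(target, predictors, max_k):
    """In-sample R^2 of an OLS fit of `target` on the first k predictor columns, k = 1..max_k."""
    n = target.shape[0]
    ss_tot = np.sum((target - target.mean()) ** 2)
    r2 = []
    for k in range(1, max_k + 1):
        X = np.column_stack([np.ones(n), predictors[:, :k]])  # intercept + first k PCs
        beta, *_ = np.linalg.lstsq(X, target, rcond=None)
        r2.append(1.0 - np.sum((target - X @ beta) ** 2) / ss_tot)
    return np.array(r2)


# Synthetic stand-ins: `pcs` plays the role of methylation-derived PCs and
# `genotype_pc` the role of a genotype-based PC built from columns 1 and 3.
rng = np.random.default_rng(0)
pcs = rng.normal(size=(300, 5))
genotype_pc = 2.0 * pcs[:, 0] + pcs[:, 2]
curve = variance_explained_curve(genotype_pc, pcs, max_k=5)
print(np.round(curve, 3))  # climbs to 1.0 once k >= 3
```

Because the models are nested, in-sample R² cannot decrease as k grows, so a curve like this is best read alongside held-out or cross-validated estimates when comparing methods.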
Fig. 2
Capturing population structure in the GALA II data using an unsupervised approach. (a) The first two PCs of the genotypes, considered the gold standard, separate the samples into two subpopulations: Puerto Ricans (blue) and Mexicans (red). (b) The first two PCs of the methylation levels (methylation PCs) cannot reconstruct the separation found with the genotype data. (c) The first two PCs recalculated after applying feature selection based on the proximity of CpGs to nearby SNPs, as proposed by Barfield et al. [21]. (d) The first two PCs of the methylation data after adjusting for cell-type composition (adjusted methylation PCs) reconstruct most of the separation found in the genotypes. (e) Adjusted methylation PCs after excluding the 70,889 polymorphic sites from the data. (f) Adjusted methylation PCs after excluding the 167,738 probes containing at least one common SNP.
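Panels (b) and (d) of Fig. 2 contrast raw methylation PCs with PCs computed after adjusting for cell-type composition. A minimal sketch of such an adjustment, assuming a matrix of cell-type proportions is already available and that "adjusting" means regressing each CpG on those proportions and keeping the residuals, is shown below. The function name `adjusted_methylation_pcs` and the synthetic data are hypothetical, and how the proportions are estimated is outside this sketch.

```python
import numpy as np


def adjusted_methylation_pcs(meth, cell_props, n_pcs=2):
    """Top PC scores of methylation data after regressing out cell-type proportions.

    meth: (n_samples, n_cpgs); cell_props: (n_samples, n_cell_types).
    """
    n = meth.shape[0]
    C = np.column_stack([np.ones(n), cell_props])  # intercept + cell-type covariates
    coef, *_ = np.linalg.lstsq(C, meth, rcond=None)  # one regression per CpG column
    resid = meth - C @ coef
    resid -= resid.mean(axis=0)  # center each CpG before PCA
    U, S, _ = np.linalg.svd(resid, full_matrices=False)
    return U[:, :n_pcs] * S[:n_pcs]  # PC scores for each sample


# Synthetic data: a strong cell-type effect on all CpGs and a weak
# population effect on the first 20 CpGs.
rng = np.random.default_rng(1)
n, p = 100, 200
pop = rng.integers(0, 2, size=n)   # hypothetical two-population labels
cell = rng.uniform(0, 1, size=n)   # one hypothetical cell-type proportion
cell_loadings = rng.normal(0, 1, size=p)
pop_effect = np.zeros(p)
pop_effect[:20] = 1.0
meth = (0.5 + 0.1 * np.outer(cell, cell_loadings)
        + 0.05 * np.outer(pop, pop_effect)
        + rng.normal(0, 0.01, size=(n, p)))

raw_pc1 = np.linalg.svd(meth - meth.mean(axis=0), full_matrices=False)[0][:, 0]
adj_pcs = adjusted_methylation_pcs(meth, cell[:, None], n_pcs=2)
print(abs(np.corrcoef(raw_pc1, pop)[0, 1]), abs(np.corrcoef(adj_pcs[:, 0], pop)[0, 1]))
```

In this toy example the first raw PC tracks the simulated cell-type proportion, while the first adjusted PC separates the two simulated populations, echoing the qualitative contrast between panels (b) and (d). When proportions for all cell types sum to one they are collinear with the intercept; `np.linalg.lstsq` still returns valid residuals in that case because it computes a minimum-norm least-squares solution.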
Fig. 3
Capturing population structure in the CHAMACOS data. Shown are linear predictors of the first genotype-based PC using (a) the first two methylation PCs of the data, (b) the first two PCs calculated after applying feature selection based on the proximity of CpGs to nearby SNPs [21], (c) the first two PCs after adjusting the data for cell-type composition (adjusted methylation PCs), (d) the first two adjusted methylation PCs after excluding the 167,738 probes containing SNPs from the data and (e) the first two EPISTRUCTURE PCs.

References

    1. Novembre J, Johnson T, Bryc K, Kutalik Z, Boyko AR, Auton A, Indap A, King KS, Bergmann S, Nelson MR, et al. Genes mirror geography within Europe. Nature. 2008;456(7218):98–101. doi: 10.1038/nature07331.
    2. Price AL, Butler J, Patterson N, Capelli C, Pascali VL, Scarnicci F, Ruiz-Linares A, Groop L, Saetta AA, Korkolopoulou P, et al. Discerning the ancestry of European Americans in genetic association studies. PLoS Genet. 2008;4(1):e236. doi: 10.1371/journal.pgen.0030236.
    3. Pritchard JK, Stephens M, Donnelly P. Inference of population structure using multilocus genotype data. Genetics. 2000;155(2):945–59.
    4. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet. 2006;38(8):904–9. doi: 10.1038/ng1847.
    5. Alexander DH, Novembre J, Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 2009;19(9):1655–64. doi: 10.1101/gr.094052.109.
