Genome Biol. 2025 Dec 8;26(1):417.
doi: 10.1186/s13059-025-03892-0.

Methylome-wide association studies and epigenetic biomarker development for 133 mass spectrometry-assessed circulating proteins in 14,671 Generation Scotland participants


Josephine A Robertson et al. Genome Biol. 2025.

Abstract

Background: DNA methylation (DNAm) can regulate gene expression, and genome-wide methylation patterns, summarised as epigenetic scores (EpiScores), can act as biomarkers for complex traits. The relative stability of methylation profiles may enable better assessment of chronic exposures than single time-point protein measures. We present the first large-scale epigenetic study of the highly abundant serum proteome, measured via ultra-high-throughput mass spectrometry in 14,671 samples from the Generation Scotland cohort. We also present the first large-scale comparison of protein EpiScores and their respective proteins as predictors of incident cardiovascular disease.

Results: Marginal epigenome-wide association models, adjusting for age, sex, measurement batch, estimated white blood cell proportions, BMI, smoking and methylation principal components, reveal 15,855 significant CpG-protein associations across 125 of 133 proteins (P < 2.71 × 10⁻¹⁰, the Bonferroni-corrected threshold). Bayesian epigenome-wide association studies of the same 133 proteins reveal 697 CpG-protein associations (posterior inclusion probability > 0.95). In a holdout test set, 112 protein EpiScores correlate significantly with their respective protein. Of these, sixteen EpiScores associate significantly with incident all-cause cardiovascular disease (N events = 191), compared with only one measured protein.
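The marginal analysis described above fits a separate regression for every CpG-protein pair. The numpy sketch below illustrates that idea on simulated data; it is not OSCA's implementation, and the function name `marginal_ewas` is hypothetical. It assumes a methylation matrix (samples × CpGs), a protein vector and a covariate matrix are already prepared, and for simplicity it uses a normal approximation to the t distribution for the p-value, which is reasonable at sample sizes in the thousands. It also shows that 2.71 × 10⁻¹⁰ × 133 ≈ 3.6 × 10⁻⁸, which is consistent with dividing a per-protein epigenome-wide threshold by the 133 proteins tested (an inference from the arithmetic only; the abstract does not say how the cut-off was derived).

```python
import math

import numpy as np


def marginal_ewas(protein, meth, covariates):
    """Fit protein ~ CpG_j + covariates separately for each CpG column j.

    protein:    (n,) protein levels
    meth:       (n, m) methylation values, one column per CpG
    covariates: (n, k) adjustment covariates; an intercept is added here
    Returns per-CpG effect estimates and approximate two-sided p-values.
    """
    n, m = meth.shape
    base = np.column_stack([np.ones(n), covariates])
    betas = np.empty(m)
    pvals = np.empty(m)
    for j in range(m):
        # Design matrix: CpG j in column 0, then intercept and covariates
        X = np.column_stack([meth[:, j], base])
        coef, _, _, _ = np.linalg.lstsq(X, protein, rcond=None)
        resid = protein - X @ coef
        sigma2 = resid @ resid / (n - X.shape[1])
        # Standard error of the CpG coefficient from (X'X)^-1
        se = math.sqrt(sigma2 * np.linalg.inv(X.T @ X)[0, 0])
        betas[j] = coef[0]
        # Normal approximation to the t distribution (large n)
        pvals[j] = math.erfc(abs(coef[0] / se) / math.sqrt(2))
    return betas, pvals


# Threshold quoted in the abstract and the per-protein level it implies
threshold = 2.71e-10
print(f"{threshold * 133:.2e}")  # 3.60e-08

# Simulated data: only CpG 0 truly affects the protein
rng = np.random.default_rng(0)
n = 2000
cov = rng.normal(size=(n, 2))
meth = rng.normal(size=(n, 5))
protein = 0.5 * meth[:, 0] + cov @ np.array([0.3, -0.2]) + rng.normal(size=n)
betas, pvals = marginal_ewas(protein, meth, cov)
hits = np.flatnonzero(pvals < threshold)
print(hits)
```

Fitting each CpG on its own is what makes this "marginal": correlated CpGs can all pass the threshold for the same protein, which is one reason a joint Bayesian model yields far fewer associations.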

Conclusions: We highlight a complex interplay between the blood-based methylome and proteome. Importantly, we show that protein EpiScores correlate with measured proteins, and that the as-yet understudied high-abundance proteome may yield clinically relevant biomarkers. Protein EpiScores show more significant associations with cardiovascular disease than directly measured proteins, suggesting their potential as clinical biomarkers for monitoring or predicting disease risk. We suggest that biomarker development could be enhanced by considering protein EpiScores alongside measured proteins.

Supplementary Information: The online version contains supplementary material available at 10.1186/s13059-025-03892-0.

Keywords: Biomarkers; Cardiovascular disease; Epigenetics; Proteomics.


Conflict of interest statement

Declarations. Ethics approval and consent to participate: Ethical approval for the GS cohort was received from the NHS Tayside Committee on Medical Research Ethics (REC Reference Number: 05/S1401/89) and Research Tissue Bank status was granted by the East of Scotland Research Ethics Service (REC Reference Number: 20/ES/0021). Participants provided written informed consent. Consent for publication: Not applicable. Competing interests: C.B., A.Z. and M.R. are co-founders of Eliptica Ltd. C.B.M. is a consultant and shareholder of Eliptica Ltd (London, UK). R.E.M. is a scientific advisor to the Epigenetic Clock Development Foundation and Optima Partners. D.L.M. is employed by Optima Partners Ltd. The other authors have no competing interests to declare.

Figures

Fig. 1
Overview of EWAS and EpiScore workflow and results. In the OSCA linear marginal regression analysis, each CpG is modelled individually for each protein. In the GMRMomi Bayesian penalised regression, all CpGs are modelled jointly. The Bayesian approach was subsequently used to identify lead CpGs and to generate protein EpiScores. WBC = estimated white blood cell proportions; BMI = log-transformed body mass index (kg/m²); smoking = log-transformed smoking pack-years (+ 1); PCs = principal components; PIP = posterior inclusion probability. Created in BioRender, Marioni, R. (2025) https://BioRender.com/q80a293
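The caption says the Bayesian model was used to generate protein EpiScores. A common way to build such a score is a weighted linear sum of methylation values at the selected CpGs; the sketch below assumes that form, with weights taken from the Bayesian model (for example posterior mean effects) and CpGs standardised using training-set means and SDs. The function name `episcore` and the standardisation step are assumptions for illustration, not the authors' documented procedure, and the values are toy numbers.

```python
import numpy as np


def episcore(meth, weights, train_mean, train_sd):
    """Hypothetical EpiScore: weighted sum of standardised CpG values.

    meth:        (n, p) methylation at the p CpGs with non-zero weights
    weights:     (p,) per-CpG weights, e.g. posterior mean effects from a
                 Bayesian penalised regression fitted in a training set
    train_mean:  (p,) training-set CpG means
    train_sd:    (p,) training-set CpG standard deviations
    """
    z = (meth - train_mean) / train_sd  # standardise with training moments
    return z @ weights                  # one score per individual


# Toy example: two held-out individuals, two CpGs (illustrative values only)
meth_test = np.array([[0.2, 0.8],
                      [0.4, 0.6]])
weights = np.array([1.0, -0.5])
scores = episcore(meth_test, weights,
                  train_mean=np.array([0.3, 0.7]),
                  train_sd=np.array([0.1, 0.1]))
print(scores)
```

Reusing training-set means and SDs when scoring a holdout set keeps test-set information out of the score, which matters when the score is later evaluated against measured proteins in that same holdout set.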
Fig. 2
Summary of 697 protein ~ CpG associations from the Bayesian EWAS. A Distribution of the number of proteins by number of CpG associations; B Distribution of the number of CpGs by number of protein associations; C Correlation between the number of CpG associations per protein and the mean proportion of variance explained by all CpG loci; D Proportion of CpGs in each region, defined by relation to CpG islands, for the EPIC array and for the Bayesian EWAS results, showing enrichment in Open Sea regions and depletion in Island regions; E Mean effect size of associations by association type, showing that effect sizes are similar for cis and trans associations. Unassigned associations are those for which the protein's gene could not be annotated to a position in GRCh37 (N = 24, Additional file 3: Methods M3); F Each association plotted by genomic position of the protein's gene and the CpG probe, showing the distribution of associations across the genome
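Panels A and B tally lead associations from two directions: CpGs per protein and proteins per CpG. The sketch below shows that bookkeeping, assuming the Bayesian output is available as (protein, CpG, PIP) records and applying the PIP > 0.95 cut-off stated in the abstract. The helper names and the toy records are hypothetical and do not reflect the study's data.

```python
from collections import Counter


def lead_associations(records, pip_threshold=0.95):
    """Keep (protein, cpg) pairs whose PIP exceeds the threshold."""
    return [(prot, cpg) for prot, cpg, pip in records if pip > pip_threshold]


def count_distributions(pairs):
    """Panel A/B-style tallies.

    Returns two Counters mapping k -> number of proteins with k CpG
    associations, and k -> number of CpGs with k protein associations.
    Proteins or CpGs with no retained associations do not appear.
    """
    cpgs_per_protein = Counter(prot for prot, _ in pairs)
    proteins_per_cpg = Counter(cpg for _, cpg in pairs)
    return (Counter(cpgs_per_protein.values()),
            Counter(proteins_per_cpg.values()))


# Toy records (protein, CpG, PIP); identifiers are made up
records = [
    ("prot_1", "cpg_a", 0.99),
    ("prot_1", "cpg_b", 0.97),
    ("prot_2", "cpg_a", 0.96),
    ("prot_2", "cpg_c", 0.50),
    ("prot_3", "cpg_d", 0.951),
]
leads = lead_associations(records)
by_protein, by_cpg = count_distributions(leads)
print(len(leads), by_protein, by_cpg)
```

To show proteins with zero lead CpGs in a panel-A-style plot, the full list of tested proteins would need to be passed in separately, since they never appear in the filtered pairs.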
Fig. 3
Pearson correlation between 112 EpiScores and their respective proteins in the Generation Scotland test set (N = 3,463). Results are displayed for the 112 EpiScores with Pearson r > 0.1 and P < 0.05 using the EPICv1 loci. The central dot represents Pearson r and the error bars represent 95% confidence intervals. Proteins are labelled by gene, except for Ig-like domain-containing protein 1 (A0A0G2JRQ6) and 2 (A0A0J9YY99), which are annotated by UniProt ID. These proteins were annotated to scaffolds or patches in build hg19 and have not been assigned gene names (see Additional file 3: Methods M3.). Transferrin (C9JB55, 75 amino acids) is also labelled by UniProt ID because it originates from the same gene as Serotransferrin (P02787, 698 amino acids, labelled TF)
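The Fig. 3 summary combines a Pearson r, a 95% confidence interval and a two-part filter (r > 0.1 and P < 0.05). The sketch below computes these for one simulated EpiScore/protein pair, assuming the interval comes from the Fisher z-transform (a standard choice; the caption does not name the method) and using the Fisher z approximation for the p-value as well. `pearson_with_ci` is a hypothetical helper, and the data are simulated.

```python
import math

import numpy as np


def pearson_with_ci(x, y, z_crit=1.959963984540054):
    """Pearson r, Fisher z-transform 95% CI and approximate two-sided p."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    r = float(np.corrcoef(x, y)[0, 1])
    fz = math.atanh(r)                 # Fisher z-transform of r
    se = 1.0 / math.sqrt(n - 3)        # approximate SE on the z scale
    ci = (math.tanh(fz - z_crit * se), math.tanh(fz + z_crit * se))
    p = math.erfc(abs(fz) / se / math.sqrt(2))
    return r, ci, p


# Simulated EpiScore/protein pair with the test-set size quoted in Fig. 3
rng = np.random.default_rng(42)
n = 3463
epi = rng.normal(size=n)
protein = 0.4 * epi + rng.normal(size=n)
r, (lo, hi), p = pearson_with_ci(epi, protein)
passes_filter = r > 0.1 and p < 0.05   # selection rule stated in the caption
print(round(r, 3), (round(lo, 3), round(hi, 3)), passes_filter)
```

Back-transforming the interval with tanh keeps it inside [-1, 1] and makes it asymmetric around r, which is why such error bars are not always centred on the dot.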
Fig. 4
EpiScore and measured protein hazard ratios for time to incident cardiovascular disease. Results are displayed where either the protein or the EpiScore shows a Bonferroni-significant association (P < 0.05/112) in model 1. Model 1: TTE ~ EpiScore/Protein + age + sex; Model 2: TTE ~ EpiScore/Protein + age + sex + BMI + smoking + alcohol; Model 3: TTE ~ EpiScore/Protein + age + sex + BMI + smoking + alcohol + diabetes + hypertension + HDL cholesterol + total cholesterol + average systolic blood pressure + average diastolic blood pressure. EpiScore/Protein denotes the EpiScore or protein used as the predictor variable. TTE = time to event; HR = hazard ratio per SD of the predictor; CI = 95% confidence interval. Bold colour denotes significance at P < 4.46 × 10⁻⁴ (= 0.05/112). Proteins are labelled by gene, except for Ig-like domain-containing protein 1 (A0A0G2JRQ6) and 2 (A0A0J9YY99), which are annotated by UniProt ID because they were annotated to scaffolds or patches in build hg19 and have not been assigned gene names (see Additional file 3: Methods M3.). Transferrin (C9JB55, 75 amino acids) is also labelled by UniProt ID because it originates from the same gene as Serotransferrin (P02787, 698 amino acids, labelled TF)
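Hazard ratios per SD of a predictor are typically estimated with a Cox proportional hazards model fitted to a standardised predictor, so that exp(β) is the HR per SD. The sketch below fits a single-predictor Cox model by Newton-Raphson (Breslow handling of ties) on simulated data and checks the result against the 0.05/112 threshold from the caption. It assumes standardisation by the sample SD, omits the age and sex adjustment of model 1, and is not the authors' code; `cox_hr_per_sd` is a hypothetical name, and an established survival package would normally be used instead.

```python
import math

import numpy as np


def cox_hr_per_sd(time, event, x, max_iter=50, tol=1e-9):
    """Single-predictor Cox model (Breslow ties) fitted by Newton-Raphson.

    The predictor is standardised first, so exp(beta) is the hazard
    ratio per SD. Returns (hazard_ratio, se_of_beta, two_sided_p).
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    x = np.asarray(x, dtype=float)
    x = (x - x.mean()) / x.std()
    event_idx = np.flatnonzero(event)
    beta = 0.0
    for _ in range(max_iter):
        w = np.exp(beta * x)
        score, info = 0.0, 0.0
        for i in event_idx:
            at_risk = time >= time[i]          # risk set at this event time
            s0 = w[at_risk].sum()
            s1 = (w[at_risk] * x[at_risk]).sum()
            s2 = (w[at_risk] * x[at_risk] ** 2).sum()
            score += x[i] - s1 / s0            # gradient of log partial likelihood
            info += s2 / s0 - (s1 / s0) ** 2   # observed information
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    se = 1.0 / math.sqrt(info)
    p = math.erfc(abs(beta / se) / math.sqrt(2))
    return math.exp(beta), se, p


# Bonferroni threshold used in Fig. 4 (0.05 / 112 EpiScores)
alpha = 0.05 / 112
print(f"{alpha:.2e}")  # 4.46e-04

# Simulated cohort: true log-HR of 0.5 per SD, with random censoring
rng = np.random.default_rng(1)
n = 1000
x = rng.normal(size=n)
t_event = rng.exponential(scale=1.0 / np.exp(0.5 * x))
t_censor = rng.exponential(scale=2.0, size=n)
time = np.minimum(t_event, t_censor)
event = t_event <= t_censor
hr, se, p = cox_hr_per_sd(time, event, x)
print(round(hr, 2), p < alpha)
```

Because both EpiScores and proteins are standardised before fitting, their HRs share a per-SD scale and can be compared directly, as in Fig. 4.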

