Methylome-wide association studies and epigenetic biomarker development for 133 mass spectrometry-assessed circulating proteins in 14,671 Generation Scotland participants

Josephine A Robertson¹, Jakub Bajzik², Spyros Vernardis³,⁴, Aleksandra D Chybowska¹, Daniel L McCartney¹, Arturas Grauslys⁴, Jure Mur¹,⁵, Hannah M Smith¹, Archie Campbell¹,⁶, Camilla Drake⁷, Hannah Grant¹, Jamie Pearce⁸, Tom C Russ⁹,⁵, Poppy Adkin³,¹⁰, Matthew White³, Charles Brigden⁴, Christoph B Messner³,¹¹, David J Porteous¹, Caroline Hayward¹,⁷, Simon R Cox¹², Aleksej Zelezniak³,⁴,¹³,¹⁴,¹⁵, Markus Ralser³,⁴,¹⁶, Matthew R Robinson², Riccardo E Marioni¹⁷

Affiliations

¹ Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK.
² Institute of Science and Technology, Vienna, Austria.
³ Molecular Biology of Metabolism Laboratory, The Francis Crick Institute, London, UK.
⁴ Eliptica Limited, The London Cancer Hub, Cotswold Road, Sutton, London, UK.
⁵ Alzheimer Scotland Dementia Research Centre, Department of Psychology, University of Edinburgh, Edinburgh, UK.
⁶ Centre for Medical Informatics, Usher Institute, University of Edinburgh, Edinburgh, UK.
⁷ MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK.
⁸ Centre for Research on Environment, Society and Health, School of Geosciences, University of Edinburgh, Edinburgh, UK.
⁹ Division of Psychiatry, Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, UK.
¹⁰ Medical Research Council Clinical Trials Unit, University College London, London, UK.
¹¹ Precision Proteomics Center, Swiss Institute of Allergy and Asthma Research, University of Zurich, Zurich, Switzerland.
¹² Department of Psychology, The Lothian Birth Cohorts, University of Edinburgh, Edinburgh, UK.
¹³ Randall Centre for Cell & Molecular Biophysics, King's College London, New Hunt's House, Guy's Campus, London, SE1 1UL, UK.
¹⁴ Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, Gothenburg, SE-412 96, Sweden.
¹⁵ Institute of Biotechnology, Life Sciences Centre, Vilnius University, Sauletekio al. 7, Vilnius, LT10257, Lithuania.
¹⁶ Department of Biochemistry, Charité Universitätsmedizin Berlin, Berlin, Germany.
¹⁷ Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, UK. riccardo.marioni@ed.ac.uk.

PMID: 41361833
PMCID: PMC12683789
DOI: 10.1186/s13059-025-03892-0

Josephine A Robertson et al. Genome Biol. 2025 Dec 8;26(1):417. doi: 10.1186/s13059-025-03892-0.

Abstract

Background: DNA methylation (DNAm) can regulate gene expression, and its genome-wide patterns (epigenetic scores or EpiScores) can act as biomarkers for complex traits. The relative stability of methylation profiles may enable better assessment of chronic exposures compared to single time-point protein measures. We present the first large-scale epigenetic study of the highly abundant serum proteome measured via ultra-high-throughput mass spectrometry in 14,671 samples from the Generation Scotland cohort. We further present the first large-scale comparison of protein EpiScores and their respective proteins as predictors of incident cardiovascular disease.

Results: Marginal epigenome-wide association models, adjusting for age, sex, measurement batch, estimated white cell proportions, BMI, smoking and methylation principal components, reveal 15,855 significant CpG-protein associations across 125 of 133 proteins (P_Bonferroni < 2.71 × 10^-10). Bayesian epigenome-wide association studies of the same 133 proteins reveal 697 CpG-protein associations (posterior inclusion probability > 0.95). In total, 112 protein EpiScores correlate significantly with their respective protein in a holdout test set. Of these, sixteen associate significantly with incident all-cause cardiovascular disease (N_events = 191), compared to one measured protein.

Conclusions: We highlight a complex interplay between the blood-based methylome and proteome. Importantly, we show that protein EpiScores correlate with measured proteins and demonstrate that the as-yet understudied high-abundance proteome may yield clinically relevant biomarkers. The protein EpiScores show a greater number of significant associations with cardiovascular disease than the directly measured proteins, suggesting their potential as clinical biomarkers for monitoring or predicting disease risk. We suggest that biomarker development could be enhanced by considering protein EpiScores alongside measured proteins.

Supplementary Information: The online version contains supplementary material available at 10.1186/s13059-025-03892-0.

Keywords: Biomarkers; Cardiovascular disease; Epigenetics; Proteomics.

Conflict of interest statement

Declarations. Ethics approval and consent to participate: Ethical approval for the GS cohort was received from the NHS Tayside Committee on Medical Research Ethics (REC Reference Number: 05/S1401/89) and Research Tissue Bank status was granted by the East of Scotland Research Ethics Service (REC Reference Number: 20/ES/0021). Participants provided written informed consent. Consent for publication: Not applicable. Competing interests: C.B., A.Z. and M.R. are co-founders of Eliptica Ltd. C.B.M. is a consultant and shareholder of Eliptica Ltd (London, UK). R.E.M. is a scientific advisor to the Epigenetic Clock Development Foundation and Optima Partners. D.L.M. is employed by Optima Partners Ltd. The other authors have no competing interests to declare.

Figures

**Fig. 1**
Overview of EWAS and EpiScore workflow and results. For OSCA linear marginal regression analysis, each CpG is modelled individually for every protein within each model. For GMRMomi Bayesian penalised regression, all CpGs are modelled jointly. The Bayesian approach was subsequently used to identify lead CpGs and for the generation of protein EpiScores. WBC = estimated white blood cell proportions; BMI = log transformation of body mass index (kg/m²); smoking = log transformation of smoking pack-years (+ 1); PCs = Principal Components; PIP = posterior inclusion probability. Created in BioRender, Marioni, R. (2025) https://BioRender.com/q80a293
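
As a minimal sketch of how a posterior inclusion probability (PIP) can be read off a jointly fitted Bayesian model, the Python below treats the PIP as the fraction of posterior samples in which a CpG's effect is non-zero, then keeps CpGs with PIP > 0.95, the cut-off reported in the abstract. The function names, the toy samples and the CpG labels are hypothetical; this illustrates the general idea only and is not the GMRMomi implementation or its output format.

```python
from typing import Dict, List


def posterior_inclusion_probability(effect_samples: List[float]) -> float:
    """Fraction of posterior samples in which the effect is non-zero."""
    if not effect_samples:
        raise ValueError("need at least one posterior sample")
    included = sum(1 for beta in effect_samples if beta != 0.0)
    return included / len(effect_samples)


def select_associations(
    samples_by_cpg: Dict[str, List[float]], pip_threshold: float = 0.95
) -> Dict[str, float]:
    """Return the CpGs whose PIP exceeds the threshold, with their PIPs."""
    pips = {
        cpg: posterior_inclusion_probability(samples)
        for cpg, samples in samples_by_cpg.items()
    }
    return {cpg: pip for cpg, pip in pips.items() if pip > pip_threshold}


# Toy posterior samples (hypothetical): one CpG included in 98% of samples,
# one included in only half of them.
toy_samples = {
    "cpg_a": [0.2] * 98 + [0.0] * 2,
    "cpg_b": [0.1] * 50 + [0.0] * 50,
}
selected = select_associations(toy_samples)
```

With these toy inputs only `cpg_a` passes the 0.95 cut-off.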

**Fig. 2**
Summary of 697 protein ~ CpG associations from the Bayesian EWAS results. A The distribution of number of proteins by number of CpG associations; B The distribution of number of CpGs by number of protein associations; C The correlation between the number of CpG associations of each protein and the mean proportion of variance explained by all CpG loci; D The proportion of CpGs in regions, specified by relation to CpG islands, for the EPIC array and for the Bayesian EWAS results, demonstrating enrichment in Open Sea regions and depletion in Island regions; E Mean effect size of associations by association type, demonstrating that the effect size is similar whether the association is in cis or trans. Unassigned associations are those for which the protein gene could not be annotated to a position in GRCh37 (N = 24, Additional file 3: Methods M3); F Each association plotted by genomic position of the protein gene and CpG probe, demonstrating the distribution of associations across the genome

**Fig. 3**
Pearson correlation of 112 EpiScores and proteins in the Generation Scotland test set. Test set N = 3,463. Correlation results displayed for 112 EpiScores where Pearson r > 0.1 and P < 0.05 using the EPICv1 loci. Central dot represents Pearson r and the error bars represent 95% confidence intervals. Proteins are labelled by gene, except for Ig-like domain-containing protein 1 (A0A0G2JRQ6) and 2 (A0A0J9YY99), annotated by UniProtID. These proteins were annotated to scaffolds or patches in build hg19 and have not been assigned gene names (see Additional file 3: Methods M3). Transferrin (C9JB55, 75 amino acids) is also labelled by UniProtID as it originates from the same gene as Serotransferrin (P02787, 698 amino acids, labelled TF)
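
The caption reports Pearson r with 95% confidence intervals but does not state how the intervals were computed. A common choice is the Fisher z-transformation, sketched below in Python under that assumption; the function names are hypothetical and the example inputs simply reuse the r > 0.1 cut-off and the test-set size N = 3,463 from the caption.

```python
import math
from statistics import NormalDist
from typing import Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r: float, n: int, level: float = 0.95) -> Tuple[float, float]:
    """Confidence interval for r via the Fisher z-transformation (assumed method)."""
    if n <= 3:
        raise ValueError("n must exceed 3 for the Fisher interval")
    z = math.atanh(r)                      # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)            # approximate standard error of z
    z_crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Interval for a correlation at the r = 0.1 cut-off in a sample of 3,463.
lower, upper = fisher_ci(0.1, 3463)
```

At this sample size the interval around r = 0.1 is narrow and excludes zero, which is consistent with the caption pairing the r > 0.1 filter with P < 0.05.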

**Fig. 4**
EpiScore and measured protein hazard ratios for time to incident cardiovascular disease. Results are displayed where either protein or EpiScore demonstrate Bonferroni-significant associations (P < 0.05/112) in model 1. Model 1: TTE ~ EpiScore/Protein + age + sex; Model 2: TTE ~ EpiScore/Protein + age + sex + BMI + smoking + alcohol; Model 3: TTE ~ EpiScore/Protein + age + sex + BMI + smoking + alcohol + diabetes + hypertension + HDL cholesterol + Total cholesterol + average systolic blood pressure + average diastolic blood pressure. EpiScore/Protein denotes EpiScore or protein as a predictor variable. HR = Hazard Ratio per SD of the predictor, CI = 95% confidence interval. Colour in bold denotes significance at P_Bonferroni < 4.46 × 10^-4 (= 0.05/112). Proteins are labelled by gene, with the exception of Ig-like domain-containing protein 1 (A0A0G2JRQ6) and 2 (A0A0J9YY99), annotated by UniProtID, which were annotated to scaffolds or patches in build hg19 and have not been assigned gene names (see Additional file 3: Methods M3). Transferrin (C9JB55, 75 amino acids) is also labelled by UniProtID as it originates from the same gene as Serotransferrin (P02787, 698 amino acids, labelled TF)
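
The Bonferroni threshold quoted in the caption is a simple division of the significance level by the number of tests. The short Python sketch below reproduces that arithmetic for 0.05/112; the helper names are hypothetical and the p-values passed to them are illustrative only, not results from the study.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def is_bonferroni_significant(p_value: float, alpha: float, n_tests: int) -> bool:
    """True when a p-value falls below the Bonferroni-corrected threshold."""
    return p_value < bonferroni_threshold(alpha, n_tests)


# 112 EpiScores tested at alpha = 0.05, as in the caption.
threshold = bonferroni_threshold(0.05, 112)
print(f"{threshold:.2e}")  # 4.46e-04
```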

Grants and funding

U.MC_UU_00007/10/Medical Research Scotland
