Nat Med. 2024 Sep;30(9):2450-2460.
doi: 10.1038/s41591-024-03164-7. Epub 2024 Aug 8.

Proteomic aging clock predicts mortality and risk of common age-related diseases in diverse populations

M Austin Argentieri et al. Nat Med. 2024 Sep.

Abstract

Circulating plasma proteins play key roles in human health and can potentially be used to measure biological age, allowing risk prediction for age-related diseases, multimorbidity and mortality. Here we developed a proteomic age clock in the UK Biobank (n = 45,441) using a proteomic platform comprising 2,897 plasma proteins and explored its utility to predict major disease morbidity and mortality in diverse populations. We identified 204 proteins that accurately predict chronological age (Pearson r = 0.94) and found that proteomic aging was associated with the incidence of 18 major chronic diseases (including diseases of the heart, liver, kidney and lung, diabetes, neurodegeneration and cancer), as well as with multimorbidity and all-cause mortality risk. Proteomic aging was also associated with age-related measures of biological, physical and cognitive function, including telomere length, frailty index and reaction time. Proteins contributing most substantially to the proteomic age clock are involved in numerous biological functions, including extracellular matrix interactions, immune response and inflammation, hormone regulation and reproduction, neuronal structure and function and development and differentiation. In a validation study involving biobanks in China (n = 3,977) and Finland (n = 1,990), the proteomic age clock showed similar age prediction accuracy (Pearson r = 0.92 and r = 0.94, respectively) compared to its performance in the UK Biobank. Our results demonstrate that proteomic aging involves proteins spanning multiple functional categories and can be used to predict age-related functional status, multimorbidity and mortality risk across geographically and genetically diverse populations.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Overview of the study design and analytic approaches.
a, UKB participants were split into training and test sets at a 70:30 ratio. In the training set, a LightGBM model was trained to predict chronological age using 2,897 plasma proteins and fivefold cross-validation. We identified 204 proteins relevant for predicting chronological age using the Boruta feature selection algorithm and retrained a refined LightGBM model using these 204 proteins, which was then evaluated in the UKB test set. b, Independent data from the CKB and FinnGen were used for further independent validation of the proteomic age clock model. c, Protein-predicted age (ProtAge) was calculated in the full UKB sample using fivefold cross-validation and LightGBM. ProtAgeGap was calculated as the difference between ProtAge and chronological age. We used linear and logistic regression to test associations between ProtAgeGap and a comprehensive panel of biological aging markers and measures of frailty and physical/cognitive status. Further, we used Cox proportional hazards models to test associations between ProtAgeGap and mortality, 14 common diseases and 12 cancers. Most association analyses were carried out only in the UKB, due to the smaller sample size in the CKB and the lack of disease cases in FinnGen. Figure created with BioRender.com.
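The ProtAgeGap calculation described in panel c can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: a one-variable least-squares line stands in for the LightGBM model, and the function names (`kfold_indices`, `fit_line`, `out_of_fold_gap`) and the toy inputs are hypothetical. The point it shows is that each participant's predicted age comes from a model that never saw that participant during training.

```python
import random
from statistics import mean


def kfold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept.

    Stand-in for LightGBM so the sketch needs only the standard library.
    """
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def out_of_fold_gap(x, age, k=5, seed=0):
    """Predict age for each held-out fold, then take predicted minus actual."""
    n = len(age)
    pred = [0.0] * n
    for fold in kfold_indices(n, k, seed):
        held_out = set(fold)
        train = [i for i in range(n) if i not in held_out]
        slope, intercept = fit_line([x[i] for i in train], [age[i] for i in train])
        for i in fold:
            pred[i] = slope * x[i] + intercept
    # Gap = predicted age - chronological age (positive = "older" than expected).
    gap = [p - a for p, a in zip(pred, age)]
    return pred, gap
```

With real data, `x` would be replaced by the protein measurements and `fit_line` by a gradient-boosted model; the fold logic and the subtraction stay the same.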
Fig. 2. Proteomic aging clock performance across cohorts.
a, Density plot of age at recruitment in the UKB, CKB and FinnGen. b, Density plot of age at death in the UKB (4,784 deaths; 10.6%) and CKB (1,426 deaths; 36%). c, Counts of prevalent and incident cases of all common diseases studied in the UKB sample (n = 45,441). d, Performance of the trained proteomic aging model in the UKB holdout test set (n = 13,633). e, Performance of the trained proteomic aging model in the CKB (n = 3,977). f, Performance of the trained proteomic aging model in FinnGen (n = 1,990). g, Sex distributions of ProtAgeGap in the UKB (female n = 24,579; male n = 20,862), CKB (female n = 2,137; male n = 1,840) and FinnGen (female n = 1,032; male n = 958). h, Distributions of ProtAgeGap according to self-reported ethnicity in the UKB (white n = 42,320; Black n = 1,114; Asian n = 1,016; other n = 554; mixed n = 293). i, Distributions of ProtAgeGap according to geographic region of residence in the CKB (Gansu n = 397; Henan n = 493; Hunan n = 462; Sichuan n = 341; Zhejiang n = 342; Haikou n = 298; Harbin n = 598; Liuzhou n = 379; Qingdao n = 415; Suzhou n = 252). Correlation coefficients shown in d–f are Pearson correlation coefficients. Violin plots in g–i show the median (center line), interquartile range (box limits) and minima/maxima (whiskers) within each group. RMSE, root mean squared error; MAE, mean absolute error.
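The three performance metrics named in this caption (Pearson correlation, RMSE and MAE) can be computed from predicted and chronological ages as in the sketch below. The function names are hypothetical and the code is a generic textbook implementation, not the authors' evaluation script.

```python
from math import sqrt
from statistics import mean


def pearson_r(pred, obs):
    """Pearson correlation between predicted and observed ages."""
    mp, mo = mean(pred), mean(obs)
    cov = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    var_p = sum((p - mp) ** 2 for p in pred)
    var_o = sum((o - mo) ** 2 for o in obs)
    return cov / sqrt(var_p * var_o)


def rmse(pred, obs):
    """Root mean squared error, in the same units as age (years)."""
    return sqrt(mean((p - o) ** 2 for p, o in zip(pred, obs)))


def mae(pred, obs):
    """Mean absolute error, in years."""
    return mean(abs(p - o) for p, o in zip(pred, obs))
```

RMSE penalizes large individual errors more heavily than MAE, which is why the two are usually reported together.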
Fig. 3. ProtAgeGap is associated with age-related biological, physical and cognitive function.
a, Associations between ProtAgeGap and biological aging mechanisms in the full UKB sample (n = 45,441). b, Associations between ProtAgeGap and measures of physiological and cognitive (reaction time and fluid intelligence) function in the full UKB sample (n = 45,441). c, Associations between ProtAgeGap and biological aging mechanisms in the subsample of UKB participants with no lifetime diagnosis of any of the 26 diseases studied (n = 20,315). d, Associations between ProtAgeGap and measures of physiological and cognitive function in the subsample of UKB participants with no lifetime diagnosis of any of the 26 diseases studied (n = 20,315). All models used linear or logistic regression and were adjusted for age, sex, Townsend deprivation index, recruitment center, ethnicity, IPAQ activity group and smoking status. In all plots, beta estimates (and 95% confidence intervals) for the association between ProtAgeGap and each outcome are shown on the x axis. Beta estimates in red are from the full 204-protein model (ProtAgeGap), whereas beta estimates in blue are from the smaller proteomic age clock model with 20 proteins (ProtAgeGap20). FEV1, forced expiratory volume in 1 s; IPAQ, International Physical Activity Questionnaire; FDR, false discovery rate.
Fig. 4. ProtAgeGap stratifies individuals into divergent age-specific mortality and disease risk trajectories in the UKB and CKB.
a,b, Cumulative incidence plots for the indicated diseases and mortality for the top, median and bottom deciles of ProtAgeGap in the UKB (n = 45,441) (a) and CKB (n = 3,977) (b). The number of incident cases is shown for each disease, indicating the total number of incident cases only among individuals in the three deciles shown, not the full dataset. Incidence rates are shown for the subsequent 11–16 years (UKB) or 11–14 years (CKB) of follow-up after recruitment for each given age at recruitment (for example, in a the cumulative incidence rate shown at age 65 years is the rate of incident cases in the 11–16 years of follow-up for those aged 65 years at recruitment). All plots show the cumulative density of events at a given timepoint based on the Kaplan–Meier survival function, with 95% confidence intervals shown in lighter shading. Diseases shown here for the CKB are those with more than 10 cases across the three deciles of ProtAgeGap.
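The cumulative incidence curves described here are one minus the Kaplan–Meier survival function. A minimal sketch of that estimator follows; the function name and the input layout (parallel lists of follow-up times and 0/1 event flags) are assumptions for illustration, and confidence intervals and competing risks are not handled.

```python
def km_cumulative_incidence(times, events):
    """Return [(t, 1 - S(t))] at each distinct event time.

    times  : follow-up time for each participant
    events : 1 if the event occurred at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group everyone who leaves the risk set at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            # Kaplan-Meier product-limit update.
            surv *= 1.0 - n_events / at_risk
            curve.append((t, 1.0 - surv))
        at_risk -= n_leaving
    return curve
```

Censored participants contribute to the risk set up to their censoring time and then drop out without changing the curve.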
Fig. 5. Effect sizes of the associations of ProtAgeGap with mortality and common diseases are largely invariant to covariate adjustment.
Associations between ProtAgeGap and mortality and disease incidence using Cox proportional hazards models are shown for models with increasing levels of covariate adjustment. Shown on the x axis are HRs (and 95% CIs) for the effect of ProtAgeGap on the outcomes shown. Events listed are the total number of incident cases for each outcome. All models were run using the full UKB sample (n = 45,441). a, Model 1 was adjusted for age and sex. b, Model 2 was adjusted for age, sex, ethnicity, Townsend deprivation index, recruitment center, IPAQ activity group and smoking status. c, Model 3 was adjusted for age, sex, ethnicity, Townsend deprivation index, recruitment center, IPAQ activity group, smoking status, BMI and prevalent hypertension. HR estimates in red are from the full 204-protein model (ProtAgeGap), whereas estimates in blue are from the smaller proteomic age clock model with 20 proteins (ProtAgeGap20).
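For readers less familiar with Cox models, the HRs and 95% CIs plotted here are exponentiated regression coefficients. The sketch below shows that conversion, assuming a Wald-type interval (coefficient plus or minus 1.96 standard errors); the caption does not state which interval the authors used, and the function name is hypothetical.

```python
from math import exp


def hazard_ratio(beta, se, z=1.96):
    """Convert a Cox coefficient and its standard error to (HR, lower, upper).

    Assumes a Wald-type interval on the log-hazard scale.
    """
    return exp(beta), exp(beta - z * se), exp(beta + z * se)
```

Because ProtAgeGap is measured in years, an HR from such a model is read as the multiplicative change in hazard per additional year of ProtAgeGap.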
Extended Data Fig. 1. Stability of ProtAge protein associations with age across three time points.
Comparison of betas for the association between age and each of the 149 ProtAge APs with repeat measurements available during baseline and two follow-up imaging visits (n = 1,085). a) Comparison of betas for the association between age and each of the 149 ProtAge APs during baseline and the 2014+ follow-up imaging visit. b) Comparison of betas for the association between each of these 149 ProtAge APs and age during baseline and the 2019+ imaging visit. c) Comparison of betas for the association between each of the 149 ProtAge APs and age during the 2014+ imaging visit and during the 2019+ imaging visit. Shown in each plot are the Pearson correlation coefficient (r), p-value for the correlation, and the model slope (λ). APs: aging-related proteins.
Extended Data Fig. 2. ProtAgeGap and age-specific cancer risk trajectories in the UKB.
Cumulative incidence plots for the top, median, and bottom deciles of ProtAgeGap in the UK Biobank (UKB; n = 45,441). The number of incident cases is shown for each cancer; these numbers reflect the total number of incident cases only among those in the 3 deciles shown, not the full dataset. Incidence rates are shown for the subsequent 11-16 years of follow up after recruitment for each given age at recruitment (for example, the cumulative incidence rate shown at age 65 is the rate of incident cases in the 11-16 years of follow up for those aged 65 at recruitment). All plots show the cumulative density of events at a given timepoint based on the Kaplan-Meier survival function, with 95% confidence intervals in lighter shading. Certain plots show multiple 0s on the y-axis because these represent decimal values < 0.5 and the y-axis values are rounded to a single digit. ProtAgeGap: proteomic age gap (in years).
Extended Data Fig. 3. Associations between ProtAgeGap and cancers in the UKB.
Associations between ProtAgeGap and incident cancer diagnoses in Cox proportional hazards models with increasing levels of covariate adjustment. Shown on the x-axis are hazard ratios (and 95% confidence intervals) for the effect of ProtAgeGap on the outcomes shown. Events listed are the total number of incident cases for each outcome. Within each model, p-values across tests for all outcomes were corrected for multiple comparisons using the false discovery rate (FDR). All models were run in the UK Biobank (UKB; n = 45,441). a) Model 1 is adjusted for age and sex. b) Model 2 is adjusted for age, sex, Townsend deprivation index, recruitment centre, IPAQ activity group, and smoking status. c) Model 3 is adjusted for age, sex, Townsend deprivation index, recruitment centre, IPAQ activity group, smoking status, BMI, and prevalent hypertension. ProtAgeGap: proteomic age gap (in years).
Extended Data Fig. 4. Effect size of ProtAgeGap on mortality and disease among non-smokers and those within normal weight range.
Associations between ProtAgeGap and mortality or diseases among UK Biobank participants who report being never smokers (n = 24,528) (a) and with a BMI ≥ 18.5 and < 25 kg/m² (n = 14,555) (b). Shown on the x-axis are hazard ratios (and 95% confidence intervals) for the effect of ProtAgeGap on the outcomes shown. Events listed are the total number of incident cases for each outcome. All models are Cox proportional hazards models using model 2 (adjusted for age, sex, Townsend deprivation index, recruitment centre, and IPAQ activity group). No adjustment was made for multiple comparisons. ProtAgeGap: proteomic age gap (in years).
Extended Data Fig. 5. Associations between individual ProtAge APs and each disease studied.
For each outcome associated with ProtAgeGap20, a Cox proportional hazards model (n = 45,441) was calculated with all 20 proteins from the ProtAgeGap20 score, adjusted for age, sex (except prostate cancer), ethnicity, Townsend deprivation index, recruitment center, IPAQ activity group, and smoking status. No adjustment was made for multiple comparisons. In a), the association between each protein and incident disease is colored by z-score, with z-scores for associations with p-value ≥ 0.05 set to 0. In b), the importance of each protein with p < 0.05 is shown as a relative contribution. Relative contribution for each disease is calculated by scaling z-score for significant proteins such that they add to 1. APs: aging-related proteins; ProtAgeGap20: proteomic age gap from the 20-protein model.
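The relative-contribution scaling described for panel b can be sketched as below. Two points are assumptions, since the caption does not spell them out: absolute z-scores are used so that protective and harmful associations both count toward the total, and the function name and protein labels in the usage example are hypothetical.

```python
def relative_contribution(z_scores, p_values, alpha=0.05):
    """Scale z-scores of significant proteins so their magnitudes sum to 1.

    z_scores, p_values : dicts keyed by protein name
    Proteins with p >= alpha get a z-score of 0, as in panel a.
    Assumption: contributions are based on absolute z-scores.
    """
    masked = {
        name: (z if p_values[name] < alpha else 0.0)
        for name, z in z_scores.items()
    }
    total = sum(abs(z) for z in masked.values())
    if total == 0:
        # No significant proteins for this disease.
        return {name: 0.0 for name in masked}
    return {name: abs(z) / total for name, z in masked.items()}
```

For example, with hypothetical proteins `P1`, `P2` and `P3` having z-scores 3, -1 and 2 and p-values 0.01, 0.20 and 0.04, `P2` is masked to 0 and the remaining two receive contributions of 0.6 and 0.4.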
Extended Data Fig. 6. ProtAgeGap increases linearly with increasing disease multimorbidity.
a) Years of ProtAgeGap in those with 0 (n = 6,826), 1 (n = 2,056), 2 (n = 605), 3 (n = 206), and 4+ (n = 116) comorbid conditions among UK Biobank (UKB) participants aged 40-50 years at recruitment (total n = 9,809). b) Years of ProtAgeGap in those with 0 (n = 10,665), 1 (n = 6,903), 2 (n = 3,765), 3 (n = 1,702), and 4+ (n = 1,410) comorbid conditions among UKB participants aged 51-65 years at recruitment (total n = 24,445). c) Percentages of the UKB population with 0, 1, 2, 3, and 4+ lifetime disease diagnoses. d) Years of ProtAgeGap according to levels of self-rated health in the UKB (total n = 43,393; Poor n = 2,249; Fair n = 9,355; Good n = 24,752; Excellent n = 7,004). Multimorbidity is defined as the number of lifetime diagnoses of any of the 26 diseases analyzed in this study. In a, b, and d, violin plots with center line, box limits, and whiskers represent the median, interquartile range, and minima/maxima within each group. For violin plots only, outliers more than 2 standard deviations from the total mean across all groups in the plotted population subgroup were trimmed. Tests for significant differences between the means of groups were performed using a two-sided t-test. n.s.: not statistically significant; *** p-value < 0.001; ProtAgeGap: proteomic age gap (in years).
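The outlier trimming applied before drawing the violin plots can be sketched as follows. The key detail from the caption is that the mean and standard deviation are pooled across all groups in the plotted subgroup rather than computed per group. The function name is hypothetical, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def trim_outliers(groups, n_sd=2.0):
    """Drop values more than n_sd standard deviations from the pooled mean.

    groups : dict mapping group label -> list of ProtAgeGap values
    The mean and SD are computed once, over all groups combined.
    Assumption: sample (n - 1) standard deviation.
    """
    pooled = [v for values in groups.values() for v in values]
    center = mean(pooled)
    spread = stdev(pooled)
    return {
        label: [v for v in values if abs(v - center) <= n_sd * spread]
        for label, values in groups.items()
    }
```

Pooling means a group whose values are all shifted away from the overall mean can lose more points than a central group, which affects only the plot and not the t-tests described in the caption.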
Extended Data Fig. 7. Protein–protein interaction network of ProtAge APs from the STRING database.
Protein–protein interaction (PPI) network of a highly interconnected subset of APs in the ProtAge model with at least 2 node connections using experimental PPI information from the STRING database. Proteins are sized and colored by number of connections, with those showing a greater number of connections with other proteins displayed larger and more yellow.
Extended Data Fig. 8. Protein–protein interaction network of ProtAge APs using SHAP interaction values.
Protein–protein interaction (PPI) network using SHAP values from the trained ProtAge model. Proteins shown are only those that are highly interconnected using a cutoff of 0.0083 for mean absolute SHAP interaction values. Proteins are sized and colored by number of connections, with those showing a greater number of connections with other proteins displayed larger and more yellow.
Extended Data Fig. 9. Overlap of ProtAge APs with existing DNAm and proteomic clock publications.
a) Overlap between genes coding for the 204 ProtAge APs versus genes mapped by proximity to CpGs from common DNAm clocks. b) Overlap between 204 ProtAge APs versus a recent systematic review of APs (Johnson et al. 2020), a recent comprehensive analysis of SOMAscan proteins associated with age (Coenen et al. 2023), and a recent proteomic aging clock created using SOMAscan data (Lehallier et al. 2019). APs: aging-related proteins, DNAm: DNA methylation.

