Nat Commun. 2024 Jun 7;15(1):4884.
doi: 10.1038/s41467-024-49296-9.

MSGene: a multistate model using genetic risk and the electronic health record applied to lifetime risk of coronary artery disease

Sarah M Urbut et al. Nat Commun. 2024.

Abstract

Coronary artery disease (CAD) is the leading cause of death among adults worldwide. Accurate risk stratification can support optimal lifetime prevention. Current methods lack the ability to incorporate new information throughout the life course or to combine innate genetic risk factors with acquired lifetime risk. We designed a general multistate model (MSGene) to estimate age-specific transitions across 10 cardiometabolic states, dependent on clinical covariates and a CAD polygenic risk score. This model is designed to handle longitudinal data over the lifetime to address this unmet need and support clinical decision-making. We analyzed longitudinal data from 480,638 UK Biobank participants and compared predicted lifetime risk with the 30-year Framingham risk score. In held-out data, MSGene improves discrimination (C-index 0.71 vs 0.66), age of high-risk detection (C-index 0.73 vs 0.52), and overall prediction (RMSE 1.1% vs 10.9%). We also use MSGene to refine estimates of lifetime absolute risk reduction from statin initiation. Our findings underscore the potential public health value of our multistate model for accurate lifetime CAD risk estimation, using clinical factors and increasingly available genetics, toward earlier, more effective prevention.
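The discrimination figures in the abstract are C-indexes. As a rough illustration of what that statistic measures, and not the authors' evaluation code, the following Python sketch computes Harrell's C-index for right-censored data. The function name `harrell_c` and the toy inputs are hypothetical.

```python
def harrell_c(times, events, scores):
    """Harrell's C-index: the fraction of comparable pairs in which the
    individual with the earlier observed event has the higher risk score.

    times  -- observed follow-up time for each individual
    events -- 1 if the event was observed, 0 if censored
    scores -- predicted risk (higher means higher risk)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is only comparable if the earlier time is an event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): the third individual is censored at time 6.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_scores = [0.9, 0.7, 0.4, 0.1]
c_index = harrell_c(toy_times, toy_events, toy_scores)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the reported 0.71 vs 0.66 comparison should be read.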

Conflict of interest statement

During the course of the project, M.W.Y. became an employee and stock owner of GSK. A.C.F. is co-founder of Goodpath. P.T.E. reports personal fees from Bayer AG, Novartis, and MyoKardia. G.P. holds equity in Phaeno Biotechnologies, is on the SAB of RealmIDX and currently consults for Delphi Diagnostics. P.N. reports research grants from Allelica, Apple, Amgen, Boston Scientific, Genentech/Roche, and Novartis, personal fees from Allelica, Apple, AstraZeneca, Blackstone Life Sciences, Foresite Labs, Genentech/Roche, GV, HeartFlow, Magnet Biomedicine, and Novartis, scientific advisory board membership of Esperion Therapeutics, Preciseli, and TenSixteen Bio, scientific co-founder of TenSixteen Bio, equity in MyOme, Preciseli, and TenSixteen Bio, and spousal employment at Vertex Pharmaceuticals, all unrelated to the present work. The remaining authors declare no competing interests.

Figures

Fig. 1. Multistate transitions over time.
(a) We depict the potential one-step transitions in our multistate framework. Per year, an individual can progress from health to single risk factor states, CAD or death. Similarly, an individual can progress from single risk factor states to double risk factor states, CAD or death, and from double risk factor states to the triple risk factor state, CAD or death. (b) We display the proportional occupancy at each state, excluding censored individuals. CAD coronary artery disease, Ht hypertension, HyperLip hyperlipidemia, Dm Type 2 diabetes mellitus.
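The one-step transitions described in the Fig. 1 caption can be written down as a small lookup table. The Python sketch below is an illustration only: the state labels, the helper names, and the choice to let the triple risk factor state move to CAD or death (not spelled out in the caption) are assumptions, and staying in the current state is left implicit.

```python
from itertools import combinations

RISK_FACTORS = ("Ht", "HyperLip", "Dm")
# Assumption: CAD and death are treated as end states in this sketch.
ABSORBING = ("CAD", "death")


def label(factors):
    """Readable state label; the empty set of risk factors is 'health'."""
    return " & ".join(rf for rf in RISK_FACTORS if rf in factors) or "health"


def one_step_transitions():
    """Map each risk factor state to the states reachable in one step."""
    transitions = {}
    for k in range(len(RISK_FACTORS) + 1):
        for combo in combinations(RISK_FACTORS, k):
            current = frozenset(combo)
            # Gain exactly one additional risk factor ...
            gains = [label(current | {rf}) for rf in RISK_FACTORS if rf not in current]
            # ... or move directly to CAD or death.
            transitions[label(current)] = gains + list(ABSORBING)
    return transitions


TRANSITIONS = one_step_transitions()
N_STATES = len(TRANSITIONS) + len(ABSORBING)
```

Counting the eight risk factor states plus CAD and death gives the 10 cardiometabolic states mentioned in the abstract.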
Fig. 2. Study overview.
(a) Using the UK Biobank data on half a million participants (54% female) with access to health records from 1940, we harmonize hospitalization, prescription and primary care records from the EHR and train our model on individuals free of CAD at age 40. The UKB required participants to be between ages 40 and 69 between 2006 and 2010 for genotyping. In our model, individuals join disease-free in the “health” state and progress to additional states upon censoring. We use 80% of the eligible data for training and the remaining 20% for testing. For the testing subset, we require that individuals have the variables necessary for computation of FRS30 (and FRS30RC) and the pooled cohort equations, which require laboratory (HDL, TC) and biometric (SBP) measurements. (b) For a sample patient, we document the construction of our cohort. This individual is first observed in the health record at age 25; he is diagnosed with hypertension at age 39 and begins informing our risk estimation for CAD at age 40 in the hypertensive category. He transitions to the hypertension and hyperlipidemia category at age 50, 25 years after first encounter and 10 years after entering our risk estimation, thus contributing 10 years of data. TC total cholesterol, SBP systolic blood pressure, HDL high-density lipoprotein, CAD coronary artery disease, FRS30 Framingham 30-year, FRS30RC Framingham 30-year recalibrated, PCE pooled cohort equations 10-year risk, EHR electronic health record.
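The sample patient in panel b can be turned into yearly records to show where the 10 contributed years come from. This is a minimal sketch, assuming one record per year of age with the state at the start and end of that year; the function name `yearly_records` and the record layout are hypothetical, not the authors' data format.

```python
def yearly_records(diagnosis_ages, start_age, end_age):
    """Build one (age, state_now, state_next) record per year in [start_age, end_age).

    diagnosis_ages -- dict mapping a risk factor to the age at diagnosis
    """
    records = []
    for age in range(start_age, end_age):
        state_now = tuple(sorted(rf for rf, a in diagnosis_ages.items() if a <= age))
        state_next = tuple(sorted(rf for rf, a in diagnosis_ages.items() if a <= age + 1))
        records.append((age, state_now, state_next))
    return records


# Sample patient from the caption: hypertension at 39, enters risk
# estimation at 40, gains hyperlipidemia at 50.
sample = yearly_records({"Ht": 39, "HyperLip": 50}, start_age=40, end_age=50)
years_contributed = len(sample)
```

The first record starts at age 40 in the hypertension state, and the last record (age 49) captures the transition to hypertension and hyperlipidemia at age 50.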
Fig. 3. Survival, 10-year, and lifetime risk curves.
(a) We demonstrate the singular projected disease-free curve by MSGene for a non-smoking individual not on anti-hypertensives at age 40 of low, medium or high genomic risk from the healthy to CAD transition. (b) We demonstrate the MSGene predicted 10-year risk for individuals at each age along the x axis, showing that, in general, for fixed-window approaches, 10-year risk is monotonically increasing. (c) We demonstrate the MSGene predicted lifetime risk curve for individuals at each age featured along the x axis under an untreated (dashed) or treated (solid) strategy. The conditional remaining lifetime risk declines with age, from 24% for a high genomic risk individual in our cohort to <5% for an individual at the same risk level by age 70. (d) Using the FRS30RC equation, like 10-year risk and unlike the remaining lifetime risk approach, the 30-year risk calculation is monotonically increasing, from 13.4% (13.2–13.6%) at age 40 to 32.9% at age 70 for an individual of the highest genomic risk. Of note, this is for individuals transitioning from healthy to CAD, while additional projections are provided in Supplementary Data 4–16 for individuals from different states. FRS30RC Framingham 30-year recalibrated.
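The contrast between panels b and c (fixed-window risk rising with age while remaining lifetime risk falls) can be reproduced with a toy calculation. The Python sketch below is a simplification under stated assumptions: it considers only the direct healthy-to-CAD transition with death as a competing event, ignores intermediate risk factor states, and uses invented annual probabilities and an arbitrary maximum age of 80. None of the numbers come from MSGene.

```python
def cumulative_cad_risk(p_cad, p_death):
    """Cumulative CAD risk from per-year CAD and death probabilities.

    Each year's CAD probability is weighted by the probability of
    having reached that year free of both CAD and death.
    """
    risk = 0.0
    event_free = 1.0
    for pc, pd in zip(p_cad, p_death):
        risk += event_free * pc
        event_free *= 1.0 - pc - pd
    return risk


def toy_annual_probs(start_age, end_age):
    """Hypothetical annual probabilities that rise with age (illustration only)."""
    ages = range(start_age, end_age)
    p_cad = [0.002 * 1.05 ** (a - 40) for a in ages]
    p_death = [0.001 * 1.09 ** (a - 40) for a in ages]
    return p_cad, p_death


def ten_year_risk(age):
    return cumulative_cad_risk(*toy_annual_probs(age, age + 10))


def remaining_lifetime_risk(age, max_age=80):
    return cumulative_cad_risk(*toy_annual_probs(age, max_age))
```

With these toy inputs, `ten_year_risk(70)` exceeds `ten_year_risk(40)`, while `remaining_lifetime_risk(40)` exceeds `remaining_lifetime_risk(70)`, because the 40-year-old has more years at risk ahead.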
Fig. 4. Time-dependent threshold analysis.
We consider the distribution of the first age at which an individual exceeds the PCE-derived 10-year threshold of 5% (a), or the lifetime threshold of 10% using FRS30RC (b) or the MSGene lifetime prediction (c). We then use this age as a time-dependent predictor of time-to-event in a time-dependent Cox model (“Methods”), in which an individual’s follow-up time is split by start time, the periods in which a threshold is passed, and final censoring time, with an indicator variable marking whether or not each threshold has been surpassed. We conservatively left censor these intervals at age of enrollment to exclude time protected from death. We report Harrell’s C-index for discrimination, which measures how well a model predicts events that tend to occur earlier versus later. Left-facing arrows indicate individuals who surpass the threshold at first prediction, and right-facing arrows indicate individuals who never surpass a threshold for a given metric. FRS30RC is shown here with C-index 0.52 (original FRS30 C-index 0.50) vs MSGene 0.72 (P < 2.13 × 10^−140), and PCE10y with C-index 0.55 vs MSGene (P < 2.03 × 10^−103). For these analyses, bootstrap resampling (n = 1000) was used to estimate the C-index and its 95% CI. Data are presented as mean values +/− 1.96 SEM. A two-sided Z-test was used to compare the C-indexes between MSGene and FRS30 and produce a P value (d). We compute the lifetime prediction at each age under one of eight potential risk starting states, with confidence intervals derived from predictions made on 1000 bootstrap samples of the data, for a sample individual. Data are presented as mean values +/− 1.96 SEM. (e) Using the electronic health record, we extract state position for each individual per year. We then use MSGene to compute predicted risk for each individual at each state in time, displayed here for a sample individual. Standard errors are constructed from 1000 bootstrap iterations. Data are presented as mean values +/− 1.96 SEM. (f) We use these as predictors in a time-dependent Cox model in which we expand the data set into nonoverlapping intervals for each individual (“Methods”) and conservatively left censor before enrollment to avoid time protected from death. We evaluate the concordance compared with FRS30RC (P < 2.19 × 10^−17) and PCE-derived 10-year risk (P < 3.23 × 10^−8), with P values based on the two-sided Z-test used to compare the C-indexes between MSGene and FRS30. Standard errors are constructed from 1000 bootstrap iterations. Data are presented as mean values +/− 1.96 SEM. A two-sided Z-test was used to compare the C-indexes between MSGene and FRS30 and between MSGene and PCE-derived 10-year risk (g). FRS30RC Framingham 30-year recalibrated, PCE pooled cohort equations, Cox time-dependent Cox model.
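Two steps in the Fig. 4 caption lend themselves to a short illustration: splitting follow-up into nonoverlapping intervals with a threshold indicator, and comparing two C-indexes with a two-sided Z-test. The Python sketch below is not the authors' pipeline. The function names, the row layout, and the standard errors in the example are hypothetical, and the Z-test here treats the two C-index estimates as independent, which is an assumption.

```python
import math


def split_follow_up(enroll_age, threshold_age, end_age, event):
    """Split follow-up into (start, stop, exceeded, event) rows.

    Follow-up starts at enrollment (left censoring at enrollment age).
    threshold_age is the first age at which the risk threshold is
    exceeded, or None if it is never exceeded.
    """
    if threshold_age is None or threshold_age >= end_age:
        return [(enroll_age, end_age, 0, event)]   # never exceeds during follow-up
    if threshold_age <= enroll_age:
        return [(enroll_age, end_age, 1, event)]   # already exceeded at enrollment
    return [
        (enroll_age, threshold_age, 0, 0),         # below threshold, no event yet
        (threshold_age, end_age, 1, event),        # above threshold until end
    ]


def c_index_z_test(c1, se1, c2, se2):
    """Two-sided Z-test for the difference between two C-indexes."""
    z = (c1 - c2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p_value = math.erfc(abs(z) / math.sqrt(2))     # two-sided normal tail
    return z, p_value


# Hypothetical individual: enrolled at 55, exceeds the threshold at 62,
# has an event at 70.
rows = split_follow_up(55, 62, 70, event=1)

# C-indexes from the caption with made-up standard errors of 0.01.
z_stat, p_val = c_index_z_test(0.72, 0.01, 0.52, 0.01)
```

Rows of this shape are what a start-stop (counting process) Cox fit typically expects; the bootstrap standard errors in the caption would replace the made-up values used here.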
Fig. 5. Absolute risk reduction: Short-term and lifetime risk.
We display the relationship between remaining lifetime and 10-year risk. Each ray represents an age group, in which individuals are parameterized by their short-term (10-year) and long-term (lifetime) risk and colored by genomic risk in SD from the mean. We display the lifetime absolute risk reduction as computed in Equation RR, stratified by age rays and colored by genetic risk. (a) For an individual at the top genetic risk at age 40, the MSGene predicted 10-year risk is roughly equivalent to that of an individual at the lowest genetic risk at age 70 (3.8% vs 4.2%, SE 0.01). However, the MSGene projected lifetime benefit is directly proportional to lifetime risk (b), and more than twice that of a high-risk individual at age 70 (5.0 vs 2.3%, SEM 0.02). (c) Marginalized across starting states and covariate profiles, we project the absolute risk difference (%) between a treated and an untreated setting. This ranges from a median of 5.8% (SD 0.01) at age 40 to 0.8% (SD 0.01) at age 79. SEM standard error of mean, RR relative risk, SD CAD-PRS standard deviation.
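Equation RR is not reproduced in this caption, so the following Python sketch only illustrates the general idea that an absolute risk reduction proportional to lifetime risk follows when treatment scales risk by a constant relative risk. The function name and the relative risk of 0.8 are assumptions chosen for illustration, not values taken from the paper.

```python
def absolute_risk_reduction(untreated_risk, relative_risk):
    """Absolute risk reduction when treatment multiplies risk by relative_risk.

    untreated_risk -- lifetime risk without treatment, as a fraction in [0, 1]
    relative_risk  -- ratio of treated to untreated risk (assumed constant)
    """
    if not 0.0 <= untreated_risk <= 1.0:
        raise ValueError("untreated_risk must be between 0 and 1")
    if relative_risk < 0.0:
        raise ValueError("relative_risk must be non-negative")
    treated_risk = untreated_risk * relative_risk
    return untreated_risk - treated_risk


# Illustrative inputs only: a higher lifetime risk yields a larger
# absolute benefit under the same (assumed) relative risk.
arr_high = absolute_risk_reduction(0.24, 0.8)
arr_low = absolute_risk_reduction(0.05, 0.8)
```

Under this assumption the reduction equals the untreated risk times one minus the relative risk, which is why individuals with similar 10-year risk but different lifetime risk can have very different projected lifetime benefit.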
