NPJ Genom Med. 2022 Aug 25;7(1):50.
doi: 10.1038/s41525-022-00320-1.

Methylation risk scores are associated with a collection of phenotypes within electronic health record systems

Mike Thompson et al. NPJ Genom Med. 2022.

Abstract

Inference of clinical phenotypes is a fundamental task in precision medicine, and has therefore been heavily investigated in recent years in the context of electronic health records (EHR) using a large arsenal of machine learning techniques, as well as in the context of genetics using polygenic risk scores (PRS). In this work, we considered the epigenetic analog of PRS, methylation risk scores (MRS), a linear combination of methylation states. We measured methylation across a large cohort (n = 831) of diverse samples in the UCLA Health biobank, for which both genetic and complete EHR data are available. We constructed MRS for 607 phenotypes spanning diagnoses, clinical lab tests, and medication prescriptions. When added to a baseline set of predictive features, MRS significantly improved the imputation of 139 outcomes, whereas the PRS improved only 22 (median improvement for methylation 10.74%, 141.52%, and 15.46% in medications, labs, and diagnosis codes, respectively, whereas genotypes only improved the labs at a median increase of 18.42%). We added significant MRS to state-of-the-art EHR imputation methods that leverage the entire set of medical records, and found that including MRS as a medical feature in the algorithm significantly improves EHR imputation in 37% of lab tests examined (median R² increase 47.6%). Finally, we replicated several MRS in multiple external studies of methylation (minimum p-value of 2.72 × 10⁻⁷) and replicated 22 of 30 tested MRS internally in two separate cohorts of different ethnicity. Our publicly available results and weights show promise for methylation risk scores as clinical and scientific tools.
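The abstract describes an MRS as a linear combination of methylation states. The following is a minimal sketch of that idea only, not the authors' pipeline: the CpG site names, weights, and the `methylation_risk_score` helper are hypothetical, and methylation states are assumed to be beta values between 0 and 1.

```python
def methylation_risk_score(beta_values, weights, intercept=0.0):
    """Return the weighted sum of methylation levels over the weighted CpG sites.

    beta_values: dict mapping CpG site ID -> methylation level (assumed 0..1).
    weights:     dict mapping CpG site ID -> weight (hypothetical values here).
    """
    missing = [cpg for cpg in weights if cpg not in beta_values]
    if missing:
        raise KeyError(f"missing methylation values for: {missing}")
    # Sites without a weight do not contribute to the score.
    return intercept + sum(weights[cpg] * beta_values[cpg] for cpg in weights)


# Hypothetical weights and one hypothetical individual.
weights = {"cg_site_A": 0.5, "cg_site_B": -0.25, "cg_site_C": 1.0}
sample = {"cg_site_A": 0.5, "cg_site_B": 0.5, "cg_site_C": 0.25, "cg_site_D": 0.9}
score = methylation_risk_score(sample, weights)
```

In practice the weights would come from a fitted model such as the publicly available weights the abstract mentions; the values above are placeholders for illustration.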

Conflict of interest statement

I.H. is the president of Clarity Healthcare Analytics Inc, a company that assists hospitals with extracting and using data from their electronic medical records. The company currently owns the rights to the PDW software that was used to extract data from the electronic health record. I.H. receives research funding from Merck Pharmaceuticals. M.C. is a consultant for Edwards Lifesciences (Irvine, CA) and Masimo Corp (Irvine, CA), and has funded research from Edwards Lifesciences and Masimo Corp. He is also the founder of Sironis and he owns patents and receives royalties for closed loop hemodynamic management technologies that have been licensed to Edwards Lifesciences. E.H. is senior vice president of AI/ML at OptumLabs (Minnetonka, MN). The other authors declare no competing interests concerning this article.

Figures

Fig. 1. MRS increases imputation accuracy on a variety of outcomes.
a–c The performance of the PRS (blue) and MRS (green) imputations on the y-axis with the baseline model performance on the x-axis. The performance of binary phenotypes (Phecodes (a), medications (b)) is measured using area under the ROC curve (AUC) and the performance of continuous phenotypes (lab results (c)) is measured using proportion of variance explained (R²). Shown is the performance on the union of outcomes that were significantly improved over the baseline model by either the MRS or PRS and that were significantly imputed by their corresponding predictor (72 Phecodes, 59 medications, and 31 labs). d–f The disease incidence as a function of the PRS (blue) and MRS (green) binned by deciles (d, e); and the observed Urea Nitrogen lab result value plotted against its imputed value (f).
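Disease incidence binned by risk-score decile, as in panels d and e, can be sketched as follows. This is an illustrative example under stated assumptions: the `incidence_by_bin` helper and the toy data are hypothetical, and deciles are formed by ranking individuals and splitting them into ten equal-sized groups.

```python
def incidence_by_bin(scores, is_case, n_bins=10):
    """Fraction of cases within each equal-sized bin of individuals ranked by score."""
    if len(scores) != len(is_case):
        raise ValueError("scores and is_case must have the same length")
    # Rank individuals from lowest to highest risk score.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n = len(order)
    incidence = []
    for b in range(n_bins):
        members = order[b * n // n_bins:(b + 1) * n // n_bins]
        # A bin is empty only when there are fewer individuals than bins.
        if members:
            incidence.append(sum(is_case[i] for i in members) / len(members))
        else:
            incidence.append(None)
    return incidence


# Toy data: 20 individuals, with cases concentrated among the highest scores.
scores = [i / 20 for i in range(20)]
is_case = [1 if i >= 16 else 0 for i in range(20)]
deciles = incidence_by_bin(scores, is_case)
```

An informative score shows incidence rising toward the top deciles, which is the pattern the panels are meant to display.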
Fig. 2. Improvement in lab result imputation performance by including MRS.
For lab results that were significantly better imputed using a matrix completion imputation procedure that included the MRS values, we compare the quality of the imputed values (R²) using only the EHR data (SoftImpute) to the values generated when including the MRS values in addition to the EHR data (SoftImpute+MRS).
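The comparison in this figure comes down to computing R² for two sets of imputed values and reporting the change. The sketch below assumes R² is defined as 1 minus the ratio of residual to total sum of squares; the source does not state which definition was used, and the helper names and numbers are hypothetical. It does not implement SoftImpute itself.

```python
def r_squared(observed, imputed):
    """Proportion of variance explained: 1 - SS_residual / SS_total (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_total = sum((y - mean_obs) ** 2 for y in observed)
    ss_residual = sum((y - p) ** 2 for y, p in zip(observed, imputed))
    return 1.0 - ss_residual / ss_total


def relative_increase_percent(r2_baseline, r2_new):
    """Percentage change in R² relative to the baseline imputation."""
    return 100.0 * (r2_new - r2_baseline) / r2_baseline


# Hypothetical lab values and two hypothetical sets of imputed values.
observed = [1.0, 2.0, 3.0, 4.0]
ehr_only = [2.0, 2.0, 3.0, 3.0]
ehr_plus_mrs = [1.0, 2.0, 3.0, 3.0]

r2_ehr = r_squared(observed, ehr_only)
r2_mrs = r_squared(observed, ehr_plus_mrs)
gain = relative_increase_percent(r2_ehr, r2_mrs)
```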
Fig. 3. Imputation accuracy may improve with additional samples.
We downsampled the number of individuals to evaluate the imputation performance as a function of sample size using a well-imputed Phecode (a), medication (b), and lab value (c). The performance is significantly affected by the number of individuals, suggesting that there is additional power to be gained with the addition of more methylation samples. Error bars indicate 95% confidence intervals.
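A downsampling analysis of this kind can be sketched as a loop over subsample sizes. Everything here is an assumption made for illustration: the `downsampling_curve` helper, the number of repeats, and the stand-in `evaluate` function, which in a real analysis would refit the model and return a cross-validated AUC or R² for the sampled individuals.

```python
import random


def downsampling_curve(n_individuals, sizes, evaluate, n_repeats=5, seed=0):
    """Mean performance at each subsample size, averaged over random subsamples.

    evaluate: callable taking a list of individual indices and returning a score.
    """
    rng = random.Random(seed)
    curve = {}
    for size in sizes:
        if size > n_individuals:
            raise ValueError("subsample size exceeds the number of individuals")
        # Draw individuals without replacement for each repeat.
        values = [
            evaluate(rng.sample(range(n_individuals), size))
            for _ in range(n_repeats)
        ]
        curve[size] = sum(values) / n_repeats
    return curve


def toy_evaluate(indices):
    """Stand-in metric that improves with sample size (not a real model fit)."""
    return 1.0 - 1.0 / len(indices)


curve = downsampling_curve(n_individuals=100, sizes=[10, 50, 100], evaluate=toy_evaluate)
```

Keeping the per-repeat values instead of only their mean would also allow confidence intervals like those shown as error bars.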
Fig. 4. Labs as imputed by methylation, genotypes, and an externally trained polygenic risk score.
The cross-validated R² between the true and imputed lab value on 541 unrelated, non-Hispanic-Latino, white-identifying patients using a baseline predictor as well as a baseline predictor with methylation, genotypes, and a PRS externally trained from UKBiobank summary statistics. HDL corresponds to high-density lipoprotein cholesterol and HGBA1C to glycated hemoglobin. Error bars indicate 95% confidence intervals.
Fig. 5. Best methylation-imputed Phecodes within ancestral populations.
After training a model on the entire heterogeneous population of individuals, we evaluated the predictive performance within each population separately. We observed only 6 (of 60) significant differences between self-reported ancestral groupings. Error bars indicate 95% confidence intervals.
