Nat Med. 2024 Feb;30(2):480-487.

doi: 10.1038/s41591-024-02796-z. Epub 2024 Feb 19.

Selection, optimization and validation of ten chronic disease polygenic risk scores for clinical implementation in diverse US populations

Niall J Lennon^#¹, Leah C Kottyan^#², Christopher Kachulis³, Noura S Abul-Husn⁴, Josh Arias⁵, Gillian Belbin⁴, Jennifer E Below⁶, Sonja I Berndt⁵, Wendy K Chung⁷, James J Cimino⁸, Ellen Wright Clayton⁶, John J Connolly⁹, David R Crosslin¹⁰ ¹¹, Ozan Dikilitas¹², Digna R Velez Edwards⁶, QiPing Feng⁶, Marissa Fisher³, Robert R Freimuth¹², Tian Ge¹³; GIANT Consortium; All of Us Research Program; Joseph T Glessner⁹, Adam S Gordon¹⁴, Candace Patterson³, Hakon Hakonarson⁹, Maegan Harden³, Margaret Harr⁹, Joel N Hirschhorn³ ¹⁵, Clive Hoggart⁴, Li Hsu¹⁶, Marguerite R Irvin⁸, Gail P Jarvik¹¹, Elizabeth W Karlson¹³, Atlas Khan⁷, Amit Khera³, Krzysztof Kiryluk⁷, Iftikhar Kullo¹², Katie Larkin³, Nita Limdi⁸, Jodell E Linder⁶, Ruth J F Loos¹⁷ ¹⁸, Yuan Luo¹⁴, Edyta Malolepsza³, Teri A Manolio⁵, Lisa J Martin², Li McCarthy³, Elizabeth M McNally¹⁴, James B Meigs¹³, Tesfaye B Mersha², Jonathan D Mosley⁶, Anjene Musick¹⁹, Bahram Namjou², Nihal Pai³, Lorenzo L Pesce¹⁴, Ulrike Peters¹⁶, Josh F Peterson⁶, Cynthia A Prows², Megan J Puckelwartz¹⁴, Heidi L Rehm³, Dan M Roden⁶, Elisabeth A Rosenthal¹¹, Robb Rowley⁵, Konrad Teodor Sawicki¹⁴, Daniel J Schaid¹², Roelof A J Smit⁴, Johanna L Smith¹², Jordan W Smoller¹³, Minta Thomas¹⁶, Hemant Tiwari⁸, Diana M Toledo³, Nataraja Sarma Vaitinadin⁶, David Veenstra¹¹, Theresa L Walunas¹⁴, Zhe Wang⁴, Wei-Qi Wei⁶, Chunhua Weng⁷, Georgia L Wiesner⁶, Xianyong Yin²⁰, Eimear E Kenny⁴

Collaborators

Sonja Berndt, Joel Hirschhorn, Ruth Loos

Affiliations

¹ Broad Institute of MIT and Harvard, Cambridge, MA, USA. nlennon@broadinstitute.org.
² Cincinnati Children's Hospital Medical Center, University of Cincinnati, Cincinnati, OH, USA.
³ Broad Institute of MIT and Harvard, Cambridge, MA, USA.
⁴ Icahn School of Medicine at Mount Sinai, New York, NY, USA.
⁵ National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA.
⁶ Vanderbilt University Medical Center, Nashville, TN, USA.
⁷ Columbia University, New York, NY, USA.
⁸ University of Alabama at Birmingham, Birmingham, AL, USA.
⁹ Children's Hospital of Philadelphia, Philadelphia, PA, USA.
¹⁰ Tulane University, New Orleans, LA, USA.
¹¹ University of Washington, Seattle, WA, USA.
¹² Mayo Clinic, Rochester, MN, USA.
¹³ Mass General Brigham, Boston, MA, USA.
¹⁴ Northwestern University, Evanston, IL, USA.
¹⁵ Boston Children's Hospital, Boston, MA, USA.
¹⁶ Fred Hutchinson Cancer Center, Seattle, WA, USA.
¹⁷ Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.
¹⁸ The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
¹⁹ National Institutes of Health, Bethesda, MD, USA.
²⁰ Nanjing Medical University, Nanjing, China.

^# Contributed equally.

PMID: 38374346
PMCID: PMC10878968
DOI: 10.1038/s41591-024-02796-z

Niall J Lennon et al. Nat Med. 2024 Feb.

Abstract

Polygenic risk scores (PRSs) have improved in predictive performance, but several challenges remain to be addressed before PRSs can be implemented in the clinic, including reduced predictive performance of PRSs in diverse populations, and the interpretation and communication of genetic results to both providers and patients. To address these challenges, the National Human Genome Research Institute-funded Electronic Medical Records and Genomics (eMERGE) Network has developed a framework and pipeline for return of a PRS-based genome-informed risk assessment to 25,000 diverse adults and children as part of a clinical study. From an initial list of 23 conditions, ten were selected for implementation based on PRS performance, medical actionability and potential clinical utility, including cardiometabolic diseases and cancer. Standardized metrics were considered in the selection process, with additional consideration given to strength of evidence in African and Hispanic populations. We then developed a pipeline for clinical PRS implementation (score transfer to a clinical laboratory, validation and verification of score performance), and used genetic ancestry to calibrate PRS mean and variance, utilizing genetically diverse data from 13,475 participants of the All of Us Research Program cohort to train and test model parameters. Finally, we created a framework for regulatory compliance and developed a PRS clinical report for return to providers and for inclusion in an additional genome-informed risk assessment. The initial experience from eMERGE can inform the approach needed to implement PRS-based testing in diverse clinical settings.

PubMed Disclaimer

Conflict of interest statement

N.S.A.-H. is an employee and equity holder of 23andMe; serves as a scientific advisory board member for Allelica, Inc; received personal fees from Genentech Inc, Allelica Inc, and 23andMe; received research funding from Akcea Therapeutics; and was previously employed by Regeneron Pharmaceuticals. E.E.K. received personal fees from Illumina Inc, 23andMe and Regeneron Pharmaceuticals and serves as a scientific advisory board member for Encompass Bioscience, Foresite Labs and Galateo Bio. J.N.H. has equity in Camp4 Therapeutics and has been a consultant to Amgen, AstraZeneca, Cytokinetics, PepGen, Pfizer and Tenaya Therapeutics and is the founder of Ikaika Therapeutics. J.F.P. is a paid consultant for Natera Inc. A. Khera is an employee of Verve Therapeutics. N.L. received personal fees from Illumina Inc and is a scientific advisory board member for FYR Diagnostics. J.F.P. is a consultant for Myome. D.V. is a consultant for Illumina and has grant support from GeneDx. T.L.W. has grant funding from Gilead Sciences, Inc. The other authors declare no competing interests.

Figures

**Fig. 1. Timeline and process overview.**
a, Timeline and process for selection, evaluation, optimization, transfer, validation and implementation of the clinical PRS test pipeline. Dashed lines represent pivotal moments in the progression of the project with duration between these events indicated in months (mo) above the blue arrow. Numbers in white represent the number of conditions being examined at each stage and their fates. List of ten conditions on the right-hand side indicates the conditions that were implemented in the clinical pipeline for this study. b, Overview of the eMERGE PRS process. Participant DNA is genotyped using the Illumina Global Diversity Array, which assesses 1.8 million sites. Genotyping data are phased and imputed with a reference panel derived from the 1,000 Genomes Project. For each participant, raw PRSs are calculated for each condition (*PRS*_raw). Each participant’s genetic ancestry is algorithmically determined in the projection step. For each condition, an ancestry calibration model is applied to each participant’s z-scores based on model parameters derived from the All of Us Research Program (Calibration) and an adjusted z-score is calculated (*PRS*_adjusted). Participants whose adjusted scores cross the predefined threshold for high PRS are identified and a pdf report is generated. The report is electronically signed after data review by a clinical laboratory director and delivered to the study portal for return to the clinical sites.
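The calibration and thresholding steps in panel b can be illustrated with a minimal Python sketch. It assumes a single ancestry coordinate (for example, one principal component) and a simplified two-stage least-squares fit: a linear model for the PRS mean, then a linear model for the squared residuals to obtain an ancestry-dependent variance. The function names, the single-coordinate simplification, the fitting procedure, the variance floor and the normal-quantile cutoff are all assumptions made for illustration; they are not taken from the study, whose parameters were derived from the All of Us cohort.

```python
from math import sqrt
from statistics import NormalDist


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_calibration(pc, prs_raw):
    """Model the raw-PRS mean and variance as linear functions of one
    ancestry coordinate (hypothetical simplification of a multi-PC model)."""
    mean_params = fit_line(pc, prs_raw)
    a, b = mean_params
    sq_resid = [(s - (a + b * p)) ** 2 for p, s in zip(pc, prs_raw)]
    var_params = fit_line(pc, sq_resid)
    return mean_params, var_params


def adjusted_z(pc_value, prs_raw_value, mean_params, var_params, var_floor=1e-8):
    """PRS_adjusted = (PRS_raw - mean(pc)) / sd(pc)."""
    mu = mean_params[0] + mean_params[1] * pc_value
    # The floor guards against a non-positive fitted variance (assumption).
    var = max(var_params[0] + var_params[1] * pc_value, var_floor)
    return (prs_raw_value - mu) / sqrt(var)


def high_prs_cutoff(percentile):
    """z cutoff for a percentile, assuming adjusted scores are standard normal."""
    return NormalDist().inv_cdf(percentile)


def is_high_prs(z, cutoff):
    return z > cutoff


# Toy training data (made up): two ancestry groups with different mean and spread.
train_pc = [0, 0, 1, 1]
train_prs = [1, 3, 4, 8]
mean_params, var_params = fit_calibration(train_pc, train_prs)

# Raw scores of 3 and 8 sit one standard deviation above their own group mean.
z_a = adjusted_z(0, 3, mean_params, var_params)
z_b = adjusted_z(1, 8, mean_params, var_params)
```

In this toy example the two participants have very different raw scores but the same adjusted z-score, which is the behavior an ancestry-dependent mean and variance are meant to produce.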

**Fig. 2. Summary of the ten conditions that were implemented.**
‘High-PRS threshold’ represents the percentile that is deemed to be the cutoff for a specific condition above which a high-PRS result is reported for that condition. Odds ratios are reported as the mean odds ratios (square dot) associated with having a score above the specified threshold, compared to having a score below the specified threshold, along with 95% confidence intervals (CIs), shown in the whiskers. The number of case and control samples used to derive these odds ratios and CIs for each condition can be found in Supplementary Table 2. Note that the odds ratio for obesity is not reported here, as it will be published by the Genetic Investigation of ANthropometric Traits consortium (Smit et al., manuscript in preparation). ‘Number of SNPs’ represents the range of numbers of sites included in each score. ‘Age ranges for return’ indicates the participant ages at which a PRS is calculated for a given condition. AFIB, atrial fibrillation; BC, breast cancer; CKD, chronic kidney disease; CHD, coronary heart disease; HC, hypercholesterolemia; PC, prostate cancer; T1D, type 1 diabetes; T2D, type 2 diabetes.
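As a rough illustration of an odds ratio for scores above versus below a threshold, the sketch below computes an unadjusted odds ratio with a Wald 95% CI from a 2×2 table of cases and controls. This is an assumption about the simplest possible calculation; the study's estimates may come from covariate-adjusted models, and the counts used here are invented.

```python
from math import exp, log, sqrt


def odds_ratio_ci(cases_above, controls_above, cases_below, controls_below, z=1.96):
    """Unadjusted odds ratio and Wald CI from a 2x2 table.

    All four counts must be positive. z=1.96 gives an approximate 95% interval.
    """
    odds_ratio = (cases_above * controls_below) / (controls_above * cases_below)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = sqrt(
        1 / cases_above + 1 / controls_above + 1 / cases_below + 1 / controls_below
    )
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: 20 cases and 80 controls above the threshold,
# 100 cases and 900 controls below it.
or_est, ci_low, ci_high = odds_ratio_ci(20, 80, 100, 900)
```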

**Fig. 3. Summary of the first 2,500 eMERGE participants processed through the clinical pipeline.**
a, PCA of ancestry indicating participants with a result of ‘high PRS’ for any condition (red dots) compared to participants who did not have a high PRS identified (gray dots). b, Summary of number of high-risk conditions found per participant. c, Observed numbers of high PRS called per condition compared to the expected numbers of high PRS per condition. P values are two-sided P values calculated by simulation to account for the uncertainty in the All of Us (AoU) derived ancestry calibration parameters due to the finite size of the AoU training cohort, and further adjusted for multiple hypothesis testing using the Holm–Šidák procedure. Note that not all participants are scored for every condition, because of age and sex-at-birth filters.
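The Holm–Šidák adjustment mentioned in panel c can be sketched as follows. This is a generic step-down implementation of the procedure, not the study's code, and the example P values are invented.

```python
def holm_sidak(p_values):
    """Holm–Šidák step-down adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest P value (k = rank + 1) is corrected for the
        # m - k + 1 hypotheses not yet rejected at this step.
        adj = 1.0 - (1.0 - p_values[idx]) ** (m - rank)
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, adj)
        adjusted[idx] = min(running_max, 1.0)
    return adjusted


# Hypothetical per-condition P values.
adjusted_p = holm_sidak([0.01, 0.04, 0.03])
```

With ten conditions, the smallest raw P value would be corrected as 1 − (1 − p)^10, the next as 1 − (1 − p)^9, and so on.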

**Extended Data Fig. 1. Case-control PRS histograms.**
Histograms of T2D PRS scores for case and control samples in the eMERGE I-III dataset.

**Extended Data Fig. 2. Representation of the genetic ancestry admixture composition of both the Test and Training cohorts.**
The x-axis represents individuals within the cohorts and the color-coding highlights the proportion of genetic admixture observed.

**Extended Data Fig. 3. Calibrated z-scores.**
The distributions of calibrated z-scores in the test cohort when the training cohort parameters are applied.

**Extended Data Fig. 4. Hypercholesterolemia PRS calibrated z-scores of training cohort.**
Note the improvement when an ancestry dependent variance is used over a constant variance method.

**Extended Data Fig. 5. PRS z-score as a function of African Admixture Fraction, for individuals of African ancestry.**
In the ‘Bucketing’ method, a z-score is calculated by comparing to the mean and variance of all individuals of African ancestry in the cohort. The ‘PCA Calibrated’ method is the method described above. Note the dependence on admixture fraction in the ‘Bucketing’ method, which has been removed in the ‘PCA Calibrated’ method.

Update of

Selection, optimization, and validation of ten chronic disease polygenic risk scores for clinical implementation in diverse populations.
Lennon NJ, Kottyan LC, Kachulis C, Abul-Husn N, Arias J, Belbin G, Below JE, Berndt S, Chung W, Cimino JJ, Clayton EW, Connolly JJ, Crosslin D, Dikilitas O, Velez Edwards DR, Feng Q, Fisher M, Freimuth R, Ge T; GIANT Consortium; All of Us Research Program; Glessner JT, Gordon A, Guiducci C, Hakonarson H, Harden M, Harr M, Hirschhorn J, Hoggart C, Hsu L, Irvin R, Jarvik GP, Karlson EW, Khan A, Khera A, Kiryluk K, Kullo I, Larkin K, Limdi N, Linder JE, Loos R, Luo Y, Malolepsza E, Manolio T, Martin LJ, McCarthy L, Meigs JB, Mersha TB, Mosley J, Namjou B, Pai N, Pesce LL, Peters U, Peterson J, Prows CA, Puckelwartz MJ, Rehm H, Roden D, Rosenthal EA, Rowley R, Sawicki KT, Schaid D, Schmidlen T, Smit R, Smith J, Smoller JW, Thomas M, Tiwari H, Toledo D, Vaitinadin NS, Veenstra D, Walunas T, Wang Z, Wei WQ, Weng C, Wiesner G, Xianyong Y, Kenny E. medRxiv [Preprint]. 2023 Jun 5:2023.05.25.23290535. doi: 10.1101/2023.05.25.23290535. Update in: Nat Med. 2024 Feb;30(2):480-487. doi: 10.1038/s41591-024-02796-z. PMID: 37333246.
