This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

[Preprint]. 2023 Dec 28:2023.12.25.23300499.

doi: 10.1101/2023.12.25.23300499.

The accuracy of polygenic score models for anthropometric traits and Type II Diabetes in the Native Hawaiian Population

Ying-Chu Lo¹, Tsz Fung Chan¹, Soyoung Jeon¹, Gertraud Maskarinec², Kekoa Taparra³, Nathan Nakatsuka⁴, Mingrui Yu^{5,6}, Chia-Yen Chen^{5,6,7,8}, Yen-Feng Lin^{6,9,10}, Lynne R Wilkens², Loic Le Marchand², Christopher A Haiman^{1,11}, Charleston W K Chiang^{1,11,12}

Affiliations

¹ Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA.
² Epidemiology Program, University of Hawai'i Cancer Center, University of Hawai'i, Manoa, Honolulu, HI, USA.
³ Stanford Health Care, Department of Radiation Oncology, Palo Alto, CA, USA.
⁴ New York Genome Center, New York, NY, USA.
⁵ Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
⁶ Center for Neuropsychiatric Research, National Health Research Institutes, Miaoli, Taiwan.
⁷ Biogen, Cambridge, MA, USA.
⁸ Psychiatric and Neurodevelopmental Genetics Unit, Massachusetts General Hospital, Boston, MA, USA.
⁹ Department of Public Health & Medical Humanities, School of Medicine, National Yang Ming Chiao Tung University, Taipei, Taiwan.
¹⁰ Institute of Behavioral Medicine, College of Medicine, National Cheng Kung University, Tainan, Taiwan.
¹¹ Cancer Epidemiology Program, Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, CA, USA.
¹² Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA.

PMID: 38234828
PMCID: PMC10793530
DOI: 10.1101/2023.12.25.23300499

Ying-Chu Lo et al. medRxiv. 2023.


Update in

The accuracy of polygenic score models for BMI and Type II diabetes in the Native Hawaiian population.
Lo YC, Tian H, Chan TF, Jeon S, Alatorre K, Dinh BL, Maskarinec G, Taparra K, Nakatsuka N, Yu M, Chen CY, Lin YF, Wilkens LR, Le Marchand L, Haiman CA, Chiang CWK. Commun Biol. 2025 Apr 23;8(1):651. doi: 10.1038/s42003-025-08050-7. PMID: 40269120. Free PMC article.

Abstract

Polygenic scores (PGS) are a promising tool for stratifying individuals by genetic susceptibility to complex diseases or traits. However, PGS models, typically trained in European- or East Asian-ancestry populations, tend to perform poorly in other ethnic minority populations, and their accuracy has not been evaluated in Native Hawaiians. Using body mass index, height, and type-2 diabetes as examples of highly polygenic traits, we evaluated the prediction accuracy of PGS models in a large Native Hawaiian sample of up to 5,300 individuals from the Multiethnic Cohort. We evaluated both publicly available PGS models and genome-wide PGS models trained in this study using the largest available GWAS. We found evidence of lower prediction accuracy for the PGS models in some cases, particularly for height. We also found that using the Native Hawaiian samples as an optimization cohort during training did not consistently improve PGS performance. Moreover, even the best-performing PGS models among Native Hawaiians showed lower prediction accuracy in the subset of individuals most enriched for Polynesian ancestry. Our findings indicate that factors such as admixture history and the sample size and diversity of GWAS can influence PGS performance for complex traits in Native Hawaiian samples. This study provides an initial survey of PGS performance among Native Hawaiians and exposes current gaps and challenges in improving polygenic prediction models for underrepresented minority populations.

Keywords: BMI; GWAS summary statistics; Native Hawaiians; Polygenic Scores; height; type-2 diabetes.
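
As background for the abstract, a PGS for one individual is usually computed as the sum, over variants, of each variant's per-allele effect weight times that individual's effect-allele dosage. The sketch below shows only that arithmetic, assuming a dosage matrix and a weight vector that are already aligned on the same variants and effect alleles; the function name `polygenic_score` and the toy numbers are hypothetical, and this is not a description of how the study built its models.

```python
import numpy as np


def polygenic_score(dosages, weights):
    """Return one score per individual as a weighted sum of allele dosages.

    dosages: (n_individuals, n_variants) effect-allele counts, typically in [0, 2]
    weights: (n_variants,) per-allele effect weights (e.g. GWAS betas or log odds ratios)
    """
    dosages = np.asarray(dosages, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if dosages.shape[1] != weights.shape[0]:
        raise ValueError("dosages and weights must cover the same variants")
    return dosages @ weights


# Toy example: three individuals, four variants (made-up numbers).
G = np.array([[0, 1, 2, 1],
              [2, 2, 0, 0],
              [1, 0, 1, 2]])
beta = np.array([0.10, -0.05, 0.20, 0.03])
scores = polygenic_score(G, beta)
```

Each entry of `scores` is one individual's score; in practice the scores are then compared with measured phenotypes in a cohort that was not used to fit the weights.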


Figures

**Fig 1. The overall study design of PGS evaluation in Native Hawaiians.**
The GWAS summary statistics were downloaded from large consortia and biobanks (BBJ, UKB+GIANT, BBJ+TWB, and META). Each population-specific GWAS (EAS, EUR) was used to train PGS models, with the matching MEC cohort serving as the LD reference and optimization cohort (MEC-J for EAS, MEC-W for EUR). Multi-ancestry meta-analysis GWAS (META) were used to train PGS for both EAS and EUR populations. In Design I, EAS- or EUR-optimized PGS were validated in held-out MEC-J, MEC-W and MEC-NH samples; comparing PGS prediction accuracy in MEC-NH with that in MEC-J or MEC-W provides the metric for transferability. In Design II, PGS models based on EAS or EUR GWAS were optimized using MEC-NH, and their performance in held-out MEC-NH was then compared with the corresponding metric from Design I to assess whether optimization in MEC-NH improves prediction for MEC-NH. See Table S1 for detailed descriptions of the GWAS datasets used in this study.
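
The Design I/Design II logic above separates an optimization cohort, used to choose among candidate models, from a held-out cohort used only for evaluation. A minimal sketch of that pattern for the p-value thresholding part of C+T follows. It assumes simulated genotypes, hypothetical GWAS effect sizes and p-values, and plain squared correlation as R²; the function names and the threshold grid are illustrative, and the study's actual pipeline and covariates are not described in this section. Tuning LDpred2 hyperparameters would follow the same choose-then-validate pattern.

```python
import numpy as np


def r_squared(y, score):
    """Squared Pearson correlation between phenotype and score."""
    return float(np.corrcoef(y, score)[0, 1] ** 2)


def select_and_validate(G_opt, y_opt, G_val, y_val, beta, pval, thresholds):
    """Choose the p-value threshold that maximizes R² in the optimization cohort,
    then report R² of that single model in the held-out validation cohort."""
    best_t, best_r2 = None, -np.inf
    for t in thresholds:
        keep = pval <= t
        if not keep.any():
            continue  # no variant passes this threshold
        r2 = r_squared(y_opt, G_opt[:, keep] @ beta[keep])
        if r2 > best_r2:
            best_t, best_r2 = t, r2
    if best_t is None:
        raise ValueError("no threshold retained any variant")
    keep = pval <= best_t
    return best_t, r_squared(y_val, G_val[:, keep] @ beta[keep])


# Simulated toy data (not from the study): 200 variants, the first 20 causal.
rng = np.random.default_rng(0)
n_var = 200
true_beta = np.where(np.arange(n_var) < 20, 0.3, 0.0)
gwas_beta = true_beta + rng.normal(0.0, 0.05, n_var)  # noisy "GWAS" effect estimates
gwas_p = np.where(true_beta != 0, 1e-8, rng.uniform(0.0, 1.0, n_var))  # toy p-values


def simulate_cohort(n):
    """Draw genotypes (allele frequency 0.3) and a phenotype with unit-variance noise."""
    G = rng.binomial(2, 0.3, size=(n, n_var)).astype(float)
    y = G @ true_beta + rng.normal(0.0, 1.0, n)
    return G, y


G_opt, y_opt = simulate_cohort(500)  # stands in for the optimization cohort
G_val, y_val = simulate_cohort(500)  # stands in for the held-out validation cohort
thresholds = [1e-6, 1e-3, 0.05, 0.5, 1.0]
best_t, r2_val = select_and_validate(G_opt, y_opt, G_val, y_val, gwas_beta, gwas_p, thresholds)
```

The key design point is that `G_val`/`y_val` never influence which threshold is chosen, so `r2_val` is not inflated by the model selection step.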

**Fig 2. The transferability of EAS- and EUR-trained PGS for BMI, height and T2D.**
The genome-wide PGS model with the highest prediction accuracy in the optimization cohorts was validated in held-out MEC-J, MEC-W, and MEC-NH cohorts. This figure summarizes the results of Design I (Fig. 1); details of the model parameters for the best-performing PGS models can be found in Table 1. Best models based on the C+T or LDpred2 approach are represented by blank or hashed bars, respectively. Standard errors of R² were calculated from 1,000 bootstrap samples. For BMI and height, 1,000 randomly selected individuals from each of MEC-J, MEC-W, and MEC-NH were used for validation. For T2D, all cases and controls not used in training were used for validation: 3,315 cases and 6,700 controls for MEC-J, 496 cases and 4,063 controls for MEC-W, and 392 cases and 549 controls for MEC-NH.
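
The captions report standard errors of R² from 1,000 bootstrap samples. One common way to obtain such a standard error, sketched below under the assumption that R² is the squared correlation between phenotype and score, is to resample individuals with replacement and take the standard deviation of R² across resamples. The captions do not say whether R² was adjusted for covariates or which R² measure was used for T2D, so this sketch only covers a continuous trait without covariates; `bootstrap_r2` and the simulated data are hypothetical.

```python
import numpy as np


def r_squared(y, score):
    """Squared Pearson correlation between phenotype and score."""
    return float(np.corrcoef(y, score)[0, 1] ** 2)


def bootstrap_r2(y, score, n_boot=1000, seed=0):
    """Return (R², bootstrap SE), where the SE is the standard deviation of R²
    across n_boot resamples of individuals drawn with replacement."""
    y = np.asarray(y, dtype=float)
    score = np.asarray(score, dtype=float)
    rng = np.random.default_rng(seed)
    n = y.size
    boot = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample individuals with replacement
        boot[b] = r_squared(y[idx], score[idx])
    return r_squared(y, score), float(boot.std(ddof=1))


# Simulated example with 1,000 individuals (toy data, not from the study).
rng = np.random.default_rng(1)
pgs = rng.normal(size=1000)
phenotype = 0.5 * pgs + rng.normal(size=1000)
r2, se = bootstrap_r2(phenotype, pgs)
```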

**Fig 3. The impact of MEC-NH as optimization cohort on PGS prediction accuracies in held-out MEC-NH for BMI, height, and T2D.**
For each GWAS-trait combination whose PGS models were previously optimized in MEC-J or MEC-W (Fig. 2), PGS models were trained from the same GWAS data but optimized here using MEC-NH samples (Design II, Fig. 1). The previously optimized PGS models and the MEC-NH-optimized models were both validated in the held-out MEC-NH cohort to evaluate whether optimization in MEC-NH improves prediction accuracy in Native Hawaiians. Details of the model parameters for the best-performing PGS models can be found in Table S2. Best models based on the C+T or LDpred2 approach are represented by blank or hashed bars, respectively. Standard errors of R² were calculated from 1,000 bootstrap samples. For BMI and height, 1,000 randomly selected MEC-NH individuals were used for validation. For T2D, 392 cases and 549 controls from MEC-NH were used for validation.
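
The C+T approach named in these captions is usually described as LD clumping followed by p-value thresholding. The greedy clumping step can be sketched as below, assuming variants on a single chromosome and a precomputed matrix of pairwise LD r² (per Fig 1, the study took LD from the matching MEC cohort). The `r2_max` and `window_bp` values are illustrative, not the study's settings, and the function names are hypothetical.

```python
import numpy as np


def clump(pvalues, positions, ld_r2, r2_max=0.1, window_bp=250_000):
    """Greedy LD clumping on one chromosome.

    Variants are visited from most to least significant. A variant is kept as an
    index variant unless an already-kept variant lies within window_bp of it and
    has LD r² above r2_max with it. Returns the sorted indices of kept variants.
    """
    kept = []
    for i in np.argsort(pvalues):
        i = int(i)
        in_ld = any(
            abs(positions[i] - positions[j]) <= window_bp and ld_r2[i][j] > r2_max
            for j in kept
        )
        if not in_ld:
            kept.append(i)
    return sorted(kept)


def clump_and_threshold(pvalues, positions, ld_r2, p_max, **clump_kwargs):
    """C+T: clump first, then keep only index variants with p <= p_max."""
    return [i for i in clump(pvalues, positions, ld_r2, **clump_kwargs) if pvalues[i] <= p_max]


# Four toy variants: 0 and 1 are close and in strong LD; 3 is far away.
p = np.array([1e-5, 1e-8, 1e-3, 1e-4])
pos = np.array([100, 200, 300, 1_000_000])
r2 = np.array([[1.0, 0.8, 0.1, 0.0],
               [0.8, 1.0, 0.05, 0.0],
               [0.1, 0.05, 1.0, 0.0],
               [0.0, 0.0, 0.0, 1.0]])
kept = clump(p, pos, r2)                                   # variant 0 is dropped (LD with 1)
selected = clump_and_threshold(p, pos, r2, p_max=1e-4)     # variant 2 fails the p-value cut
```

Only already-kept index variants can remove others, so a variant that was clumped away never eliminates a third variant.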

**Fig 4. Comparison of the optimal PGS from this study with PGS from the PGS Catalog.**
PGS models available in the PGS Catalog (URL https://www.pgscatalog.org/) as of May 18, 2022 were downloaded for BMI, height, and T2D and validated in the same MEC-NH individuals. Blue bars represent the PGS constructed in this study with the highest prediction accuracy in MEC-NH; clear and hashed bars indicate PGS derived from the C+T or LDpred2 approach, respectively. Gray bars depict PGS from the PGS Catalog. For BMI and height, 1,000 randomly selected MEC-NH individuals were used for validation. For T2D, 392 cases and 549 controls from MEC-NH were used for validation.
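
Applying a downloaded PGS Catalog score to a new cohort, as in Fig 4, requires aligning each variant's effect allele with the allele counted in the target genotype data. PGS Catalog scoring files list an effect allele and a weight for each variant (for example in `effect_allele` and `effect_weight` columns). The sketch below assumes that information has already been parsed into tuples, uses hypothetical variant IDs and a hypothetical helper name, and simply skips variants that are missing or whose alleles do not match; it does not handle strand-ambiguous (A/T, C/G) variants, which need extra checks in practice.

```python
import numpy as np


def score_with_allele_matching(entries, variant_alleles, dosages):
    """Hypothetical helper: apply per-variant weights to genotype dosages.

    entries: list of (variant_id, effect_allele, effect_weight) from a scoring file
    variant_alleles: dict variant_id -> (coded_allele, other_allele) in the genotype data
    dosages: dict variant_id -> per-individual counts of the coded allele (0..2)
    Returns (scores, number_of_variants_used).
    """
    n = len(next(iter(dosages.values())))
    total = np.zeros(n)
    n_used = 0
    for vid, effect_allele, weight in entries:
        if vid not in variant_alleles:
            continue  # variant absent from the target genotypes
        coded, other = variant_alleles[vid]
        d = np.asarray(dosages[vid], dtype=float)
        if effect_allele == coded:
            total += weight * d
        elif effect_allele == other:
            total += weight * (2.0 - d)  # effect-allele count = 2 - coded-allele count
        else:
            continue  # allele mismatch; skipped in this sketch
        n_used += 1
    return total, n_used


# Toy inputs with hypothetical IDs; rs3 is not genotyped, rs2 is coded on the other allele.
entries = [("rs1", "A", 0.2), ("rs2", "T", -0.1), ("rs3", "G", 0.5)]
variant_alleles = {"rs1": ("A", "G"), "rs2": ("C", "T")}
dosages = {"rs1": [0, 1, 2], "rs2": [2, 1, 0]}
scores, n_used = score_with_allele_matching(entries, variant_alleles, dosages)
```

Reporting `n_used` alongside the scores makes it visible how many catalog variants were actually available in the target cohort.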

**Fig 5. Prediction accuracies of models from the PGS Catalog in the random Native Hawaiian and highest-Polynesian-ancestry Native Hawaiian validation sets.**
Each PGS model from the PGS Catalog was assessed in validation datasets from MEC-NH. For BMI and height, the validation cohort consisted of either 1,000 randomly selected MEC-NH individuals or the 1,000 individuals with the highest estimated Polynesian ancestry (minimum estimated Polynesian ancestry = 65%) in the entire MEC-NH cohort. Selection was not restricted to the 1,000 individuals reserved for validation in Figures 2–4 because none of the MEC-NH individuals were used to construct the publicly available PGS models. For T2D, because only 768 individuals (346 cases, 422 controls) with > 65% estimated Polynesian ancestry were available, we compared them with 768 randomly selected MEC-NH individuals (318 cases, 450 controls).
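
The comparison above contrasts a random validation set with the individuals having the highest estimated Polynesian ancestry. Assuming per-individual ancestry fractions have already been estimated (the caption does not name the method), the two sets could be drawn as in the sketch below. The helper name is hypothetical, and the random set is drawn from the whole cohort, so it may overlap the high-ancestry set; the caption does not say whether the study excluded such overlap.

```python
import numpy as np


def ancestry_validation_sets(ids, ancestry_fraction, n=1000, seed=0):
    """Return (top_ids, random_ids, min_fraction_in_top).

    top_ids: the n individuals with the highest estimated ancestry fraction
    random_ids: n individuals drawn at random, without replacement, from the whole cohort
    """
    ids = np.asarray(ids)
    frac = np.asarray(ancestry_fraction, dtype=float)
    if n > ids.size:
        raise ValueError("n exceeds cohort size")
    top_idx = np.argsort(frac)[::-1][:n]  # indices by descending ancestry fraction
    rng = np.random.default_rng(seed)
    rand_idx = rng.choice(ids.size, size=n, replace=False)
    return ids[top_idx], ids[rand_idx], float(frac[top_idx].min())


# Toy cohort of 10 individuals (made-up ancestry fractions).
ids = np.arange(10)
frac = [0.1, 0.9, 0.5, 0.7, 0.2, 0.95, 0.3, 0.65, 0.4, 0.8]
top, rand, min_frac = ancestry_validation_sets(ids, frac, n=3)
```

Returning the minimum fraction in the top set mirrors how the caption summarizes that set ("minimum estimated Polynesian ancestry = 65%").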

**Fig 6. Prediction accuracies of models from this study in the random Native Hawaiian and highest-Polynesian-ancestry Native Hawaiian validation sets.**
The PGS models were all trained from EAS, EUR or multi-ancestry GWAS but optimized using MEC-NH. The resulting models were then validated in 200 randomly selected MEC-NH individuals and in the same number of individuals with the highest estimated Polynesian ancestry (minimum estimated Polynesian ancestry = 65%). We did not perform this analysis for T2D because there were too few cases and controls, particularly among individuals with high Polynesian ancestry.
