2025 Apr 23;8(1):651.
doi: 10.1038/s42003-025-08050-7.

The accuracy of polygenic score models for BMI and Type II diabetes in the Native Hawaiian population

Ying-Chu Lo et al. Commun Biol.

Abstract

Polygenic scores (PGS) are promising for stratifying individuals by genetic susceptibility to complex diseases or traits. However, PGS models, typically trained in European- or East Asian-ancestry populations, tend to perform poorly in other ethnic minority populations, and their accuracy has not been evaluated for Native Hawaiians. In particular, for body mass index (BMI) and type 2 diabetes (T2D), Polynesian-ancestry individuals such as Native Hawaiians or Samoans exhibit trait distributions that differ from those of other continental populations, but they are understudied, particularly in the context of PGS. Using BMI and T2D as examples of metabolic traits of importance to Polynesian populations (along with height as a comparison of a similarly highly polygenic trait), here we examine the prediction accuracies of PGS models in a large Native Hawaiian sample from the Multiethnic Cohort with up to 5300 individuals. We find evidence of lowered prediction accuracies for the PGS models in some cases, particularly for height. We also find that using the Native Hawaiian samples as an optimization cohort during training does not consistently improve PGS performance. Moreover, even the best-performing PGS models among Native Hawaiians have lowered prediction accuracy among the subset of individuals most enriched with Polynesian ancestry. Our findings indicate that factors such as admixture histories, sample size, and diversity in GWAS can influence PGS performance for complex traits among Native Hawaiian samples. This study provides an initial survey of PGS performance among Native Hawaiians and exposes the current gaps and challenges associated with improving polygenic prediction models for underrepresented minority populations.

Conflict of interest statement

Competing interests: C.-Y. C. is an employee of Biogen. Biogen had no role in the research described in this publication. All other authors declare no competing interests.

Figures

Fig. 1. The overall study design of PGS evaluation in Native Hawaiians.
The GWAS summary statistics were downloaded from large consortia and biobanks (BBJ, UKB + GIANT, BBJ + TWB, and META). Each population-specific GWAS (EAS, EUR) was used to train PGS models with the matching MEC cohort as the LD reference and optimization cohort (MEC-J for EAS, MEC-W for EUR). Multi-ancestry meta-analysis GWAS (META) were used to train PGS with either EAS or EUR populations, each in turn as both LD reference and optimization cohort. In Design I, EAS- or EUR-optimized PGS were validated in held-out MEC-J, MEC-W, and MEC-NH samples. Comparisons of PGS prediction accuracy between MEC-NH and MEC-J or MEC-W provide the metric for transferability. In Design II, PGS models based on EAS or EUR GWAS used MEC-NH for optimization, and the performance in held-out MEC-NH was then compared to the corresponding metric in Design I to assess potential improvement of prediction by PGS. See Supplementary Table 2 for detailed descriptions of the GWAS datasets used for this study and Supplementary Table 3 for a tabular summary of the study design.
Fig. 2. The transferability of EAS- and EUR-trained PGS for BMI, height, and T2D.
The genomic PGS model with the highest prediction accuracy in the optimization cohorts was validated in held-out MEC-J, MEC-W, and MEC-NH cohorts. This figure summarizes the results of the Design I analysis (Fig. 1); the model parameters for the best-performing PGS models can be found in Supplementary Table 4. The PGS construction method that resulted in the best model is represented by circles (for the C + T approach) or triangles (for the LDpred2 approach). The standard errors for the R² were calculated using 1000 sets of bootstrap samples. For BMI and height, 1000 randomly selected individuals from each of MEC-J, MEC-W, and MEC-NH were used for validation. For T2D, all cases and controls that were not used in training were used for validation: 3313 cases and 6700 controls for MEC-J, 468 cases and 3110 controls for MEC-W, and 389 cases and 549 controls for MEC-NH.
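The bootstrap standard error mentioned in this caption can be illustrated with a minimal sketch. It assumes R² is the squared Pearson correlation between the score and a quantitative trait; the caption does not say whether covariates were adjusted for or how T2D was handled, so that choice is an assumption. The function names `r_squared` and `bootstrap_se` are hypothetical, and the data are synthetic.

```python
import numpy as np

def r_squared(pgs, trait):
    """Squared Pearson correlation between a score and a quantitative trait.
    (Assumed definition of R^2; the study's exact metric may differ.)"""
    r = np.corrcoef(pgs, trait)[0, 1]
    return r * r

def bootstrap_se(pgs, trait, n_boot=1000, seed=0):
    """Standard error of R^2 from n_boot resamples of individuals, drawn with replacement."""
    rng = np.random.default_rng(seed)
    pgs = np.asarray(pgs, dtype=float)
    trait = np.asarray(trait, dtype=float)
    n = len(pgs)
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample individuals, keeping score/trait pairs together
        estimates[b] = r_squared(pgs[idx], trait[idx])
    return estimates.std(ddof=1)

# Synthetic validation set of 1000 individuals (illustrative values only, not study data).
rng = np.random.default_rng(42)
pgs = rng.normal(size=1000)
trait = 0.3 * pgs + rng.normal(size=1000)

r2 = r_squared(pgs, trait)
se = bootstrap_se(pgs, trait, n_boot=1000)
print(f"R^2 = {r2:.3f}, bootstrap SE = {se:.3f}")
```

Resampling whole individuals, rather than scores and traits separately, preserves the pairing that R² depends on.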
Fig. 3. The impact of MEC-NH as optimization cohort on PGS prediction accuracies in held-out MEC-NH for BMI, height, and T2D.
For each GWAS-trait combination whose PGS model was previously optimized in MEC-J or MEC-W (Fig. 2), the same data were optimized again using MEC-NH samples (Design II, Fig. 1). The previously optimized PGS models and the MEC-NH-optimized models were both validated in the same held-out MEC-NH cohort to evaluate whether optimization in MEC-NH would improve the prediction accuracy in Native Hawaiians. The model parameters for the best-performing PGS models can be found in Supplementary Table 5. The PGS construction method that resulted in the best model is represented by circles (for the C + T approach) or triangles (for the LDpred2 approach). The standard errors for the R² were calculated using 1000 sets of bootstrap samples. For BMI and height, a random subset of 1000 MEC-NH individuals was used for validation. For T2D, 389 cases and 549 controls from MEC-NH were used for validation.
Fig. 4. A comparison between the optimal PGS from this study and PGS from the PGS Catalog.
PGS models available in the PGS Catalog (https://www.pgscatalog.org/) as of July 1, 2024 were downloaded for BMI, height, and T2D, and validated in the MEC-NH individuals here (the same held-out MEC-NH validation set as in Figs. 2 and 3). Blue points represent the PGS constructed in this study with the highest prediction accuracy in MEC-NH. Circles and triangles indicate that the PGS was derived from the C + T and LDpred2 approaches, respectively. Gray points depict PGS from the PGS Catalog. Only the 10 best-performing PGS models from the PGS Catalog are shown; the complete data can be found in Supplementary Data 1. The standard errors for the R² were calculated using 1000 sets of bootstrap samples. For BMI and height, 1000 randomly selected MEC-NH individuals were used for validation. For T2D, 389 cases and 549 controls from MEC-NH were used for validation.
Fig. 5. Prediction accuracies of models from the PGS Catalog in validation sets of randomly selected Native Hawaiians and of Native Hawaiians with the highest Polynesian ancestry.
Each PGS model from the PGS Catalog was assessed in validation datasets from MEC-NH, consisting either of randomly selected individuals (white points) or of the individuals with the highest Polynesian ancestry (yellow points). The standard errors for the R² were calculated using 1000 sets of bootstrap samples. For BMI and height, the validation cohort consisted of 1000 randomly selected MEC-NH individuals or the 1000 individuals with the highest estimated Polynesian ancestry in the entire MEC-NH cohort (see Methods). Selection was not restricted to the 1000 individuals reserved for validation in Figs. 2–4, because none of the MEC-NH individuals were used in constructing the publicly available PGS models. For T2D, because only 768 of the 1000 individuals with the highest Polynesian ancestry could be defined as either a case or a control (346 cases, 422 controls), we compared them to 768 individuals (318 cases, 450 controls) randomly selected among all MEC-NH individuals with T2D case/control status.
Fig. 6. Prediction accuracies of models from this study in validation sets of randomly selected Native Hawaiians and of Native Hawaiians with the highest Polynesian ancestry.
The PGS models were based on EAS-, EUR-, or multi-ancestry GWAS, but used MEC-NH for optimization. The resulting models were previously validated in 1000 randomly selected MEC-NH individuals in Fig. 3. Here, for a fair comparison, we validated the same models within those 1000 individuals, comparing 200 randomly selected individuals (white points) to the 200 individuals with the highest estimated Polynesian ancestry (yellow points). Circles and triangles indicate that the PGS was derived from the C + T and LDpred2 approaches, respectively. We did not perform the analysis for T2D because there were too few case/control samples, particularly among those with high Polynesian ancestry. The standard errors for the R² were calculated using 1000 sets of bootstrap samples.
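The subset comparison described in this caption (200 random individuals versus the 200 with the highest estimated Polynesian ancestry, both drawn from the same 1000 validation individuals) could be set up roughly as follows. This is a sketch under assumptions: the per-individual ancestry proportions are taken as given, the function name `select_validation_sets` is hypothetical, and the values are synthetic rather than study data.

```python
import numpy as np

def select_validation_sets(ancestry, n_select, seed=0):
    """Return (random_idx, top_idx): indices of n_select randomly chosen individuals
    and of the n_select individuals with the highest ancestry estimate."""
    ancestry = np.asarray(ancestry, dtype=float)
    rng = np.random.default_rng(seed)
    top_idx = np.argsort(ancestry)[::-1][:n_select]  # sort descending, keep the top n_select
    random_idx = rng.choice(len(ancestry), size=n_select, replace=False)
    return random_idx, top_idx

# Synthetic ancestry proportions for 1000 validation individuals (illustrative only).
rng = np.random.default_rng(7)
ancestry = rng.uniform(0.0, 1.0, size=1000)

random_idx, top_idx = select_validation_sets(ancestry, n_select=200)
print("mean ancestry, random subset:", ancestry[random_idx].mean())
print("mean ancestry, top subset:   ", ancestry[top_idx].mean())
```

Each index set can then be used to subset the score and trait vectors before computing R² and its bootstrap standard error, so both groups are evaluated on the same number of individuals.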
