[Preprint]. 2023 Dec 28:2023.12.25.23300499.
doi: 10.1101/2023.12.25.23300499.

The accuracy of polygenic score models for anthropometric traits and Type II Diabetes in the Native Hawaiian Population

Ying-Chu Lo et al. medRxiv.

Abstract

Polygenic scores (PGS) are promising for stratifying individuals by their genetic susceptibility to complex diseases or traits. However, PGS models, typically trained in European- or East Asian-ancestry populations, tend to perform poorly in other ethnic minority populations, and their accuracies have not been evaluated for Native Hawaiians. Using body mass index, height, and type-2 diabetes as examples of highly polygenic traits, we evaluated the prediction accuracies of PGS models in a large Native Hawaiian sample of up to 5,300 individuals from the Multiethnic Cohort. We evaluated both publicly available PGS models and genome-wide PGS models trained in this study using the largest available GWAS. We found evidence of lowered prediction accuracies for the PGS models in some cases, particularly for height. We also found that using the Native Hawaiian samples as an optimization cohort during training did not consistently improve PGS performance. Moreover, even the best-performing PGS models among Native Hawaiians had lower prediction accuracy in the subset of individuals most enriched with Polynesian ancestry. Our findings indicate that factors such as admixture history and the sample size and diversity of the GWAS can influence PGS performance for complex traits among Native Hawaiian samples. This study provides an initial survey of PGS performance among Native Hawaiians and exposes the current gaps and challenges in improving polygenic prediction models for underrepresented minority populations.

Keywords: BMI; GWAS summary statistics; Native Hawaiians; Polygenic Scores; height; type-2 diabetes.

Figures

Fig 1. The overall study design of PGS evaluation in Native Hawaiians.
The GWAS summary statistics were downloaded from large consortia and biobanks (BBJ, UKB+GIANT, BBJ+TWB, and META). Each population-specific GWAS (EAS, EUR) was used to train PGS models with the matching MEC cohort as the LD reference and optimization cohort (MEC-J for EAS, MEC-W for EUR). Multi-ancestry meta-analysis GWAS (META) were used to train PGS for both EAS and EUR populations. In Design I, EAS- or EUR-optimized PGS were validated in held-out MEC-J, MEC-W, and MEC-NH samples. Comparing PGS prediction accuracy in MEC-NH against that in MEC-J or MEC-W provides the metric for transferability. In Design II, PGS models based on EAS or EUR GWAS were optimized using MEC-NH, and their performance in held-out MEC-NH was then compared to the corresponding metric in Design I to assess whether optimization in MEC-NH improves prediction for MEC-NH. See Table S1 for detailed descriptions of the GWAS datasets used in this study.
Fig 2. The transferability of EAS- and EUR-trained PGS for BMI, height and T2D.
The genomic PGS model with the highest prediction accuracy in the optimization cohorts was validated in held-out MEC-J, MEC-W, and MEC-NH cohorts. This figure summarizes the results of the Design I analysis (Fig. 1); the model parameters for the best-performing PGS models can be found in Table 1. Best models based on the C+T or LDpred2 approach are represented by blank or hashed bars, respectively. The standard errors of R² were calculated using 1,000 sets of bootstrap samples. For BMI and height, 1,000 random individuals from each of MEC-J, MEC-W, and MEC-NH were used for validation. For T2D, all cases and controls that were not used in training were used for validation: 3,315 cases and 6,700 controls for MEC-J, 496 cases and 4,063 controls for MEC-W, and 392 cases and 549 controls for MEC-NH.
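As an illustration of the bootstrap standard error mentioned in this caption, the following is a minimal sketch for a quantitative trait such as BMI or height. It assumes R² is the squared Pearson correlation between the PGS and the trait; the preprint may instead use a covariate-adjusted or incremental R², and a different metric for T2D. The function names `r_squared` and `bootstrap_se_r2` are hypothetical and not taken from the study.

```python
import random
import statistics


def r_squared(pgs, trait):
    """Squared Pearson correlation between a score and a quantitative trait.

    Assumption: a plain, unadjusted R^2; the study may adjust for covariates.
    """
    mean_x = statistics.fmean(pgs)
    mean_y = statistics.fmean(trait)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(pgs, trait))
    sxx = sum((x - mean_x) ** 2 for x in pgs)
    syy = sum((y - mean_y) ** 2 for y in trait)
    if sxx == 0 or syy == 0:
        # A constant score or trait explains no variance.
        return 0.0
    return sxy * sxy / (sxx * syy)


def bootstrap_se_r2(pgs, trait, n_boot=1000, seed=0):
    """Standard error of R^2 from n_boot resamples of individuals."""
    rng = random.Random(seed)
    n = len(pgs)
    estimates = []
    for _ in range(n_boot):
        # Resample individuals with replacement, keeping (PGS, trait) pairs together.
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(
            r_squared([pgs[i] for i in idx], [trait[i] for i in idx])
        )
    # The spread of the resampled R^2 values estimates the standard error.
    return statistics.stdev(estimates)
```

Calling `bootstrap_se_r2(pgs, trait, n_boot=1000)` on the scores and trait values of a validation set would mirror the 1,000 bootstrap sets described in the caption, under the assumptions stated above.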
Fig 3. The impact of MEC-NH as optimization cohort on PGS prediction accuracies in held-out MEC-NH for BMI, height, and T2D.
For each GWAS-trait combination whose PGS model was previously optimized in MEC-J or MEC-W in Fig. 2, the same data were optimized using MEC-NH samples here (Design II, Fig. 1). The previously optimized PGS models and the MEC-NH-optimized models were both validated in the held-out MEC-NH cohort to evaluate whether optimization in MEC-NH improves prediction accuracy in Native Hawaiians. The model parameters for the best-performing PGS models can be found in Table S2. Best models based on the C+T or LDpred2 approach are represented by blank or hashed bars, respectively. The standard errors of R² were calculated using 1,000 sets of bootstrap samples. For BMI and height, 1,000 random MEC-NH individuals were used for validation. For T2D, 392 cases and 549 controls from MEC-NH were used for validation.
Fig 4. A comparison between the optimal PGS from this study and PGS from the PGS Catalog.
PGS models available in the PGS Catalog (URL https://www.pgscatalog.org/) as of May 18, 2022 were downloaded for BMI, height, and T2D, and validated in the same MEC-NH individuals. Blue bars represent the PGS constructed in this study with the highest prediction accuracy in MEC-NH. Clear and hashed bars indicate that the PGS was derived from the C+T or LDpred2 approach, respectively. Gray bars depict PGS from the PGS Catalog. For BMI and height, 1,000 random MEC-NH individuals were used for validation. For T2D, 392 cases and 549 controls from MEC-NH were used for validation.
Fig 5. Prediction accuracies of models from the PGS Catalog in the random Native Hawaiian validation set and in the validation set of Native Hawaiians with the highest Polynesian ancestry.
Each PGS model from the PGS Catalog was assessed in validation datasets from MEC-NH. For BMI and height, the validation cohort consisted of either 1,000 randomly selected MEC-NH individuals or the 1,000 individuals with the highest estimated Polynesian ancestry (minimum estimated Polynesian ancestry = 65%) in the entire MEC-NH cohort. Selection was not restricted to the 1,000 individuals reserved for validation in Figures 2–4, as none of the MEC-NH individuals were used in constructing the publicly available PGS models. For T2D, because only 768 individuals (346 cases, 422 controls) with > 65% estimated Polynesian ancestry were available, we compared them to 768 randomly selected MEC-NH individuals (318 cases, 450 controls).
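The two validation sets described in this caption can be sketched as follows. This is only an illustration: it assumes each individual has a single estimated Polynesian ancestry proportion stored in a dictionary keyed by a sample identifier, and the helper names `top_ancestry_subset` and `random_subset` are hypothetical rather than the study's own code.

```python
import random


def top_ancestry_subset(ancestry, n):
    """Return the n sample IDs with the highest estimated ancestry proportion,
    together with the minimum proportion among the selected individuals."""
    ranked = sorted(ancestry, key=lambda sample_id: ancestry[sample_id], reverse=True)
    chosen = ranked[:n]
    return chosen, min(ancestry[sample_id] for sample_id in chosen)


def random_subset(ancestry, n, seed=0):
    """Return n sample IDs drawn uniformly at random without replacement."""
    rng = random.Random(seed)
    # Sorting the IDs first makes the draw reproducible for a given seed.
    return rng.sample(sorted(ancestry), n)
```

With `n=1000`, the minimum proportion returned by `top_ancestry_subset` would correspond to the threshold reported in the caption (65% in the study's data), and `random_subset` gives the comparison set of equal size.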
Fig 6. Prediction accuracies of models from this study in the random Native Hawaiian validation set and in the validation set of Native Hawaiians with the highest Polynesian ancestry.
All PGS models were trained on EAS, EUR, or multi-ancestry GWAS, with MEC-NH used for optimization. The resulting models were then validated in 200 randomly selected MEC-NH individuals and in the same number of individuals with the highest estimated Polynesian ancestry (minimum estimated Polynesian ancestry = 65%). We did not perform this analysis for T2D because there were too few case/control samples, particularly among those with high Polynesian ancestry.
