Nat Genet. 2022 Nov;54(11):1609-1614.

doi: 10.1038/s41588-022-01200-1. Epub 2022 Oct 24.

A combined polygenic score of 21,293 rare and 22 common variants improves diabetes diagnosis based on hemoglobin A1C levels

Peter Dornbos^{1,2,3,4}, Ryan Koesterer^{1,2}, Andrew Ruttenburg^{1,2}, Trang Nguyen^{1,2}, Joanne B Cole^{1,2,5,6,7,8,9}; AMP-T2D-GENES Consortium; Aaron Leong^{1,2,5,8,9,10}, James B Meigs^{1,2,5,10}, Jose C Florez^{1,2,5,8,9}, Jerome I Rotter^{11}, Miriam S Udler^{1,2,5,8,9}, Jason Flannick^{12,13,14,15}

Affiliations

¹ Programs in Metabolism Program, Broad Institute, Cambridge, MA, USA.
² Medical and Population Genetics Program, Broad Institute, Cambridge, MA, USA.
³ Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA.
⁴ Department of Pediatrics, Harvard Medical School, Boston, MA, USA.
⁵ Department of Medicine, Harvard Medical School, Boston, MA, USA.
⁶ Division of Endocrinology, Boston Children's Hospital, Boston, MA, USA.
⁷ Center for Basic and Translational Obesity Research, Boston Children's Hospital, Boston, MA, USA.
⁸ Diabetes Unit, Massachusetts General Hospital, Boston, MA, USA.
⁹ Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA.
¹⁰ Division of General Internal Medicine, Massachusetts General Hospital, Boston, MA, USA.
¹¹ The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA.
¹² Programs in Metabolism Program, Broad Institute, Cambridge, MA, USA. jason.flannick@childrens.harvard.edu.
¹³ Medical and Population Genetics Program, Broad Institute, Cambridge, MA, USA. jason.flannick@childrens.harvard.edu.
¹⁴ Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA. jason.flannick@childrens.harvard.edu.
¹⁵ Department of Pediatrics, Harvard Medical School, Boston, MA, USA. jason.flannick@childrens.harvard.edu.

PMID: 36280733
PMCID: PMC9995082
DOI: 10.1038/s41588-022-01200-1

Peter Dornbos et al. Nat Genet. 2022 Nov.

Abstract

Polygenic scores (PGSs) combine the effects of common genetic variants^1,2 to predict risk or treatment strategies for complex diseases^3-7. Adding rare variation to PGSs has largely unknown benefits and is methodologically challenging. Here, we developed a method for constructing rare variant PGSs and applied it to calculate genetically modified hemoglobin A1C thresholds for type 2 diabetes (T2D) diagnosis^7-10. The resultant rare variant PGS is highly polygenic (21,293 variants across 154 genes), depends on ultra-rare variants (72.7% observed in fewer than three people) and identifies significantly more undiagnosed T2D cases than expected by chance (odds ratio = 2.71; P = 1.51 × 10^-6). A PGS combining common and rare variants is expected to identify 4.9 million misdiagnosed T2D cases in the United States, nearly 1.5-fold more than the common variant PGS alone. These results provide a method for constructing complex trait PGSs from rare variants and suggest that rare variants will augment common variants in precision medicine approaches for common disease.

Figures

**Extended Data Figure 1 |. Single variant HbA1C associations.**
Manhattan plot of the single variant associations identified by our meta-analysis. Horizontal lines indicate the threshold used for exome-wide significance for coding variants (red: p≤1.8×10⁻⁸ as derived from a previously determined threshold p≤4.3×10⁻⁷ and Bonferroni correction for 24 phenotypes) and genome-wide significance for non-coding variants (green: p≤2.1×10⁻⁹ as derived from the traditional genome-wide significance threshold p≤5×10⁻⁸ and Bonferroni correction for 24 phenotypes). Single variant associations were determined via the efficient mixed-model association expedited (EMMAX) method.
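
The corrected thresholds quoted in the captions follow from dividing each base threshold by the 24 phenotypes tested. The short Python sketch below reproduces that arithmetic; the dictionary labels and the `bonferroni` helper are illustrative names only, and the gene-level base threshold is the one quoted in the Figure 1 caption.

```python
# Bonferroni correction: divide each base threshold by the number of phenotypes.
N_PHENOTYPES = 24

# Base thresholds as quoted in the figure captions (labels are illustrative).
base_thresholds = {
    "exome-wide, coding single variants": 4.3e-7,
    "genome-wide, non-coding single variants": 5e-8,
    "exome-wide, gene-level": 2.5e-6,
}


def bonferroni(alpha: float, n_tests: int) -> float:
    """Return the per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


corrected = {
    name: bonferroni(alpha, N_PHENOTYPES)
    for name, alpha in base_thresholds.items()
}

for name, value in corrected.items():
    print(f"{name}: {value:.1e}")
```

Rounded to two significant figures, these values match the 1.8×10⁻⁸, 2.1×10⁻⁹ and 1.0×10⁻⁷ thresholds reported in the captions.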

**Extended Data Figure 2 |. Effect sizes and proportion of variance explained for rare variant HbA1C gene-level associations.**
Results are displayed for a, *G6PD* (N=1,382 for AA; N=1,930 for EA; N=41,689 for EU; N=1,861 for SA; N=892 for HS), b, *GCK* (N=551 for EA; N=40,241 for EU; N=487 for HS), and c, *PIEZO1* (N=905 for AA; N=1,340 for EA; N=42,061 for EU; N=789 for SA; N=484 for HS). We calculated effect sizes (mmol/mol) and liability variance explained separately for each ancestry and then combined these via a meta-analysis. We performed the calculations for the strongest associated gene-level mask and for the strongest associated common variant within 125kb of the gene as previously reported (N=7,564 for AA; N=20,838 for EA; N=123,665 for EU; N=8,874 for SA). Proportion of variance explained is displayed as the proportion of total liability variance. Abbreviations: AA, African-American; EA, East Asian; EU, European; HS, Hispanic; SA, South Asian; M-A, meta-analysis. Error bars indicate 95% confidence intervals.

**Extended Data Figure 3 |. Calculating and evaluating common variant polygenic scores.**
We calculated common polygenic scores based on effect sizes and results from a previously published multi-ethnic HbA1C GWAS. We calculated polygenic scores separately for each of the four ancestries in our test sample with available GWAS data, evaluated ancestry-specific odds ratios via Fisher’s exact tests, and then combined these odds ratios via a fixed-effects meta-analysis to produce a transethnic odds ratio.
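
As a rough illustration of the pooling step described above, the following Python sketch combines per-ancestry odds ratios with an inverse-variance fixed-effects meta-analysis on the log scale. It is a simplification under stated assumptions: it uses the sample odds ratio with a Woolf standard error rather than the conditional estimate from Fisher's exact test, and the 2×2 counts and ancestry labels are hypothetical, not data from the study.

```python
import math


def log_odds_ratio(a, b, c, d):
    """Sample log odds ratio and Woolf standard error for a 2x2 table [[a, b], [c, d]]."""
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se


def fixed_effects_meta(estimates):
    """Inverse-variance weighted pooling of (log_or, se) pairs."""
    weights = [1 / se ** 2 for _, se in estimates]
    total = sum(weights)
    pooled = sum(w * est for w, (est, _) in zip(weights, estimates)) / total
    return pooled, math.sqrt(1 / total)


# Hypothetical counts (NOT from the paper). Each tuple is
# (cases in group 1, non-cases in group 1, cases in group 2, non-cases in group 2).
hypothetical_tables = {
    "ancestry_A": (30, 70, 20, 80),
    "ancestry_B": (25, 75, 15, 85),
    "ancestry_C": (18, 42, 12, 48),
    "ancestry_D": (10, 30, 8, 32),
}

per_ancestry = {name: log_odds_ratio(*t) for name, t in hypothetical_tables.items()}
pooled_log_or, pooled_se = fixed_effects_meta(list(per_ancestry.values()))

pooled_or = math.exp(pooled_log_or)
ci_low = math.exp(pooled_log_or - 1.96 * pooled_se)
ci_high = math.exp(pooled_log_or + 1.96 * pooled_se)
print(f"pooled OR = {pooled_or:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

Because the weights are inverse variances, ancestries with larger samples dominate the pooled estimate, and the pooled standard error is smaller than any single-ancestry standard error.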

**Extended Data Figure 4 |. Enrichment analyses of HbA1C and RBC rare variant gene-level associations.**
We ranked genes by their HbA1C gene-level p-value and tested the degree to which the top n associations (with n ranging from 1 to 1,000) were enriched for red blood cell count (RBC) gene-level associations. Enrichments were calculated using a one-sided Wilcoxon rank-sum test, comparing the RBC gene-level p-values of the top n HbA1C associations to the RBC gene-level p-values of background genes matched on the number of variants and total allele count; the solid blue line in the plot shows the one-sided Wilcoxon p-values as a function of n. As a negative control, we also conducted the reciprocal analysis in which we tested the top i RBC associations for enrichment for HbA1C associations; the solid yellow line shows the one-sided Wilcoxon p-values.

**Extended Data Figure 5 |. A framework for constructing polygenic scores that include rare variants.**
The framework consists of two steps: a, choosing genes to include in the polygenic score, based on their association p-value and annotation, and b, defining weights for rare variants, based on the masks that include them and the aggregate effect sizes observed for the masks. a, We explored three methods for choosing genes, based on their strength of HbA1C association (blue boxes) and evidence of acting through erythrocytic pathways (red). “GLYCEMIC set” indicates genes located within a glycemic gene set enriched (at p≤0.05) for HbA1C rare variant associations, while “RBC set” indicates genes located within an erythrocytic gene set enriched (at p≤0.05) for HbA1C rare variant associations (the specific gene sets are shown in Figure 2). “HbA1C LOCUS” and “RBC LOCUS” indicate genes located within 125kb of a common variant HbA1C or RBC association, respectively. The two negative controls included only genes that failed the erythrocytic pathway filters (“Excluded”) and applied either the HbA1C association strength filters for the loose gene set (control 1) or the association strength filters for the relaxed gene set (control 2). b, We explored three methods for weighting variants (Methods): the aggregate effect size of the strictest mask that contained the variant (nested), the aggregate effect size of variants unique to the strictest mask that contained the variant (unique), or the aggregate effect size of a weighted burden test for the gene multiplied by the specific weight of the variant (weighted).

**Extended Data Figure 6 |. Testing the accuracy of the rare variant polygenic score.**
As described in Figure 3 and Methods, for each of the nine rare variant polygenic scores (three variant weighting schemes for each of three gene set definitions; Extended Data Figure 5), we calculated Fisher’s odds ratios and 95% confidence intervals for the fraction of true T2D cases reclassified by the model as compared to the null expectation. The area of the diamond for each odds ratio is proportional to the total number of reclassified individuals in the AMP-T2D-GENES test sample (total N assessed=17,206; see Supplementary Table 9 for model-specific reclassification sample sizes). Error bars indicate 95% confidence intervals of the odds ratios.

**Extended Data Figure 7 |. Secondary analysis of rare variant polygenic scores for UKB samples only.**
To ensure that the ability of the rare variant polygenic score to reclassify an excess of true cases was not due to over-fitting, we built nine risk scores as in Extended Data Figure 6 but with genes selected from an analysis of only UKB samples (Methods). For each of the nine resulting rare variant polygenic scores, we calculated Fisher’s odds ratios and 95% confidence intervals for the fraction of true T2D cases reclassified by the model as compared to the null expectation. The area of the diamond for each odds ratio is proportional to the total number of reclassified individuals in the AMP-T2D-GENES test sample (total N assessed=17,206; see Supplementary Table 9 for model-specific reclassification sample sizes). Error bars indicate 95% confidence intervals of the odds ratios.

**Extended Data Figure 8 |. Impact of adjusting rare variant effects for common variants included in the polygenic score.**
Scatterplots indicate HbA1C gene-level effect sizes (mmol/mol) as estimated by burden tests with and without variants from the common variant PGS included as covariates in the test. **a-g,** Results are shown for each of the seven rare variant masks. We analyzed genes with nominal (p≤0.05) rare variant associations and within 125kb of a variant in the common variant PGS. Results indicate that, on average, rare variant effects remain roughly the same when adjusting for common variants. Spearman’s rank correlation coefficients (*i.e.* rho) and associated two-sided p-values are indicated on plots, as are the slopes (*i.e.* beta) and two-sided p-values from linear regression. Blue dotted lines show the linear regression slopes; red lines indicate a slope of 1.

**Extended Data Figure 9 |. Testing for heterogeneity across ancestry for variants included in common variant and rare variant polygenic scores.**
We used a Cochran’s Q test to evaluate heterogeneity across ancestry-level single-variant and gene-level association results. QQ plots are shown for p-values from a, single-variant Q tests for common variants and **b-h,** gene-level Q tests for different rare variant masks; included in each analysis were the variants (or genes) included in the corresponding polygenic score. Departures above the diagonal red line suggest heterogeneity beyond the null expectation (blue lines indicate 90% confidence intervals for the null expectation), while lambda values indicate the ratio of the median observed chi square statistic to the median of the expected chi square statistic under the null; larger lambda values indicate larger deviations from the null.
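
To make the heterogeneity statistics above concrete, here is a minimal Python sketch of Cochran's Q for one variant or gene, plus the lambda ratio of observed to expected median statistics. The effect sizes and standard errors are hypothetical, and the example uses three ancestries so that the null distribution has 2 degrees of freedom, whose median is exactly 2·ln 2; the number of ancestries per test in the study may differ.

```python
import math
from statistics import median


def cochran_q(betas, ses):
    """Cochran's Q statistic and its degrees of freedom for per-ancestry effects."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    return q, len(betas) - 1


def genomic_lambda(observed_stats, expected_median):
    """Ratio of the median observed statistic to the median expected under the null."""
    return median(observed_stats) / expected_median


# Hypothetical (betas, standard errors) for three genes across three ancestries.
hypothetical_genes = [
    ([0.10, 0.12, 0.08], [0.05, 0.06, 0.07]),
    ([0.30, -0.10, 0.20], [0.10, 0.10, 0.15]),
    ([0.00, 0.05, -0.05], [0.08, 0.09, 0.10]),
]

q_stats = [cochran_q(betas, ses)[0] for betas, ses in hypothetical_genes]

# Median of a chi-square distribution with 2 degrees of freedom.
CHI2_DF2_MEDIAN = 2 * math.log(2)
lam = genomic_lambda(q_stats, CHI2_DF2_MEDIAN)
print(f"Q statistics: {[round(q, 3) for q in q_stats]}, lambda = {lam:.3f}")
```

A lambda near 1 indicates that the observed Q statistics are distributed roughly as expected under homogeneity, while larger values indicate excess heterogeneity across ancestries.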

**Extended Data Figure 10 |. Fraction of variants found in enriched erythrocytic or glycemic gene sets with negative effects on HbA1C levels.**
Reported is the fraction of variants with negative HbA1C effect sizes (based on the single variant meta-analysis) within genes (i) with HbA1C gene-level p≤0.05 and (ii) within a significantly enriched (p≤0.05) erythrocytic (N=4) or glycemic (N=5) gene set. **a-g**, Results are shown for variants within each mask. The bars represent the fractions observed for variants across all gene sets, while the dots represent the fractions observed for variants within each individual gene set. A two-sided t-test was used to assess potentially significant differences; p-values are shown above each plot. Error bars indicate standard error.

**Figure 1 |. Rare variant associations for HbA1C are comparatively strong.**
a, The number of exome-wide significant predicted high or moderate impact variant associations across 24 quantitative phenotypes. Single variant associations were determined using the efficient mixed-model association expedited (EMMAX) method, and gene-level associations were determined using burden testing. Yellow: exome-wide significant single variant associations at P ≤ 1.8 × 10⁻⁸ (as derived from a previously determined threshold of P ≤ 4.3 × 10⁻⁷ and Bonferroni correction for 24 phenotypes). Blue: exome-wide significant gene-level associations at P ≤ 1.0 × 10⁻⁷ (as derived from the traditional exome-wide significance threshold of P ≤ 2.5 × 10⁻⁶ and Bonferroni correction for 24 phenotypes). b, Manhattan plot of all gene-level HbA1C associations determined via burden testing. Those reaching exome-wide significance (P ≤ 1.0 × 10⁻⁷; red line) are labeled. c,d, Effect sizes (mmol/mol) for rare variant gene-level associations for *G6PD* (c) (n = 1,382 for AA; n = 1,930 for EA; n = 41,689 for EU; n = 1,861 for SA; n = 892 for HS) and *PIEZO1* (d) (n = 905 for AA; n = 1,340 for EA; n = 42,061 for EU; n = 789 for SA; n = 484 for HS). Previously reported nearby common variant associations (n = 7,564 for AA; n = 20,838 for EA; n = 123,665 for EU; n = 8,874 for SA) are shown for comparison. Gene-level effects are displayed from the strongest associated variant mask. AA, African-American; EA, East Asian; EU, European; HS, Hispanic; SA, South Asian; M-A, meta-analysis. Error bars indicate 95% confidence intervals.

**Figure 2 |. Rare variant gene-level HbA1C associations show enrichment for genes involved in glycemic control and erythrocytic pathways in mice.**
a, Boxplots displaying the percentile of gene-level P-values (relative to matched genes; grey) for genes thought to impact erythrocyte pathways (blue) and glycemic control (green) in mice. The horizontal dotted line at the 50th percentile indicates the expected median percentile under the null distribution. We used a one-sided Wilcoxon rank sum test to assess significant deviation of percentiles from matched genes; Wilcoxon P-values are displayed for each gene set. The numbers of genes in each gene set are indicated in the Supplementary Note. b, The fraction of genes with a negative effect on HbA1C levels among those (i) with HbA1C gene-level P ≤ 0.05 and (ii) within a significantly enriched (P ≤ 0.05) erythrocytic (n = 4) or glycemic (n = 5) gene set. The bars represent the fractions of genes observed across all gene sets, while the dots represent the fractions of genes observed for each individual gene set. We used a binomial test to assess deviation from the expected fraction of 50%. In a, the box plot indicates minimum, lower quartile, median, upper quartile, and maximum. In b, the error bars indicate standard error.

**Figure 3 |. Accuracy and properties of rare and common variant polygenic scores.**
We identified “reclassified” individuals with adjusted (but not unadjusted) HbA1C above the T2D diagnostic threshold (47.53 mmol/mol) and compared (via a two-tailed Fisher’s exact test) the fraction of “true” cases among such individuals to the number expected by chance (Methods). a, From top, the forest plot shows Fisher’s exact test odds ratios and 95% confidence intervals for polygenic scores constructed from two exome-wide significant rare variant gene-level associations (*PIEZO1*/*G6PD*), the best performing (“loose, nested”) rare variant polygenic score (Erythrocytic Genes), a negative control polygenic score that excludes known erythrocytic genes (Glycemic Genes), a previously published common variant polygenic score (Common Erythrocytic Variants), and a polygenic score that combines rare and common variants (Combined Erythrocytic PGS). The area of each diamond is proportional to the number of individuals in our test sample reclassified by the score. b, Fisher’s exact test odds ratios stratified by ancestry. The area of each diamond is now proportional to the number of individuals in the US population that would be reclassified by the score after scaling the ancestral proportions in our test sample to those estimated for the US (Methods). Due to inadequate data regarding East Asian and South Asian percentages of the US population, “Asian” represents a meta-analysis of the South Asian and East Asian results; Supplementary Table 10 shows PGS performance within each ancestry. Europeans are not displayed due to insufficient data in our test sample. **c-f**, Histograms display the distribution (in mmol/mol HbA1C) of rare variant (gray) and common variant (colored) polygenic scores for each ancestry.
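
The reclassification rule described in this caption can be sketched in a few lines of Python. The sketch assumes that the adjusted HbA1C is obtained by subtracting the PGS-predicted shift (in mmol/mol) from the measured value and that "above the threshold" means at or above it; the `Person` class, its field names and the example values are hypothetical and not taken from the study.

```python
from dataclasses import dataclass

T2D_THRESHOLD = 47.53  # mmol/mol, diagnostic threshold quoted in the caption


@dataclass
class Person:
    measured_hba1c: float  # mmol/mol
    pgs_shift: float       # assumed PGS-predicted shift in HbA1C, mmol/mol


def adjusted_hba1c(person: Person) -> float:
    """Assumption: adjustment removes the PGS-predicted shift by subtraction."""
    return person.measured_hba1c - person.pgs_shift


def is_reclassified(person: Person) -> bool:
    """True when the adjusted, but not the unadjusted, value reaches the threshold."""
    return person.measured_hba1c < T2D_THRESHOLD <= adjusted_hba1c(person)


# Hypothetical individuals.
cohort = [
    Person(measured_hba1c=46.0, pgs_shift=-2.0),  # adjusted 48.0: reclassified
    Person(measured_hba1c=46.0, pgs_shift=-1.0),  # adjusted 47.0: not reclassified
    Person(measured_hba1c=49.0, pgs_shift=-2.0),  # already above threshold
]

flags = [is_reclassified(p) for p in cohort]
print(flags)
```

In this framing, a person carrying variants that lower HbA1C through non-glycemic pathways has a negative shift, so the adjusted value is higher than the measured one and may cross the diagnostic threshold.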

Cited by

Polygenic score development in the era of large-scale biobanks.
Plagnol V. Cell Genom. 2022 Jan 13;2(1):100088. doi: 10.1016/j.xgen.2021.100088. eCollection 2022 Jan 12. PMID: 36777037 Free PMC article.
Bridging the diversity gap: Analytical and study design considerations for improving the accuracy of trans-ancestry genetic prediction.
Bocher O, Gilly A, Park YC, Zeggini E, Morris AP. HGG Adv. 2023 Jun 15;4(3):100214. doi: 10.1016/j.xhgg.2023.100214. eCollection 2023 Jul 13. PMID: 37448981 Free PMC article.
Integration of polygenic and gut metagenomic risk prediction for common diseases.
Liu Y, Ritchie SC, Teo SM, Ruuskanen MO, Kambur O, Zhu Q, Sanders J, Vázquez-Baeza Y, Verspoor K, Jousilahti P, Lahti L, Niiranen T, Salomaa V, Havulinna AS, Knight R, Méric G, Inouye M. Nat Aging. 2024 Apr;4(4):584-594. doi: 10.1038/s43587-024-00590-7. Epub 2024 Mar 25. PMID: 38528230 Free PMC article.
Rare loss of function variants in the hepatokine gene INHBE protect from abdominal obesity.
Deaton AM, Dubey A, Ward LD, Dornbos P, Flannick J; AMP-T2D-GENES Consortium; Yee E, Ticau S, Noetzli L, Parker MM, Hoffing RA, Willis C, Plekan ME, Holleman AM, Hinkle G, Fitzgerald K, Vaishnaw AK, Nioi P. Nat Commun. 2022 Jul 27;13(1):4319. doi: 10.1038/s41467-022-31757-8. PMID: 35896531 Free PMC article.
Clinical use of polygenic scores in type 2 diabetes: challenges and possibilities.
Prasad RB, Hakaste L, Tuomi T. Diabetologia. 2025 Jul;68(7):1361-1374. doi: 10.1007/s00125-025-06419-1. Epub 2025 Apr 5. PMID: 40186687 Free PMC article. Review.

References

1. Vilhjálmsson BJ et al. Modeling linkage disequilibrium increases accuracy of polygenic risk scores. Am. J. Hum. Genet 97, 576–592 (2015).
2. Choi SW, Mak TS & O’Reilly PF. Tutorial: a guide to performing polygenic risk score analyses. Nat. Protoc 15, 2759–2772 (2020).
3. Khera AV et al. Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nat. Genet 50, 1219–1224 (2018).
4. Khera AV et al. Polygenic prediction of weight and obesity trajectories from birth to adulthood. Cell 177, 587–596.e9 (2019).
5. Mavaddat N et al. Polygenic risk scores for prediction of breast cancer and breast cancer subtypes. Am. J. Hum. Genet 104, 21–34 (2019).

Methods-only references

Kang HM et al. Variance component model to account for sample structure in genome-wide association studies. Nat. Genet 42, 348–354 (2010).
Lohmueller KE et al. Whole-exome sequencing of 2,000 Danish individuals and the role of rare coding variants in type 2 diabetes. Am. J. Hum. Genet 93, 1072–1086 (2013).
Williams AL et al. Sequence variants in SLC16A11 are a common risk factor for type 2 diabetes in Mexico. Nature 506, 97–101 (2014).
Eastwood SV et al. Algorithms for the capture and adjudication of prevalent and incident diabetes in UK Biobank. PLoS One 11, e0162388 (2016).
Hindy G et al. Rare coding variants in 35 genes associate with circulating lipid levels – a multi-ancestry analysis of 170,000 exomes. Am. J. Hum. Genet 109, 81–96 (2020).
