Sci Rep. 2022 Oct 28;12(1):18173.
doi: 10.1038/s41598-022-22637-8.

Polygenic Health Index, General Health, and Pleiotropy: Sibling Analysis and Disease Risk Reduction

Erik Widen et al. Sci Rep. 2022.

Abstract

We construct a polygenic health index as a weighted sum of polygenic risk scores for 20 major disease conditions, including coronary artery disease, type 1 and type 2 diabetes, and schizophrenia. Individual weights are determined by population-level estimates of impact on life expectancy. We validate this index in odds ratios and selection experiments using unrelated individuals and siblings (pairs and trios) from the UK Biobank. Individuals with higher index scores have decreased disease risk across almost all 20 diseases (no significant risk increases), and longer calculated life expectancy. When estimated Disability-Adjusted Life Years (DALYs) are used as the performance metric, the gain from selection among ten individuals (highest index score vs average) is found to be roughly 4 DALYs. We find no statistical evidence for antagonistic trade-offs in risk reduction across these diseases. Correlations between genetic disease risks are found to be mostly positive and generally mild. These results have important implications for public health and also for fundamental issues such as pleiotropy and the genetic architecture of human disease conditions.
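
To make the "weighted sum" concrete, here is a minimal sketch of how such an index could be computed for one person. The sign convention (higher index means lower weighted risk), the disease names, the weights, and the standardized risk scores are all hypothetical values chosen for illustration; they are not taken from the paper.

```python
def health_index(risk_scores, life_year_weights):
    """Weighted sum of per-disease risk scores, negated so higher is better.

    Both arguments are dicts keyed by disease name. The negation is an
    assumption made here so that a higher index means lower weighted risk.
    """
    missing = set(risk_scores) - set(life_year_weights)
    if missing:
        raise KeyError(f"no weight for: {sorted(missing)}")
    return -sum(life_year_weights[d] * r for d, r in risk_scores.items())


# Hypothetical weights (life years lost) and standardized risk scores.
weights = {"CAD": 2.0, "T2D": 1.5, "asthma": 0.5}
person_a = {"CAD": 1.0, "T2D": 0.0, "asthma": -1.0}
person_b = {"CAD": -1.0, "T2D": -1.0, "asthma": 0.0}

index_a = health_index(person_a, weights)  # -(2.0*1.0 + 0.0 - 0.5) = -1.5
index_b = health_index(person_b, weights)  # -(-2.0 - 1.5 + 0.0) = 3.5
```

In this toy example person_b has the higher index because their largest-weight risks are below average.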

Conflict of interest statement

The authors declare the following competing interests: SH is a founder, shareholder, and serves on the Board of Directors of Genomic Prediction, Inc. (GP). LT is a founder, shareholder, serves on the Board of Directors, and is the CEO of GP. EW and LL are employees and shareholders of GP. TR declares no competing interests.

Figures

Figure 1
The selection experiments. The test set is scored with the health index I or a single PRS and is randomly divided into groups of equal size. The individual with the best score in each group is selected, and the health status among the selected is then compared with the general test set. The symbols in Eq. (4) refer to the indicated subsets.
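
The selection procedure in this caption can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the toy population (scores 0 to 99, with the 20 lowest scores marked as cases), and the choice to drop leftover individuals when the set does not divide evenly are all hypothetical, not the authors' implementation.

```python
import random


def select_best_per_group(individuals, group_size, rng):
    """Shuffle, split into equal-size groups, keep the top scorer of each."""
    shuffled = individuals[:]
    rng.shuffle(shuffled)
    n_groups = len(shuffled) // group_size  # leftover individuals are dropped
    selected = []
    for g in range(n_groups):
        group = shuffled[g * group_size:(g + 1) * group_size]
        selected.append(max(group, key=lambda person: person["score"]))
    return selected


def prevalence(individuals):
    """Fraction of individuals flagged as disease cases."""
    return sum(p["case"] for p in individuals) / len(individuals)


# Toy test set: the 20 lowest-scoring individuals are cases.
rng = random.Random(42)
population = [{"score": s, "case": s < 20} for s in range(100)]

selected = select_best_per_group(population, 5, rng)
baseline_prevalence = prevalence(population)  # 0.2 by construction
selected_prevalence = prevalence(selected)
```

Repeating the call with different random seeds gives independent groupings, which is how repeated experiments of this kind would typically be used to estimate uncertainty.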
Figure 2
The estimated gain from index selection is a clearly positive function of group size, whether the disease weights are defined by population estimates of lost life years or by disability-adjusted life years (DALYs). Left: the index gain, measured as the average health index difference between selected and random individuals (ΔI_c in Eq. (4)), grows monotonically with group size and still has a clearly positive derivative at a group size of 10. Notably, the gain is strongly significant for all group sizes, even at a group size of 2. The error band is a 95% CI computed from 25 experiments with independent selection groupings. Right: while still selecting on the same index, Eq. (1), we evaluated it on a case/control status metric using DALY weights, which takes quality of life into account. Again, there is a clear and steady gain, reaching about 4 years at a group size of 10. The error band is a 95% CI computed from 25 experiments with independent selection groupings.
Figure 3
Selecting on the health index among five randomly grouped individuals simultaneously reduces the risk of almost all the studied diseases. Left: the relative risk reduction (RRR) among the selected individuals, compared with random selection, is predominantly positive, ranging from a few risk reductions statistically consistent with zero up to more than 40%. No disease risk is demonstrably increased. The case numbers for each disease are printed just above the x-axis, and the error bars are 95% CI estimates from 25 repeated experiments with different selection groupings. Right: the estimated index gain for each of the index components (diseases), i.e., the disease component breakdown of Eq. (4), also shows non-negative gains across the board, with most component gains being statistically significant. The unit on the y-axis is estimated life years (LY), as is the unit of I_c. This index is primarily driven by CAD, heart attack, hypertension, major depressive disorder, obesity and type II diabetes, due to their combinations of strong impacts l_d and high population prevalence.
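
For readers unfamiliar with the metric, the following sketch shows the standard relative risk reduction calculation, assuming it is computed from the disease prevalence among selected individuals and a baseline prevalence. The function name and the prevalence figures are hypothetical and serve only as a worked example.

```python
def relative_risk_reduction(prev_selected, prev_baseline):
    """Standard RRR: 1 minus the ratio of selected to baseline prevalence."""
    if prev_baseline <= 0:
        raise ValueError("baseline prevalence must be positive")
    return 1.0 - prev_selected / prev_baseline


# Hypothetical prevalences: 10% at baseline, 6% among the selected.
rrr_reduced = relative_risk_reduction(0.06, 0.10)    # about 0.4, i.e. 40%
rrr_unchanged = relative_risk_reduction(0.10, 0.10)  # 0.0, no reduction
rrr_increased = relative_risk_reduction(0.12, 0.10)  # negative: risk went up
```

A negative value corresponds to an increased risk, which is the outcome the caption reports as not demonstrably present for any disease.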
Figure 4
RRR comparison between selection on the index and selection on individual disease PRS. The individual disease RRR obtained by index selection is contrasted with selection directly on the individual PRS, using a group size of 5. The case numbers in the test set for each disease are shown above the x-axis, and the error bars are 95% CI computed from 25 independent experiment runs.
Figure 5
The disease risk reduction from index and PRS selection for different group sizes. The relative performance of index selection and PRS selection varies between individual diseases, as seen in Fig. 4. Shown here as functions of group size, the results have their largest performance step between no selection (group size 1) and selection between two individuals, with continued but less dramatic benefits at larger group sizes. Notably, for the chosen examples type II diabetes and CAD, the full health index consistently performs as well as selecting directly on the specific PRS, showing no reduced effect on these diseases from taking all the others into account. The index performance for Alzheimer’s disease and obesity, while not achieving the full risk reduction of the corresponding PRS, retains significant risk reductions for all group sizes. The error bars represent estimated 95% CI computed from 25 selection experiments using different selection groupings.
Figure 6
Prevalence in health index quantile bins for the most common diseases. We binned the test set according to the health index into 25 equally distributed quantiles and plotted the prevalence within each bin for the most prevalent diseases (allowing enough cases for the bin resolution to be meaningful). The general population prevalences are plotted as dotted reference lines (dividing by this number would give odds ratio plots), and the y-axis starts at 0 to give a visual representation of the (odds) scales. For the intermediately risk-reduced diseases (according to the RRR in Fig. 3), hypercholesterolemia, hypertension and obesity, there is a clear and systematic risk relationship across the entire range of the health index. For asthma, there is only a weak, detectable trend for the center values, consistent with its existing but smaller RRR. The error bars are 95% CI estimates obtained through 100-fold bootstrap calculations of the prevalence within each bin (no re-binning was done).
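
The binning and bootstrap steps described in this caption might look roughly like the sketch below. The caption does not say which bootstrap interval was used, so the percentile method here is an assumption, as are the function names and the synthetic data (a made-up risk that falls linearly with the index). Bin membership is fixed once and only the case statuses within each bin are resampled, matching the "no re-binning" note.

```python
import random


def quantile_bins(records, n_bins):
    """Sort (index, case) records by index and split into near-equal bins."""
    ordered = sorted(records, key=lambda r: r[0])
    n = len(ordered)
    return [ordered[i * n // n_bins:(i + 1) * n // n_bins] for i in range(n_bins)]


def bootstrap_prevalence_ci(statuses, n_boot, rng, level=0.95):
    """Percentile bootstrap CI for the case fraction within one fixed bin."""
    n = len(statuses)
    estimates = sorted(sum(rng.choices(statuses, k=n)) / n for _ in range(n_boot))
    lo = estimates[int((1 - level) / 2 * (n_boot - 1))]
    hi = estimates[int((1 + level) / 2 * (n_boot - 1))]
    return lo, hi


# Synthetic data: case probability decreases with the index (hypothetical).
rng = random.Random(0)
records = []
for _ in range(5000):
    index = rng.gauss(0.0, 1.0)
    risk = min(max(0.15 - 0.05 * index, 0.0), 1.0)
    records.append((index, 1 if rng.random() < risk else 0))

bins = quantile_bins(records, 25)
summary = []  # one (point estimate, (ci_low, ci_high)) entry per bin
for members in bins:
    statuses = [case for _, case in members]
    point = sum(statuses) / len(statuses)
    summary.append((point, bootstrap_prevalence_ci(statuses, 100, rng)))
```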
Figure 7
Index selection between 22,667 pairs of genetic siblings retains the overall benefits. In both figures, selection experiments among pairs of genetic siblings are compared to selection among pairs of unrelated individuals. The index performances are qualitatively very similar even though siblings share half their genomes and have more similar environments. As expected, we do see a general performance attenuation among siblings, but also a few exceptions. Left: the RRR for each disease. The error bars for siblings are theoretical 95% CI using the Wilson score interval for the prevalences among the selected siblings. The error bars for the selection among the unrelated pairs are again estimated 95% CI from 25 separate runs. The case numbers are shown above the x-axis. Right: the component-wise index gain for the selections among pairs of siblings and among pairs of unrelated individuals. The sibling results are presented without error bars since no theoretical uncertainty was calculated; statistical significance is therefore not established from these data. The error bars for the selection among unrelated individuals are 95% CI from 25 separate runs.
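
The Wilson score interval mentioned in this caption has a standard closed form, sketched below for a single prevalence. The function name and the example counts are hypothetical, and how the authors carried this interval over to the RRR error bars is not stated here, so that step is not reproduced.

```python
import math


def wilson_interval(cases, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = cases / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Hypothetical counts among selected individuals.
low_rare, high_rare = wilson_interval(0, 100)   # zero cases still gives an upper bound
low_half, high_half = wilson_interval(50, 100)  # symmetric around 0.5
```

Unlike the simple normal approximation, this interval stays within [0, 1] and remains informative when the observed case count is zero, which is useful for rare diseases.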
Figure 8
Phenotype dependencies and PRS correlation comparisons. This figure visualizes three different quantities for each pair of diseases: the PRS correlation, a comorbidity metric, and a χ² independence test p-value. Each tile below the diagonal is split into two halves, where upper blue triangle = PRS corr. is the correlation between the two diseases’ PRS, i.e., the genetic correlations as inferred by the predictors. The other half, lower green triangle = χ² ratio, is a metric of the actual disease comorbidity: how many more times disease coincidence is observed compared to what would be expected if the diseases were completely independent, where a positive (negative) sign indicates higher (lower) comorbid frequency (this is based on the ratio between the observed and expected case-case cell in a χ²-test contingency table, hence referred to as the χ² ratio). The green/red squares = log(p) above the diagonal indicate the statistical significance of the dependence: the (signed) logarithm of the p-value in a χ²-test. The sign is positive (negative) for more (less) frequent coincidence. Both the p-value and the χ² ratios are masked for disease pairs without statistically significant (p = 0.05) dependence. For example, the deep green square above the diagonal at (CAD, HCL) indicates that the CAD-hypercholesterolemia comorbidity is highly significant (we can reject phenotype independence at p-value < 10⁻⁴). Below the diagonal, we see for the same disease pair that the lower triangle is gently blue-green, i.e., case coincidence for CAD-hypercholesterolemia is about 2.3 times more common than random chance. Lastly, the upper triangle is dark blue, meaning that the PRS correlation between CAD and hypercholesterolemia is among the very strongest, at about 0.22. Overall, we see that most disease pairs have statistically significant comorbidity with 1–2 times more coincidence than chance, and that their PRS are not, or only slightly positively, correlated. This phenotypic and genetic background not only allows but facilitates the construction of a useful health index. The most prominent outliers are discussed in the main text.
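
As a worked example of the comorbidity metric in this caption, the sketch below computes the observed-to-expected ratio for the case-case cell of a 2×2 table, together with a Pearson χ² p-value. The function name and the counts are hypothetical. The test is assumed to be an uncorrected Pearson test with one degree of freedom, and the signed encoding used in the figure is not reproduced.

```python
import math


def comorbidity_stats(n_both, n_only_a, n_only_b, n_neither):
    """Return (observed/expected case-case ratio, chi-square, p-value)."""
    total = n_both + n_only_a + n_only_b + n_neither
    row_a = n_both + n_only_a  # everyone with disease A
    col_b = n_both + n_only_b  # everyone with disease B
    if min(row_a, col_b, total - row_a, total - col_b) <= 0:
        raise ValueError("every margin of the table must be positive")
    observed = [n_both, n_only_a, n_only_b, n_neither]
    expected = [
        row_a * col_b / total,
        row_a * (total - col_b) / total,
        (total - row_a) * col_b / total,
        (total - row_a) * (total - col_b) / total,
    ]
    ratio = n_both / expected[0]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return ratio, chi2, p_value


# Hypothetical counts: 30 people with both diseases out of 1000.
ratio, chi2, p_value = comorbidity_stats(30, 70, 170, 730)
# Expected case-case count is 100 * 200 / 1000 = 20, so the ratio is 1.5.
```

A ratio above 1 means the two diseases coincide more often than independence would predict; a ratio of exactly 1 gives χ² = 0 and a p-value of 1.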
