Nat Hum Behav. 2021 Dec;5(12):1744-1758.
doi: 10.1038/s41562-021-01119-3. Epub 2021 Jun 17.

Resource profile and user guide of the Polygenic Index Repository

Joel Becker et al. Nat Hum Behav. 2021 Dec.

Abstract

Polygenic indexes (PGIs) are DNA-based predictors. Their value for research in many scientific disciplines is growing rapidly. As a resource for researchers, we used a consistent methodology to construct PGIs for 47 phenotypes in 11 datasets. To maximize the PGIs' prediction accuracies, we constructed them using genome-wide association studies (some not previously published) from multiple data sources, including 23andMe and UK Biobank. We present a theoretical framework to help interpret analyses involving PGIs. A key insight is that a PGI can be understood as an unbiased but noisy measure of a latent variable we call the 'additive SNP factor'. Regressions in which the true regressor is this factor but the PGI is used as its proxy therefore suffer from errors-in-variables bias. We derive an estimator that corrects for the bias, illustrate the correction, and make a Python tool for implementing it publicly available.
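
The errors-in-variables problem described above can be illustrated with a small simulation. This is a minimal sketch of the textbook classical-measurement-error case, not the estimator derived in the paper or its Python tool. The function name, parameter values, and the assumption that the reliability of the proxy is known by construction are all hypothetical choices made for illustration; in real data the reliability would have to be estimated.

```python
import numpy as np


def simulate_attenuation(n=200_000, beta=0.5, noise_sd=1.0, seed=0):
    """Simulate a noisy proxy for a latent regressor (all values are illustrative).

    Returns the naive OLS slope on the proxy and the slope after the
    classical disattenuation correction (slope / reliability).
    """
    rng = np.random.default_rng(seed)

    # Latent regressor with variance 1 (stands in for the latent factor).
    factor = rng.normal(size=n)

    # Proxy = latent regressor + independent noise: unbiased but noisy.
    proxy = factor + rng.normal(scale=noise_sd, size=n)

    # Outcome depends on the latent regressor, not on the proxy.
    y = beta * factor + rng.normal(size=n)

    # Naive slope from regressing y on the proxy.
    naive = np.cov(proxy, y)[0, 1] / np.var(proxy, ddof=1)

    # Reliability = var(factor) / var(proxy); known here only because
    # we generated the data ourselves (assumption for this sketch).
    reliability = 1.0 / (1.0 + noise_sd**2)

    # Classical correction: divide the attenuated slope by the reliability.
    corrected = naive / reliability
    return naive, corrected


if __name__ == "__main__":
    naive, corrected = simulate_attenuation()
    print(f"naive slope: {naive:.3f}, corrected slope: {corrected:.3f}")
```

With these illustrative settings the proxy has twice the variance of the latent regressor, so the naive slope is pulled toward roughly half of the true coefficient, and dividing by the reliability approximately recovers it.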

Conflict of interest statement

Competing interests

D.A.H., A.K., and members of the 23andMe Research Team are current or former employees of 23andMe, Inc. and hold stock or stock options in 23andMe. The authors declare no other competing interests.

Figures

Figure 1. Type of study in presentations at Behavior Genetics Association Annual Meetings.
Notes: For a description of the data underlying this figure, see Methods. Out of 1,993 presentations in total (over the 2009–2019 period), the percentages that are in exactly 0, 1, 2, or 3 categories are 26.6%, 67.6%, 5.5%, and 0.2%, respectively.
Figure 2. Algorithm determining which single-trait and multi-trait PGIs were generated for the Repository.
Notes: See Table 1 for the 36 single-trait PGIs and 35 multi-trait PGIs included in the Repository.
Figure 3. Predictive power of Repository single-trait PGIs.
Notes: Error bars show 95% confidence intervals from bootstrapping with 1,000 repetitions. Panel (A): Incremental R² from adding the Repository’s single-trait PGI to a regression of the phenotype on 10 principal components of the genetic relatedness matrix for HRS, WLS, Dunedin and E-Risk, and on 20 principal components and 106 genotyping batch dummies for UKB. Prior to the regression, phenotypes are residualized on a second-degree polynomial for age or birth year, sex, and their interactions (see Supplementary Tables 5 and 12). For the sample sizes of the GWAS that the PGIs are based on, see Supplementary Table 8. Panel (B): Difference in incremental R² between the Repository single-trait PGI and a PGI constructed from publicly available summary statistics using our Repository pipeline. (Note that the latter do not include PGIs directly available from datasets, such as the ones accessible from the HRS website.) If no publicly available summary statistics are available for a phenotype, then the difference in incremental R² is equal to the incremental R² of the single-trait PGI and is represented by an open circle. “Cigarettes per Day” in Dunedin was omitted from the Figure because the confidence interval (−5.99% to 0.94%) around the point estimate (−2.38%) required extending the y-axis substantially, making the figure hard to read. For the GWAS sample sizes of the PGIs based on publicly available summary statistics, see Supplementary Table 13.
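
The incremental R² in Panel (A) is the gain in R² from adding the PGI to a regression of the phenotype on the control covariates. A minimal sketch of that calculation with a bootstrap interval follows. The function names are hypothetical, the percentile method for the interval is an assumption (the caption states only that bootstrapping with 1,000 repetitions was used), and the residualization step described in the caption is omitted.

```python
import numpy as np


def r_squared(X, y):
    """R² from an OLS regression of y on X with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    return 1.0 - resid.var() / y.var()


def incremental_r2(pgi, controls, y):
    """Gain in R² from adding the PGI to a controls-only regression."""
    full = r_squared(np.column_stack([controls, pgi]), y)
    base = r_squared(controls, y)
    return full - base


def bootstrap_ci(pgi, controls, y, reps=1000, seed=0):
    """Percentile 95% interval for the incremental R² (assumed method)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = np.empty(reps)
    for b in range(reps):
        # Resample individuals with replacement and recompute the statistic.
        idx = rng.integers(0, n, size=n)
        draws[b] = incremental_r2(pgi[idx], controls[idx], y[idx])
    lo, hi = np.percentile(draws, [2.5, 97.5])
    return lo, hi
```

Here `controls` would hold the principal components (and batch dummies where applicable) as columns, and `y` the already-residualized phenotype.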
