Nat Genet. 2022 Apr;54(4):437-449.

doi: 10.1038/s41588-022-01016-z. Epub 2022 Mar 31.

Polygenic prediction of educational attainment within and between families from genome-wide association analyses in 3 million individuals

Aysu Okbay^#¹, Yeda Wu², Nancy Wang³, Hariharan Jayashankar³, Michael Bennett³, Seyed Moeen Nehzati⁴, Julia Sidorenko², Hyeokmoon Kweon⁵, Grant Goldman³, Tamara Gjorgjieva³, Yunxuan Jiang⁶, Barry Hicks⁶, Chao Tian⁶, David A Hinds⁶, Rafael Ahlskog⁷, Patrik K E Magnusson⁸, Sven Oskarsson⁷, Caroline Hayward⁹, Archie Campbell^{10,11}, David J Porteous^{10,11,12}, Jeremy Freese¹³, Pamela Herd¹⁴; 23andMe Research Team; Social Science Genetic Association Consortium; Chelsea Watson⁴, Jonathan Jala⁴, Dalton Conley¹⁵, Philipp D Koellinger^{5,16}, Magnus Johannesson¹⁷, David Laibson¹⁸, Michelle N Meyer¹⁹, James J Lee²⁰, Augustine Kong²¹, Loic Yengo², David Cesarini^{3,22,23}, Patrick Turley^{24,25}, Peter M Visscher²⁶, Jonathan P Beauchamp²⁷, Daniel J Benjamin^{28,29,30}, Alexander I Young^#^{31,32}

Collaborators

Michelle Agee, Babak Alipanahi, Adam Auton, Robert K Bell, Katarzyna Bryc, Sarah L Elson, Pierre Fontanillas, Nicholas A Furlotte, David A Hinds, Karen E Huber, Aaron Kleinman, Nadia K Litterman, Jennifer C McCreight, Matthew H McIntyre, Joanna L Mountain, Carrie A M Northover, Steven J Pitts, J Fah Sathirapongsasuti, Olga V Sazonova, Janie F Shelton, Suyash Shringarpure, Joyce Y Tung, Vladimir Vacic, Catherine H Wilson, Mark Alan Fontana, Tune H Pers, Cornelius A Rietveld, Guo-Bo Chen, Valur Emilsson, S Fleur W Meddens, Joseph K Pickrell, Kevin Thom, Pascal Timshel, Ronald de Vlaming, Abdel Abdellaoui, Tarunveer S Ahluwalia, Jonas Bacelis, Clemens Baumbach, Gyda Bjornsdottir, Johannes H Brandsma, Maria Pina Concas, Jaime Derringer, Tessel E Galesloot, Giorgia Girotto, Richa Gupta, Leanne M Hall, Sarah E Harris, Edith Hofer, Momoko Horikoshi, Jennifer E Huffman, Kadri Kaasik, Ioanna P Kalafati, Robert Karlsson, Jari Lahti, Sven J van der Lee, Christiaan de Leeuw, Penelope A Lind, Karl-Oskar Lindgren, Tian Liu, Massimo Mangino, Jonathan Marten, Evelin Mihailov, Michael B Miller, Peter J van der Most, Christopher Oldmeadow, Antony Payton, Natalia Pervjakova, Wouter J Peyrot, Yong Qian, Olli Raitakari, Rico Rueedi, Erika Salvi, Börge Schmidt, Katharina E Schraut, Jianxin Shi, Albert V Smith, Raymond A Poot, Beate St Pourcain, Alexander Teumer, Gudmar Thorleifsson, Niek Verweij, Dragana Vuckovic, Juergen Wellmann, Harm-Jan Westra, Jingyun Yang, Wei Zhao, Zhihong Zhu, Behrooz Z Alizadeh, Najaf Amin, Andrew Bakshi, Sebastian E Baumeister, Ginevra Biino, Klaus Bønnelykke, Patricia A Boyle, Harry Campbell, Francesco P Cappuccio, Gail Davies, Jan-Emmanuel De Neve, Panos Deloukas, Ilja Demuth, Jun Ding, Peter Eibich, Lewin Eisele, Niina Eklund, David M Evans, Jessica D Faul, Mary F Feitosa, Andreas J Forstner, Ilaria Gandin, Bjarni Gunnarsson, Bjarni V Halldórsson, Tamara B Harris, Andrew C Heath, Lynne J Hocking, Elizabeth G Holliday, Georg Homuth, Michael A Horan, 
Jouke-Jan Hottenga, Philip L de Jager, Peter K Joshi, Astanand Jugessur, Marika A Kaakinen, Mika Kähönen, Stavroula Kanoni, Liisa Keltigangas-Järvinen, Lambertus A L M Kiemeney, Ivana Kolcic, Seppo Koskinen, Aldi T Kraja, Martin Kroh, Zoltan Kutalik, Antti Latvala, Lenore J Launer, Maël P Lebreton, Douglas F Levinson, Paul Lichtenstein, Peter Lichtner, David C M Liewald, Anu Loukola, Pamela A Madden, Reedik Mägi, Tomi Mäki-Opas, Riccardo E Marioni, Pedro Marques-Vidal, Gerardus A Meddens, George McMahon, Christa Meisinger, Thomas Meitinger, Yusplitri Milaneschi, Lili Milani, Grant W Montgomery, Ronny Myhre, Christopher P Nelson, Dale R Nyholt, William E R Ollier, Aarno Palotie, Lavinia Paternoster, Nancy L Pedersen, Katja E Petrovic, Katri Räikkönen, Susan M Ring, Antonietta Robino, Olga Rostapshova, Igor Rudan, Aldo Rustichini, Veikko Salomaa, Alan R Sanders, Antti-Pekka Sarin, Helena Schmidt, Rodney J Scott, Blair H Smith, Jennifer A Smith, Jan A Staessen, Elisabeth Steinhagen-Thiessen, Konstantin Strauch, Antonio Terracciano, Martin D Tobin, Sheila Ulivi, Simona Vaccargiu, Lydia Quaye, Frank J A van Rooij, Cristina Venturini, Anna A E Vinkhuyzen, Uwe Völker, Henry Völzke, Judith M Vonk, Diego Vozzi, Johannes Waage, Erin B Ware, Gonneke Willemsen, John R Attia, David A Bennett, Klaus Berger, Lars Bertram, Hans Bisgaard, Dorret I Boomsma, Ingrid B Borecki, Ute Bültmann, Christopher F Chabris, Francesco Cucca, Daniele Cusi, Ian J Deary, George V Dedoussis, Cornelia M van Duijn, Johan G Eriksson, Barbara Franke, Lude Franke, Paolo Gasparini, Pablo V Gejman, Christian Gieger, Hans-Jörgen Grabe, Jacob Gratten, Patrick J F Groenen, Vilmundur Gudnason, Pim van der Harst, Wolfgang Hoffmann, Elina Hyppönen, William G Iacono, Bo Jacobsson, Marjo-Riitta Järvelin, Karl-Heinz Jöckel, Jaakko Kaprio, Sharon L R Kardia, Terho Lehtimäki, Steven F Lehrer, Nicholas G Martin, Matt McGue, Andres Metspalu, Neil Pendleton, Brenda W J H Penninx, Markus Perola, Nicola Pirastu, Mario Pirastu, Ozren Polasek, Danielle Posthuma, Christine Power, Michael A Province, Nilesh J Samani, David Schlessinger, Reinhold Schmidt, Thorkild I A Sørensen, Tim D Spector, Kari Stefansson, Unnur Thorsteinsdottir, A Roy Thurik, Nicholas J Timpson, Henning Tiemeier, André G Uitterlinden, Veronique Vitart, Peter Vollenweider, David R Weir, James F Wilson, Alan F Wright, Dalton C Conley, Robert F Krueger, George Davey Smith, Albert Hofman, David I Laibson, Sarah E Medland, Jian Yang, Tõnu Esko

Affiliations

¹ Department of Economics, School of Business and Economics, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands. a.okbay@vu.nl.
² Institute for Molecular Bioscience, University of Queensland, Brisbane, QLD, Australia.
³ National Bureau of Economic Research, Cambridge, MA, USA.
⁴ UCLA Anderson School of Management, Los Angeles, CA, USA.
⁵ Department of Economics, School of Business and Economics, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands.
⁶ 23andMe, Inc., Sunnyvale, CA, USA.
⁷ Department of Government, Uppsala University, Uppsala, Sweden.
⁸ Swedish Twin Registry, Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden.
⁹ MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Western General Hospital, Edinburgh, UK.
¹⁰ Centre for Genomic and Experimental Medicine, Institute of Genetics and Cancer, University of Edinburgh, Western General Hospital, Edinburgh, UK.
¹¹ Usher Institute, University of Edinburgh, Edinburgh, UK.
¹² Centre for Cognitive Ageing and Cognitive Epidemiology, University of Edinburgh, Edinburgh, UK.
¹³ Department of Sociology, Stanford University, Stanford, CA, USA.
¹⁴ McCourt School of Public Policy, Georgetown University, Washington, DC, USA.
¹⁵ Department of Sociology, Princeton University, Princeton, NJ, USA.
¹⁶ Robert M. La Follette School of Public Affairs, University of Wisconsin-Madison, Madison, WI, USA.
¹⁷ Department of Economics, Stockholm School of Economics, Stockholm, Sweden.
¹⁸ Department of Economics, Harvard University, Cambridge, MA, USA.
¹⁹ Center for Translational Bioethics and Health Care Policy, Geisinger Health System, Danville, PA, USA.
²⁰ Department of Psychology, University of Minnesota Twin Cities, Minneapolis, MN, USA.
²¹ Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK.
²² Department of Economics, New York University, New York, NY, USA.
²³ Center for Experimental Social Science, New York University, New York, NY, USA.
²⁴ Department of Economics, University of Southern California, Los Angeles, CA, USA.
²⁵ Center for Economic and Social Research, University of Southern California, Los Angeles, CA, USA.
²⁶ Institute for Molecular Bioscience, University of Queensland, Brisbane, QLD, Australia. peter.visscher@uq.edu.au.
²⁷ Interdisciplinary Center for Economic Science and Department of Economics, George Mason University, Fairfax, VA, USA.
²⁸ National Bureau of Economic Research, Cambridge, MA, USA. daniel.benjamin@gmail.com.
²⁹ UCLA Anderson School of Management, Los Angeles, CA, USA. daniel.benjamin@gmail.com.
³⁰ Human Genetics Department, UCLA David Geffen School of Medicine, Los Angeles, CA, USA. daniel.benjamin@gmail.com.
³¹ UCLA Anderson School of Management, Los Angeles, CA, USA. alextisyoung@gmail.com.
³² Human Genetics Department, UCLA David Geffen School of Medicine, Los Angeles, CA, USA. alextisyoung@gmail.com.

^# Contributed equally.

PMID: 35361970
PMCID: PMC9005349
DOI: 10.1038/s41588-022-01016-z

Aysu Okbay et al. Nat Genet. 2022 Apr.

Abstract

We conduct a genome-wide association study (GWAS) of educational attainment (EA) in a sample of ~3 million individuals and identify 3,952 approximately uncorrelated genome-wide-significant single-nucleotide polymorphisms (SNPs). A genome-wide polygenic predictor, or polygenic index (PGI), explains 12-16% of EA variance and contributes to risk prediction for ten diseases. Direct effects (i.e., controlling for parental PGIs) explain roughly half the PGI's magnitude of association with EA and other phenotypes. The correlation between mate-pair PGIs is far too large to be consistent with phenotypic assortment alone, implying additional assortment on PGI-associated factors. In an additional GWAS of dominance deviations from the additive model, we identify no genome-wide-significant SNPs, and a separate X-chromosome additive GWAS identifies 57.

Conflict of interest statement

Y.J., B.H., C.T., D.A.H. and the members of the 23andMe Research Team are current or former employees of 23andMe, Inc. All other authors declare no competing interests.

Figures

**Fig. 1. Manhattan plots for the additive and dominance GWASs.**
The top graph (green) shows the additive GWAS (N = 3,037,499 individuals), and the bottom graph (red) shows the dominance GWAS (N = 2,574,253 individuals). The P value and mean χ² values are based on inflation-adjusted two-sided Z tests. The x axis is chromosomal position, and the y axis is the significance on a −log₁₀ scale. The dashed line marks the threshold for genome-wide significance (P = 5 × 10⁻⁸).

**Fig. 2. Polygenic prediction.**
a, Predictive power of the EA PGI as a function of the size of the GWAS discovery sample, with expected predictive power shown by the dashed lines (Supplementary Note section 5.5). b, Prevalence of college completion by EA PGI decile, with 95% CIs. c, Scatterplot of EA PGI (residualized on ten principal components) and EduYears (residualized on sex, a full set of birth-year dummies, their interactions and ten principal components). Prediction samples for all panels are European-ancestry participants in Add Health (N = 5,653) and the HRS (N = 10,843). All PGIs were constructed from EduYears GWAS results that exclude Add Health and HRS using the software LDpred and assuming a normal prior for SNP effect sizes. Incremental R² is the difference between the R² from a regression of EduYears on the PGI and the controls (sex, a full set of birth-year dummies, their interactions and ten principal components) and the R² from a regression of EduYears on just the controls. The individual-level data plotted in c have been jittered by adding a small amount of noise to each observation.
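
The incremental R² defined in this caption can be sketched as follows. This is a minimal illustration on simulated data, not the authors' pipeline: the helper names `ols_r2` and `incremental_r2`, the single `sex` control, and all simulated coefficients are assumptions made for the example.

```python
import random

def ols_r2(y, columns):
    """R^2 from an OLS regression of y on the given columns plus an intercept."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination.
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    for i in range(k):
        pivot = max(range(i, k), key=lambda r: abs(A[r][i]))
        A[i], A[pivot] = A[pivot], A[i]
        for r in range(k):
            if r != i:
                f = A[r][i] / A[i][i]
                A[r] = [a - f * b for a, b in zip(A[r], A[i])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def incremental_r2(y, pgi, controls):
    """R^2 of (controls + PGI) minus R^2 of the controls alone."""
    return ols_r2(y, controls + [pgi]) - ols_r2(y, controls)

# Simulated (hypothetical) data: one binary control and a standardized PGI.
rng = random.Random(0)
n = 2000
sex = [float(rng.randint(0, 1)) for _ in range(n)]
pgi = [rng.gauss(0, 1) for _ in range(n)]
edu = [0.3 * s + 0.4 * g + rng.gauss(0, 1) for s, g in zip(sex, pgi)]

inc = incremental_r2(edu, pgi, [sex])
print(f"incremental R^2 = {inc:.3f}")
```

In a real analysis the control set would be the one listed in the caption (sex, birth-year dummies, their interactions and ten principal components), passed as additional columns.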

**Fig. 3. Predictive power of the EA PGI and the disease-specific PGI and their combination for ten diseases in the UKB.**
For each disease phenotype, the figure shows the incremental Nagelkerke’s R² from adding the EA PGI, the disease PGI or both PGIs and their interaction to a logistic regression of the disease phenotype on covariates. The covariates are sex, a third-degree polynomial in birth year and their interactions with sex, the first 40 PCs and batch dummies. The error bars represent 95% CIs calculated with the bootstrap percentile method, with 1,000 repetitions.

**Fig. 4. Meta-analysis estimates of direct and population effects of PGIs.**
a, For each PGI, the ratio of the direct effect to the population effect on the phenotype from which the PGI was derived. b, The effects of the EA PGI on 23 phenotypes. Bars are shaded lighter when the population and direct effects are statistically indistinguishable (two-sided Z test P > 0.05/23, where 23 is the number of phenotypes under study). For both panels, estimates are from meta-analyses of UKB, GS, and STR samples of siblings and trios. Phenotypes and the PGIs are scaled to have variance one, so effects correspond to partial correlation coefficients. Error bars represent 95% CIs. See Supplementary Table 9 for details on phenotypes and Supplementary Tables 10–13 for numerical values underlying this figure. FEV1, forced expiratory volume during the first second; HDL, high-density lipoprotein.

**Fig. 5. Correlations between mate-pair PGIs.**
a, Black dots show the correlation between mate-pair EA PGIs (raw) and the correlation between the residuals of the mate-pair EA PGIs after regressions with the listed regressors. Gray dots show the predicted correlations under phenotypic assortment; that is, all correlations between mate-pair EA PGIs are explained by assortment on EA itself. N = 2,344 (861 from UKB and 1,483 from GS). b, Analogous but for the height PGI and predictions under phenotypic assortment on height. N = 2,451 (858 from UKB and 1,593 from GS). For both panels, error bars represent 95% CIs. See Supplementary Table 14 for numerical values underlying this figure.

**Extended Data Fig. 1. Quantile-quantile plots for the additive GWAS meta-analysis.**
The panels display Q-Q plots, which show the -log₁₀(P-values) based on two-sided Z-tests for **(a)** all SNPs and **(b)** SNPs grouped by minor allele frequency (MAF): rare (<1%), low frequency (1–5%) and common (>5%). The plots and $λ_{GC}$ numbers are based on the unadjusted GWAS summary statistics (that is, with standard errors that were *not* inflated by the square root of the estimated LD Score intercept). The dotted line represents the expected -log₁₀(P-values) under the null hypothesis. The (barely visible) gray shaded areas in the Q-Q plots represent the 95% confidence intervals under the null hypothesis. The flat horizontal region in the plots is an inversion region in chromosome 17 (17q21.31).

**Extended Data Fig. 2. LD score plot from the additive GWAS meta-analysis.**
Each point represents an LD score quantile containing 1000 SNPs (except for the last quantile, which contains 709). The x and y coordinates of each point are the mean LD score and the mean χ² statistic of SNPs in that quantile. The LD score regression intercept is 1.663, suggesting that biases due to stratification or cryptic relatedness explain roughly 7% of the inflation in test statistics (see Supplementary Note section 2.2.6).
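
The captions describe inflation-adjusted statistics as those whose standard errors were inflated by the square root of the estimated LD Score intercept. A minimal sketch of that adjustment for a single SNP follows; the function names and the example effect size and standard error are hypothetical, and only the intercept value 1.663 is taken from the caption.

```python
import math

LD_SCORE_INTERCEPT = 1.663  # intercept reported in the caption above

def two_sided_p(z):
    """Two-sided P value for a standard-normal Z statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def inflation_adjusted(beta, se, intercept=LD_SCORE_INTERCEPT):
    """Inflate the standard error by sqrt(intercept), then recompute Z and P."""
    adj_se = se * math.sqrt(intercept)
    z = beta / adj_se
    return adj_se, z, two_sided_p(z)

# Hypothetical SNP: effect 0.02 with unadjusted standard error 0.002 (Z = 10).
adj_se, z_adj, p_adj = inflation_adjusted(beta=0.02, se=0.002)
print(f"adjusted SE = {adj_se:.5f}, Z = {z_adj:.3f}, P = {p_adj:.2e}")
print("genome-wide significant:", p_adj < 5e-8)
```

Equivalently, the adjusted χ² statistic is the unadjusted one divided by the intercept, so the adjustment always makes a P value less extreme when the intercept exceeds 1.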

**Extended Data Fig. 3. Replication of EA3 lead SNPs.**
We examined the out-of-sample replicability of the 1,504 lead SNPs identified at genome-wide significance in a version of our previously published GWAS meta-analysis of *EduYears* (EA3), with the UKB GWAS in that analysis replaced by a UKB GWAS that uses the new phenotype coding explained in Supplementary Note section 1.1. Prior to clumping, we dropped SNPs that had a sample size smaller than 80% of the maximum sample size in the updated EA3 data (N_{EA3,max} = 1,130,819), or that had a sample size in the new data smaller than 80% of the maximum sample size of the new data (N_{new,max} = 2,272,216). The x axis is the winner’s-curse-adjusted estimate of the SNP’s effect size in the updated EA3 study (calculated using shrinkage parameters estimated using summary statistics from EA3). The y axis is the SNP’s effect size estimated from the subsample of our data that did not contribute to the EA3 GWAS. All effect sizes are from a regression where the phenotype has been standardized to have unit variance. The reference allele is chosen to be the allele estimated to increase EA in EA3. The dashed line is the identity, and the solid line is the fitted regression line. P-values are based on two-sided Z-tests.

**Extended Data Fig. 4. Meta-analysis of X chromosome SNPs (N = 2,713,033 individuals).**
The meta-analysis was conducted by combining summary statistics from (pooled-sex) association analyses conducted in UK Biobank (N = 440,817 individuals) and 23andMe (N = 2,272,216 individuals); see Supplementary Note section 3.4 for details. Panel **(a)**: Manhattan plot, in which P values are based on summary statistics adjusted for inflation using the LD score intercept estimated from an autosomal association analysis of UKB and 23andMe. The solid line indicates the threshold for genome-wide significance (P = 5 × 10⁻⁸ based on a two-sided Z-test adjusted for multiple comparisons). Panel **(b)**: Q-Q plot, in which P values are based on unadjusted Z-test statistics. The dotted line represents the expected -log₁₀(P-values) under the null hypothesis. The (barely visible) gray shaded area represents the 95% confidence interval under the null hypothesis.

**Extended Data Fig. 5. Predictive power of the EduYears PGI as a function of pruning at different P value thresholds.**
Each bar represents the incremental $R^{2}$ with error bars showing the 95% confidence intervals bootstrapped with 1,000 iterations each. Each clumping and thresholding PGI is based on a set of approximately independent SNPs identified using the clumping algorithm defined in **Supplementary Note** section 2.2.6. For *HRS* (N = 10,843 individuals) and *Add Health* (N = 5,653 individuals) respectively, the number of SNPs included in the PGI is (with P value threshold in parentheses): 3,806 and 3,843 (5 × 10⁻⁸); 10,852 and 10,897 (5 × 10⁻⁵); 33,159 and 32,693 (5 × 10⁻³); 281,087 and 247,329 (1); 1,137,480 and 1,170,675 (All HapMap3 SNPs, LDpred); 2,540,570 and 2,548,339 (SBayesR). P-values are based on two-sided Z-tests. Incremental $R^{2}$ is the difference between the $R^{2}$ from a regression of *EduYears* on the PGI and the controls (sex, birth-year dummies, their interactions, and 10 PCs) and the $R^{2}$ from a regression of *EduYears* on just the controls.

**Extended Data Fig. 6. PGI prediction in Add Health, HRS and WLS.**
Predictive power of the PGI constructed from the current *EduYears* GWAS results in three independent prediction cohorts: *Add Health* (N = 5,653), *HRS* (N = 10,843), and *WLS* (N = 8,395). For binary phenotypes, the y-axis is incremental Nagelkerke R². Panel **(a)**: Results for education phenotypes available in *Add Health* and *HRS*. Panel **(b)**: Results for cognitive and academic achievement phenotypes available in either *Add Health*, *HRS* or *WLS*. “Δ Total Cognition” and “Δ Verbal Cognition” are wave to wave changes in total and verbal cognition. In both panels, error bars show 95% confidence intervals for the incremental R², bootstrapped with 1000 iterations each. The number of individuals in the prediction sample for each regression can be found in Supplementary Table 4.

**Extended Data Fig. 7. Prevalence of schooling outcomes by EduYears PGI decile.**
Each decile contains approximately 1,085 respondents in *HRS* and 565 in *Add Health*. Total sample sizes for these phenotypes in each prediction cohort are in Supplementary Table 4. Decile 1 contains the lowest PGI values; decile 10, the highest. Error bars show 95% confidence intervals. Panel **(a)**: High school completion. Panel **(b)**: Grade retention.

**Extended Data Fig. 8. European genetic ancestries to African genetic ancestries relative accuracy.**
Panel **(a)** plots the relative accuracy (RA) with error bars representing confidence intervals with +/− 1 standard error. Panel **(b)** plots the proportion of the loss of accuracy (LOA) explained by LD and MAF calculated as 100% × (1 − RA_pred(LD+MAF))/(1 − RA_obs) with error bars representing confidence intervals with +/− 1 standard error. RA refers to the European genetic ancestries to African genetic ancestries ratio of prediction accuracies (R²) of PGIs trained in a large sample of European-genetic-ancestry UKB participants (N = 425,231). The accuracy in European-genetic-ancestry participants was assessed in a holdout sample of 10,000 unrelated individuals, while the accuracy in African-genetic-ancestry participants was assessed in a holdout sample of 6,514 unrelated individuals. Phenotype labels: EA (Educational Attainment), Height (standing height), BMI (body mass index), LDL (low-density lipoprotein cholesterol), HDL (high-density lipoprotein cholesterol), TG (triglycerides), ASTHMA (diagnosed asthma), T2D (diagnosed type 2 diabetes) and HTN (diagnosed hypertension). See Supplementary Note section 7 in Wang et al. for additional details. Data underlying this Figure are reported in Supplementary Table 5.
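
The LOA formula in this caption is simple arithmetic and can be written as a short function. This is a sketch only: the function name and the input values 0.6 and 0.2 are hypothetical and are not results from the paper.

```python
def loa_explained_pct(ra_pred_ld_maf, ra_obs):
    """Percent of the loss of accuracy explained by LD and MAF.

    Implements 100% x (1 - RA_pred(LD+MAF)) / (1 - RA_obs) from the caption.
    """
    if ra_obs >= 1.0:
        raise ValueError("no loss of accuracy to explain when RA_obs >= 1")
    return 100.0 * (1.0 - ra_pred_ld_maf) / (1.0 - ra_obs)

# Hypothetical inputs: a predicted RA of 0.6 against an observed RA of 0.2.
example_pct = loa_explained_pct(ra_pred_ld_maf=0.6, ra_obs=0.2)
print(f"{example_pct:.1f}% of the loss of accuracy explained")
```

With these inputs the predicted loss (0.4) is half of the observed loss (0.8), so half of the loss of accuracy would be attributed to LD and MAF differences.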

**Extended Data Fig. 9. Odds ratio for selected diseases by deciles of the EA PGI in the UKB.**
The EA PGI was discretized into deciles (1 = lowest, 10 = highest), and nine dummy variables were created to contrast each of deciles 2-10 to decile 1 as the reference. Odds ratio and 95% confidence intervals (the error bars) were estimated using logistic regression while controlling for covariates (sex, a third-degree polynomial in birth year and interactions with sex, the top 40 PCs, and batch dummies).
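
The decile coding described in this caption can be sketched as below. The rank-based decile assignment and the function names are assumptions made for illustration; the caption does not specify how ties or group boundaries were handled, and the logistic regression step itself is omitted.

```python
def assign_deciles(values):
    """Return decile labels 1..10 (1 = lowest), assigned by rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    deciles = [0] * n
    for rank, idx in enumerate(order):
        deciles[idx] = rank * 10 // n + 1
    return deciles

def decile_dummies(deciles):
    """Nine 0/1 indicators per individual for deciles 2..10; decile 1 is the reference."""
    return [[1 if d == k else 0 for k in range(2, 11)] for d in deciles]

# Hypothetical PGI values for 100 individuals.
example_pgi = [float(i) for i in range(100)]
example_deciles = assign_deciles(example_pgi)
example_dummies = decile_dummies(example_deciles)
```

Each row of `example_dummies` would enter the logistic regression alongside the covariates, so that each exponentiated coefficient is an odds ratio relative to decile 1.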

Comment in

Indirect paths from genetics to education.
Schork AJ, Peterson RE, Dahl A, Cai N, Kendler KS. Nat Genet. 2022 Apr;54(4):372-373. doi: 10.1038/s41588-021-00999-5. PMID: 35361971. No abstract available.
