eLife. 2024 Apr 19:12:RP92574.
doi: 10.7554/eLife.92574.

Haplotype function score improves biological interpretation and cross-ancestry polygenic prediction of human complex traits

Weichen Song et al. eLife.

Abstract

We propose a new framework for human genetic association studies: at each locus, a deep learning model (in this study, Sei) is used to calculate the functional genomic activity score for two haplotypes per individual. This score, defined as the Haplotype Function Score (HFS), replaces the original genotype in association studies. Applying the HFS framework to 14 complex traits in the UK Biobank, we identified 3619 independent HFS-trait associations with a significance of p < 5 × 10⁻⁸. Fine-mapping revealed 2699 causal associations, corresponding to a median increase of 63 causal findings per trait compared with single-nucleotide polymorphism (SNP)-based analysis. HFS-based enrichment analysis uncovered 727 pathway-trait associations and 153 tissue-trait associations with strong biological interpretability, including 'circadian pathway-chronotype' and 'arachidonic acid-intelligence'. Lastly, we applied least absolute shrinkage and selection operator (LASSO) regression to integrate HFS prediction score with SNP-based polygenic risk scores, which showed an improvement of 16.1-39.8% in cross-ancestry polygenic prediction. We concluded that HFS is a promising strategy for understanding the genetic basis of human complex traits.
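
As an illustration of the association step described in the abstract, the following minimal Python sketch tests a per-individual HFS against a quantitative trait with ordinary least squares. It assumes that the two haplotype scores at a locus are summed into one value per individual, which the abstract does not state. The function names (`individual_hfs`, `ols_slope_t`) and all numbers are hypothetical toy values, not study data. Covariates and the other machinery of a real association analysis are omitted.

```python
from statistics import mean


def individual_hfs(hap_scores):
    """Combine the two per-haplotype scores of each individual.

    Assumption for illustration only: the two scores are simply added.
    """
    return [h1 + h2 for h1, h2 in hap_scores]


def ols_slope_t(x, y):
    """Simple linear regression of y on x; returns (slope, t statistic)."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = my - beta * mx
    # Residual sum of squares, then the standard error of the slope.
    rss = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))
    se = (rss / (n - 2) / sxx) ** 0.5
    return beta, beta / se


# Toy data: (haplotype 1 score, haplotype 2 score) for five individuals.
hap_scores = [(0.1, 0.2), (0.4, 0.1), (0.3, 0.3), (0.0, 0.1), (0.5, 0.4)]
trait = [1.0, 1.4, 1.7, 0.6, 2.2]

hfs = individual_hfs(hap_scores)
beta, t_stat = ols_slope_t(hfs, trait)
print(f"slope={beta:.3f}, t={t_stat:.2f}")
```

In this sketch the HFS takes the place that a genotype dosage would occupy in a SNP-based test, which is the substitution the framework describes.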

Keywords: computational biology; deep learning; fine-mapping; functional genomics; genetics; genome-wide association study; genomics; human; polygenic prediction; systems biology.

Plain language summary

Scattered throughout the human genome are variations in the genetic code that make individuals more or less likely to develop certain traits. To identify these variants, scientists carry out genome-wide association studies (GWAS), which compare the DNA variants of large groups of people with and without the trait of interest. This method has been able to find the underlying genes for many human diseases, but it has limitations. For instance, some variations are linked together due to where they are positioned within DNA, which can result in GWAS falsely reporting associations between genetic variants and traits. This phenomenon, known as linkage disequilibrium, can be avoided by analyzing functional genomics, which looks at the multiple ways a gene’s activity can be influenced by a variation: for instance, how the gene is copied and decoded into proteins and RNA molecules, and the rate at which these products are generated. Researchers can now use an artificial intelligence technique called deep learning to generate functional genomic data from a particular DNA sequence. Here, Song et al. used one of these deep learning models to calculate the functional genomics of haplotypes, groups of genetic variants inherited from one parent. The approach was applied to DNA samples from over 350,000 individuals included in the UK Biobank. An activity score, defined as the haplotype function score (or HFS for short), was calculated for at least two haplotypes per individual, and then compared to various complex traits like height or bone density. Song et al. found that the HFS framework was better at finding links between genes and specific traits than existing methods. It also provided more information on the biology that may be underpinning these outcomes. Although more work is needed to reduce the computer processing times required to calculate the HFS, Song et al. believe that their new method has the potential to improve the way researchers identify links between genes and human traits.

Conflict of interest statement

WS, YS, GL: No competing interests declared.

Figures

Figure 1. Flowchart of the study.
Ind: individual.
Figure 1—figure supplement 1. Distribution of number of haplotypes per locus.
Figure 1—figure supplement 2. Linkage disequilibrium (LD) among Haplotype Function Score (HFS).
(A) Distribution of R² of HFS from adjacent loci. (B) Comparison of R² of HFS from adjacent loci (y-axis) and median LD of SNPs from the same adjacent loci (x-axis). (C) Same as B, but the y-axis corresponds to HFS without rare variants.
Figure 1—figure supplement 3. Comparison of inflation factor between Haplotype Function Score (HFS) and SNP association tests.
Lambda: genomic control inflation factor, defined as the median chi-squared statistic of all tested variables divided by 0.476.
Figure 2. Fine-mapping result summary.
Gray bar plots indicated the number of loci with posterior inclusion probability (PIP) >0.95 in Haplotype Function Score (HFS) + SUSIE (causal loci). Black bar plots indicated the number of SNPs with PIP >0.95 in PolyFun or SbayesRC analysis (the larger number was shown). Each cell of the heatmap showed the odds ratio of loci in each sequence class being causal loci for each trait. ‘All_OR’ indicated the odds ratio for pooling all traits together. Enh: enhancer. TF: transcription factor-binding site.
Figure 2—figure supplement 1. Heritability enrichment within causal loci estimated by Linkage Disequilibrium Score regression (LDSC).
Causal loci: loci with posterior inclusion probability (PIP) >0.95. Control loci: the nearest locus to a causal locus that reached the same p-value level. R2_enrich: proportion of heritability divided by proportion of SNPs. Meta: inverse variance-weighted heritability enrichment of tested traits. Error bars indicated 95% confidence intervals.
Figure 3. Biological enrichment analysis based on Haplotype Function Score (HFS) fine-mapping.
The x-axis indicated the t statistic of the analyzed term in a multivariate linear regression (Methods). Cell: single-cell ATAC peaks for 222 cell types from Zhang et al., 2021a. Tissue: active chromatin regions of 222 tissues from EpiMap (Boix et al., 2021). For each trait, we showed the most significant term plus one or two terms with high biological interpretability that also passed the significance threshold. Full enrichment results are shown in Supplementary file 1g and Supplementary file 1h.
Figure 4. Haplotype Function Score (HFS) linked trait to causal genes.
(A) Target genes of causal loci identified by HFS + SUSIE for platelet count. Only genes that showed functional convergence were shown. (B) Regional plot for RBBP5. HFS: locus posterior inclusion probability (PIP) calculated by HFS + SUSIE. SNP: SNP PIP calculated by PolyFun. cCRE: candidate cis-regulatory elements. (C) Regional plot of the major histocompatibility complex (MHC) region for asthma. Thickened curves linked highlighted causal loci to their target genes predicted by cS2G (Gazal et al., 2022).
Figure 4—figure supplement 1. p-value comparison within the RBBP5 region between Haplotype Function Score (HFS) and SNP association tests.
x-axis: same chromosome region as Figure 4. Each black point represented a SNP; its y-axis value represented the REGENIE genome-wide association study (GWAS) p-value. Each pink point represented a locus; its y-axis value represented its p-value for the HFS association test with platelet count.
Figure 4—figure supplement 2. Locus analysis for allergic diseases.
Similar to Figure 4C, but for other allergic diseases.
Figure 5. Haplotype Function Score (HFS)-based polygenic prediction.
(A) Prediction R² of HFS-based polygenic risk score (PRS) using different thresholds of posterior inclusion probability (PIP). allSNP: SNP-based PRS calculated by LDAK-BOLT (Zhang et al., 2021b). n: number of features included in the corresponding PRS. (B) Prediction R² of per-block HFS score in the British European test set by different methods. EN: elastic net. (C) Prediction R² of different tools in non-British European (NBE), South Asian (SAS), East Asian (EAS), and African (AFR) groups in UK Biobank.
Figure 5—figure supplement 1. Proportion of heritability captured by Haplotype Function Score (HFS) polygenic risk score (PRS).
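
Several figure captions define summary statistics in words: the genomic control inflation factor (Figure 1—figure supplement 3), R2_enrich, and the inverse variance-weighted "Meta" estimate with its 95% confidence interval (Figure 2—figure supplement 1). The short Python sketch below shows one way these definitions could be computed. The function names and the toy inputs are hypothetical and are not taken from the study. The denominator 0.476 is used as written in the caption and is exposed as a parameter; a commonly used value is approximately 0.455, the median of a chi-squared distribution with one degree of freedom. The interval uses a normal approximation, which is an assumption of this sketch.

```python
from statistics import median


def genomic_inflation(chi2_stats, denominator=0.476):
    """Lambda: median chi-squared statistic divided by a reference value.

    The default denominator follows the figure caption as written.
    """
    return median(chi2_stats) / denominator


def heritability_enrichment(h2_share, snp_share):
    """R2_enrich: proportion of heritability divided by proportion of SNPs."""
    return h2_share / snp_share


def inverse_variance_meta(estimates, std_errors):
    """Pool per-trait estimates, weighting each by 1 / SE^2.

    Returns (pooled estimate, pooled SE, 95% CI), using a normal
    approximation for the interval (an assumption of this sketch).
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total
    pooled_se = (1.0 / total) ** 0.5
    ci = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
    return pooled, pooled_se, ci


# Toy inputs, for illustration only.
print(genomic_inflation([0.1, 0.5, 2.0]))
print(heritability_enrichment(0.2, 0.05))
print(inverse_variance_meta([2.0, 4.0], [1.0, 1.0]))
```

Weighting by the inverse of the squared standard error means that traits with more precise enrichment estimates contribute more to the pooled value.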

Update of

  • doi: 10.1101/2023.08.08.552392
  • doi: 10.7554/eLife.92574.1
  • doi: 10.7554/eLife.92574.2

References

    1. Aguet F, Barbeira AN, Bonazzola R, Brown A, Castel SE, Jo B, Kasela S, Kim-Hellmuth S, Liang Y, Oliva M, Flynn ED, Parsana P, Fresard L, Gamazon ER, Hamel AR, He Y, Hormozdiari F, Mohammadi P, Muñoz-Aguirre M, Park YS, Saha A, Strober BJ, Wen X, Wucher V, Ardlie KG, Battle A, Brown CD, Cox N, Das S, Dermitzakis ET, Engelhardt BE, Garrido-Martín D, Gay NR, Getz GA, Guigó R, Handsaker RE, Hoffman PJ, Im HK, Kashin S, Kwong A, Lappalainen T, Xiao L, MacArthur DG, Montgomery SB, Rouhana JM, Stephens M, Stranger BE, Todres E, Viñuela A, Wang G, Zou Y, Anand S, Gabriel S, Graubert A, Hadley K, Huang KH, Nguyen JL, Balliu DT, Conrad B, Cotter DF, Einson J, Eskin E, Eulalio TY, Ferraro NM, Gloudemans MJ, Hou L, Kellis M, Xin L, Mangul S, Nachun DC, Nobel AB, Park Y, Rao AS, Reverter F, Sabatti C, Skol AD, Teran NA, Wright F, Ferreira PG, Li G, Melé M, Yeger-Lotem E, Barcus ME, Bradbury D, Krubit T, McLean JA, Qi L, Robinson K, Smith AM, Sobin L, Tabor DE, Undale A, Bridge J, Brigham LE, Foster BA, Gillard BM, Hasz R, Hunter M, Johns C, Johnson M, Karasik E, Kopen G, Leinweber WF, McDonald A, Moser MT, Myer K, Ramsey KD, Roe B, Shad S, Thomas JA, Walters G, Washington M, Wheeler J, Jewell SD, Rohrer DC, Valley DR, Davis DA, Mash DC, Branton PA, Sobin L, Barker LK, Gardiner HM, Mosavel M, Siminoff LA, Flicek P, Haeussler M, Juettemann T, Kent WJ, Lee CM, Powell CC, Rosenbloom KR, Ruffier M, Sheppard D, Taylor K, Trevanion SJ, Zerbino DR, Abell NS, Akey J, Chen L, Demanelis K, Doherty JA, Feinberg AP, Hansen KD, Hickey PF, Hou L, Jasmine F, Jiang L, Kaul R, Kellis M, Kibriya MG, Li JB, Li Q, Lin S, Linder SE, Pierce BL, Rizzardi LF, Smith KS, Snyder M, Stamatoyannopoulos J, Tang H, Wang M, Branton PA, Carithers LJ, Guan P, Koester SE, Little AR, Moore HM, Nierras CR, Rao AK, Vaught JB, Volpi S. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science. 2020;369:1318–1330. doi: 10.1126/science.aaz1776.
    2. Altshuler DM, Gibbs RA, Peltonen L, Altshuler DM, Gibbs RA, Peltonen L, Dermitzakis E, Schaffner SF, Yu F, Peltonen L, Dermitzakis E, Bonnen PE, Altshuler DM, Gibbs RA, de Bakker PIW, Deloukas P, Gabriel SB, Gwilliam R, Hunt S, Inouye M, Jia X, Palotie A, Parkin M, Whittaker P, Yu F, Chang K, Hawes A, Lewis LR, Ren Y, Wheeler D, Gibbs RA, Muzny DM, Barnes C, Darvishi K, Hurles M, Korn JM, Kristiansson K, Lee C, McCarrol SA, Nemesh J, Dermitzakis E, Keinan A, Montgomery SB, Pollack S, Price AL, Soranzo N, Bonnen PE, Gibbs RA, Gonzaga-Jauregui C, Keinan A, Price AL, Yu F, Anttila V, Brodeur W, Daly MJ, Leslie S, McVean G, Moutsianas L, Nguyen H, Schaffner SF, Zhang Q, Ghori MJR, McGinnis R, McLaren W, Pollack S, Price AL, Schaffner SF, Takeuchi F, Grossman SR, Shlyakhter I, Hostetter EB, Sabeti PC, Adebamowo CA, Foster MW, Gordon DR, Licinio J, Manca MC, Marshall PA, Matsuda I, Ngare D, Wang VO, Reddy D, Rotimi CN, Royal CD, Sharp RR, Zeng C, Brooks LD, McEwen JE. Integrating common and rare genetic variation in diverse human populations. Nature. 2010;467:52–58. doi: 10.1038/nature09298.
    3. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G. Gene Ontology: tool for the unification of biology. Nature Genetics. 2000;25:25–29. doi: 10.1038/75556.
    4. Avsec Ž, Agarwal V, Visentin D, Ledsam JR, Grabska-Barwinska A, Taylor KR, Assael Y, Jumper J, Kohli P, Kelley DR. Effective gene expression prediction from sequence by integrating long-range interactions. Nature Methods. 2021;18:1196–1203. doi: 10.1038/s41592-021-01252-x.
    5. Baca SC, Singler C, Zacharia S, Seo JH, Morova T, Hach F, Ding Y, Schwarz T, Huang CCF, Anderson J, Fay AP, Kalita C, Groha S, Pomerantz MM, Wang V, Linder S, Sweeney CJ, Zwart W, Lack NA, Pasaniuc B, Takeda DY, Gusev A, Freedman ML. Genetic determinants of chromatin reveal prostate cancer risk mediated by context-dependent gene regulation. Nature Genetics. 2022;54:1364–1375. doi: 10.1038/s41588-022-01168-y.