Cell Genom. 2023 Nov 16;3(12):100436.
doi: 10.1016/j.xgen.2023.100436. eCollection 2023 Dec 13.

Analysis across Taiwan Biobank, Biobank Japan, and UK Biobank identifies hundreds of novel loci for 36 quantitative traits

Chia-Yen Chen et al. Cell Genom.

Abstract

Genome-wide association studies (GWASs) have identified tens of thousands of genetic loci associated with human complex traits. However, the majority of GWASs were conducted in individuals of European ancestries. Failure to capture global genetic diversity has limited genomic discovery and has impeded equitable delivery of genomic knowledge to diverse populations. Here we report findings from 102,900 individuals across 36 human quantitative traits in the Taiwan Biobank (TWB), a major biobank effort that broadens the population diversity of genetic studies in East Asia. We identified 968 novel genetic loci, pinpointed novel causal variants through statistical fine-mapping, compared the genetic architecture across TWB, Biobank Japan, and UK Biobank, and evaluated the utility of cross-phenotype, cross-population polygenic risk scores in disease risk prediction. These results demonstrated the potential to advance discovery through diversifying GWAS populations and provided insights into the common genetic basis of human complex traits in East Asia.

Keywords: Taiwan Biobank; cross-ancestry GWAS; multi-polygenic score prediction; quantitative traits.

Conflict of interest statement

C.-Y.C. is an employee of Biogen. R.J.L. is an employee of Ionis Pharmaceuticals. M.J.D. is a founder of Maze Therapeutics.

Figures

Graphical abstract
Figure 1
Overview of the Taiwan Biobank sample and analysis. The abbreviations and index numbers for the 36 quantitative traits examined in this study are used throughout the text, tables, and figures. The sample size noted in the figure reflects the final analytical sample size after genotype quality control and imputation. Created with BioRender.com.
Figure 2
GWAS results for 36 quantitative traits in the Taiwan Biobank (TWB).
(A) A summary of genome-wide significant loci associated with the 36 traits in TWB identified by whole-genome linear regression implemented in Regenie (two-sided score test for genetic association, controlling for age, age², sex, age by sex interaction, age² by sex interaction, and top 20 PCs). Each row of the plot represents a single trait, with traits within the same category grouped by the same color. Each dot represents a genome-wide significant locus (p value < 5 × 10⁻⁸). The most pleiotropic genes identified in this study are annotated (see Figure 4B). Manhattan plots and Q-Q plots for each trait are in Figures S1, S2, and S36.
(B) SNP-based heritability (h²g) for the 36 traits in TWB estimated using univariate LD score regression (LDSC) based on association test statistics from linear regression (see STAR Methods). Abbreviations of the traits are listed in Figure 1. The complete set of h²g estimates and standard errors is available in Table S4. The unusually large confidence interval (CI) of the h²g estimate for total bilirubin (T-BIL) is driven by a Mendelian locus on chromosome 2, harboring the UGT1A1 gene. Modeling the signal in this locus as a fixed effect and removing the locus from the LDSC analysis produced a similar point estimate of h²g with a much smaller CI (see STAR Methods).
(C) Pairwise genetic correlations (rg) between the 36 traits in TWB estimated using bivariate LDSC based on association test statistics from linear regression (see STAR Methods). Significant rg after false discovery rate correction is indicated by a cross sign (two-sided Wald test). The complete set of rg estimates, including standard errors and p values, is available in Table S5.
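The false discovery rate correction mentioned for panel C can be illustrated with a minimal sketch. The caption does not name the procedure or the significance level, so the Benjamini-Hochberg step-up method and the 0.05 level used here are assumptions, and the p values are made-up illustrative numbers rather than results from the study. The only quantity taken from the caption is the number of traits (36), which gives 630 trait pairs.

```python
from itertools import combinations


def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p value: True if rejected at FDR level alpha.

    Assumption: Benjamini-Hochberg step-up procedure; the source only says
    "false discovery rate correction" without naming the method.
    """
    m = len(p_values)
    # Indices of the p values, sorted from smallest to largest p value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k such that p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis whose rank is at most max_k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


# Number of unordered trait pairs among 36 traits.
n_pairs = len(list(combinations(range(36), 2)))

# Hypothetical p values, for illustration only.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
flags = benjamini_hochberg(example_p, alpha=0.05)
print(n_pairs, flags)
```

In this toy input only the two smallest p values pass, because every later p value exceeds its rank-scaled threshold.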
Figure 3
Comparison of SNP-based heritability and within- and cross-ancestry genetic correlation estimates for 20 quantitative traits in TWB, BBJ, and UKBB.
(A) Comparison of the SNP-based heritability estimates (h²g) in BBJ or UKBB against TWB.
(B) Comparison of the genetic correlation estimates (rg) between TWB and BBJ (within EAS) against the cross-ancestry rg estimates between TWB and UKBB (EAS vs. EUR).
(C) Comparison of the genetic correlation estimates (rg) between TWB and BBJ (within EAS) against the cross-ancestry rg estimates between BBJ and UKBB (EAS vs. EUR).
A total of 20 traits for which GWAS summary statistics were available across the three biobanks were included for comparison: BMI (3), DBP (10), SBP (11), WBC (13), RBC (14), HB (15), HCT (16), PLT (17), CR (19), T-BIL (22), ALT (23), AST (24), GGT (25), ALB (27), FG (31), HBA1C (32), TC (33), HDL-C (34), LDL-C (35), and TG (36) (see Figure 1 for full names of the traits). h²g values were estimated using LDSC, and rg values were estimated using S-LDXR based on high-quality variants available across the three biobanks and GWASs generated by linear regression. The dotted line indicates the diagonal line in each plot. The complete results of the h²g and rg analyses are available in Tables S8 and S9.
Figure 4
Genetic loci associated with quantitative traits in the East Asian populations.
(A) Genome-wide significant loci identified in the TWB and BBJ meta-analysis, tallied based on their significance in TWB, BBJ, and UKBB or the largest consortium GWAS in samples of European ancestries.
(B) Distribution of the pleiotropic genes defined as the number of associated traits for each gene. The HLA region (chromosome 6; 28.5 to 33.5 Mb) was treated as a single locus and excluded from the figure. A total of 15 traits showed genome-wide significant associations in the HLA region, including height (HT), weight (WT), body mass index (BMI), white blood cell (WBC), red blood cell (RBC), platelet (PLT), creatinine (CR), uric acid (UA), alanine aminotransferase (ALT), aspartate aminotransferase (AST), albumin (ALB), total cholesterol (TC), high-density lipoprotein cholesterol (HDL-C), low-density lipoprotein cholesterol (LDL-C), and triglyceride (TG).
The complete results of loci discovery and information on the pleiotropic genes are available in Tables S10 and S12.
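The per-gene pleiotropy tally described for panel B (number of associated traits per gene, with the HLA region on chromosome 6 from 28.5 to 33.5 Mb left out) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the record layout `(gene, chromosome, position, trait)`, the function names, and the placeholder genes `GENE_A` to `GENE_C` are all hypothetical, and treating the region boundaries as inclusive is an assumption.

```python
from collections import defaultdict

# HLA region as given in the caption: chromosome 6, 28.5 to 33.5 Mb.
HLA_CHROM = 6
HLA_START = 28_500_000
HLA_END = 33_500_000


def in_hla(chrom, pos):
    """True if a position falls inside the HLA region (bounds assumed inclusive)."""
    return chrom == HLA_CHROM and HLA_START <= pos <= HLA_END


def pleiotropy_counts(associations):
    """Count distinct associated traits per gene, skipping the HLA region.

    `associations` is an iterable of (gene, chrom, pos, trait) tuples;
    this layout is hypothetical and chosen only for the example.
    """
    traits_by_gene = defaultdict(set)
    for gene, chrom, pos, trait in associations:
        if in_hla(chrom, pos):
            continue  # HLA is treated as one locus and excluded from the tally
        traits_by_gene[gene].add(trait)  # a set ignores repeated gene-trait hits
    return {gene: len(traits) for gene, traits in traits_by_gene.items()}


# Made-up records with placeholder gene names, for illustration only.
records = [
    ("GENE_A", 2, 1_000_000, "TC"),
    ("GENE_A", 2, 1_000_000, "LDL-C"),
    ("GENE_A", 2, 1_000_500, "TC"),     # second hit for the same trait
    ("GENE_B", 6, 30_000_000, "WBC"),   # inside HLA, excluded
    ("GENE_C", 6, 40_000_000, "HT"),    # chromosome 6 but outside HLA
]
counts = pleiotropy_counts(records)
print(counts)  # {'GENE_A': 2, 'GENE_C': 1}
```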
Figure 5
Polygenic prediction of common complex diseases in the Taiwan Biobank. PRS-CSx was applied to jointly model the East Asian (EAS) and European (EUR) GWAS summary statistics of each biomarker and derive an EAS-specific and an EUR-specific polygenic risk score (PRS). Each disease was predicted by the linear combination of PRSs from one or more biomarkers (right panel), controlling for age, sex, and top 20 principal components (PCs) of genotype data. The left-out TWB sample (n = 10,285) was repeatedly and randomly divided into a validation dataset (where tuning parameters and the optimal linear combination of PRSs were learned) and a testing dataset (where the predictive performance of the final linearly combined PRSs was assessed). To benchmark the predictive performance of biomarker PRS, self-reported type 2 diabetes (T2D) was also predicted by PRSs derived from the EAS and EUR type 2 diabetes GWASs. Biomarker GWASs in EAS were obtained from the meta-analysis of TWB and BBJ; biomarker GWASs in EUR were obtained from UKBB or the largest consortium GWAS. The T2D disease GWAS in EAS has 49,992 cases and 219,905 controls (4,609 cases and 87,873 controls from TWB; 45,383 cases and 132,032 controls from BBJ); the T2D GWAS in EUR has 74,124 cases and 824,006 controls. Each dot in the left panel represents the prediction accuracy (variance explained on the liability scale) from one random split of the dataset. Error bars represent the standard error of the prediction accuracy across 100 random splits for each disease. PRS performance metrics are available in Table S13.
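The repeated validation/testing scheme in this caption can be sketched on simulated data. The sketch makes several simplifying assumptions that differ from the study: it uses a quantitative outcome and the squared Pearson correlation instead of variance explained on the liability scale, it combines only two scores (standing in for an EAS-specific and an EUR-specific PRS), it omits the age, sex, and PC covariates, and it uses an even split, since the caption does not state the split proportion. All function names, effect sizes, and the simulated sample size are hypothetical.

```python
import random
from statistics import fmean


def fit_two_prs(y, x1, x2):
    """Least-squares fit of y ~ b0 + w1*x1 + w2*x2 via centered normal equations."""
    my, m1, m2 = fmean(y), fmean(x1), fmean(x2)
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    w1 = (s22 * s1y - s12 * s2y) / det
    w2 = (s11 * s2y - s12 * s1y) / det
    return my - w1 * m1 - w2 * m2, w1, w2


def r_squared(y, pred):
    """Squared Pearson correlation between outcome and prediction."""
    my, mp = fmean(y), fmean(pred)
    sxy = sum((a - my) * (b - mp) for a, b in zip(y, pred))
    sxx = sum((a - my) ** 2 for a in y)
    syy = sum((b - mp) ** 2 for b in pred)
    return sxy ** 2 / (sxx * syy)


def repeated_split_r2(y, x1, x2, n_splits=100, seed=0):
    """Learn PRS weights on a random validation half, score on the testing half."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    half = len(idx) // 2  # assumption: even split; the caption gives no proportion
    scores = []
    for _ in range(n_splits):
        rng.shuffle(idx)
        val, test = idx[:half], idx[half:]
        b0, w1, w2 = fit_two_prs(
            [y[i] for i in val], [x1[i] for i in val], [x2[i] for i in val]
        )
        pred = [b0 + w1 * x1[i] + w2 * x2[i] for i in test]
        scores.append(r_squared([y[i] for i in test], pred))
    return scores


# Simulated data (hypothetical effect sizes), not data from the study.
sim = random.Random(1)
n = 2000
prs_a = [sim.gauss(0, 1) for _ in range(n)]
prs_b = [sim.gauss(0, 1) for _ in range(n)]
outcome = [0.3 * a + 0.2 * b + sim.gauss(0, 1) for a, b in zip(prs_a, prs_b)]

scores = repeated_split_r2(outcome, prs_a, prs_b, n_splits=100, seed=0)
print(len(scores), round(fmean(scores), 3))
```

Each element of `scores` plays the role of one dot in the left panel: the accuracy from one random split, with weights learned only on the validation half so the testing half stays untouched until evaluation.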
