[Preprint]. 2025 Jan 7:2024.05.17.24307550.
doi: 10.1101/2024.05.17.24307550.

Genome-wide association studies in a large Korean cohort identify novel quantitative trait loci for 36 traits and illuminate their genetic architectures

Yon Ho Jee et al. medRxiv.

Abstract

Genome-wide association studies (GWAS) have been predominantly conducted in populations of European ancestry, limiting opportunities for biological discovery in diverse populations. We report GWAS findings from 153,950 individuals across 36 quantitative traits in the Korean Cancer Prevention Study-II (KCPS2) Biobank. We discovered 301 novel genetic loci in KCPS2, including an association between thyroid-stimulating hormone and CD36. Meta-analysis with the Korean Genome and Epidemiology Study, Biobank Japan, Taiwan Biobank, and UK Biobank identified 4,588 loci that were not significant in any contributing GWAS. We describe differences in genetic architectures across these East Asian and European samples. We also highlight East Asian-specific associations, including a known pleiotropic missense variant in ALDH2, which fine-mapping identified as a likely causal variant for a diverse set of traits. Our findings provide insights into the genetic architecture of complex traits in East Asian populations and highlight how broadening the population diversity of GWAS samples can aid discovery.

Keywords: Korean population; complex traits; genetic architecture; genome-wide association study.

Figures

Figure 1 | Overview of the Korean Cancer Prevention Study-II Biobank and analysis.
Detailed descriptions of the 36 quantitative traits examined in this study are shown in Table S1. After QC, the data were phased using SHAPEIT4 and imputed using IMPUTE5 with 1000 Genomes Project Phase 3 data.
Figure 2 | GWAS results for 36 quantitative traits in the Korean Cancer Prevention Study-II (KCPS2) Biobank.
(a) Number of known and novel variants identified in KCPS2 compared with Open Targets Genetics using EFO terms (Tables S2–S3). (b) A summary of genome-wide significant loci associated with the 36 traits in KCPS2. Each locus was mapped to a gene using FUMA with a 1000 Genomes Phase 3 East Asian reference panel. We then counted the number of associated traits (out of 36) per gene (Table S4). (c) Comparison of pairwise genetic correlations (rg) and phenotypic correlations (rp) for the 36 traits in KCPS2. rg was estimated using bivariate LDSC based on association test statistics from linear regression. Significance of rg and rp after false discovery rate correction (FDR<0.05) is indicated by color: purple if both rg and rp were significant, red if only rg was significant, blue if only rp was significant, and gray if neither was significant. The black solid line was estimated by spline smoothing from a linear regression model. The complete set of rg and rp values is available in Table S5.
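The color coding in panel (c) depends on which correlations survive FDR correction. The caption does not name the FDR procedure, so the following is only a minimal sketch assuming the common Benjamini-Hochberg step-up method; the function names `benjamini_hochberg` and `panel_color` are hypothetical and not taken from the study.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the test is significant
    under Benjamini-Hochberg FDR control at level alpha.
    (Assumed procedure; the caption only states FDR<0.05.)"""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    # Every test ranked at or below max_rank is declared significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            significant[idx] = True
    return significant


def panel_color(rg_significant, rp_significant):
    """Map the two significance flags to the colors named in the caption."""
    if rg_significant and rp_significant:
        return "purple"
    if rg_significant:
        return "red"
    if rp_significant:
        return "blue"
    return "gray"


# Illustrative p-values only; these are not values from the study.
rg_flags = benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.6])
rp_flags = benjamini_hochberg([0.03, 0.04, 0.045, 0.049, 0.05])
colors = [panel_color(g, p) for g, p in zip(rg_flags, rp_flags)]
```

Because the procedure is step-up, a p-value that misses its own rank threshold can still be declared significant when a larger p-value meets its threshold, which is why the second illustrative list is significant throughout.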
Figure 3 | Meta-analysis of 21 traits across KCPS2, KoGES, BBJ, TWB, and UKB.
(a) Genome-wide significant loci identified in the meta-analysis. Dot colors indicate significance in the meta-analysis (black), KCPS2 (blue), KoGES (orange), BBJ (purple), TWB (light blue), and UKB (green). Multiple dots in a bar represent simultaneous significance in multiple cohorts. (b) Comparison of allele frequencies and effect sizes in KCPS2 for the genome-wide significant variants discovered only in KCPS2 (blue) versus those identified only in the meta-analysis (black). (c) Comparison of effect sizes in KCPS2 and study-specific effect sizes for the lead variants at the 12,224 meta-analysis genome-wide significant loci. The solid lines were estimated by spline smoothing from a generalized additive model (b) or a linear regression model (c). Full meta-analysis results are shown in Tables S6–S7.
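A locus can reach genome-wide significance in a meta-analysis without being significant in any contributing study, because pooling shrinks the standard error. The text here does not state the meta-analysis model, so the sketch below assumes a fixed-effect inverse-variance-weighted combination; the function names and the effect sizes are hypothetical illustrations, not study results.

```python
import math

GENOME_WIDE_P = 5e-8  # conventional genome-wide significance threshold


def two_sided_p(beta, se):
    """Two-sided p-value for a normal z-statistic beta / se."""
    z = abs(beta / se)
    return math.erfc(z / math.sqrt(2))


def inverse_variance_meta(betas, ses):
    """Fixed-effect inverse-variance-weighted pooled effect and standard error.
    (Assumed model; the source does not specify the method used.)"""
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se


# Made-up per-study estimates for one variant in five cohorts.
betas = [0.02, 0.02, 0.02, 0.02, 0.02]
ses = [0.005, 0.005, 0.005, 0.005, 0.005]

study_p = [two_sided_p(b, s) for b, s in zip(betas, ses)]
meta_beta, meta_se = inverse_variance_meta(betas, ses)
meta_p = two_sided_p(meta_beta, meta_se)

print("any single study significant:", any(p < GENOME_WIDE_P for p in study_p))
print("meta-analysis significant:", meta_p < GENOME_WIDE_P)
```

With equal standard errors across five cohorts, the pooled standard error is smaller by a factor of the square root of five, which is enough to move a consistent but individually sub-threshold signal past the threshold.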
Figure 4 | Genetic architecture of complex traits across KCPS2, BBJ, TWB, and UKB.
(a) The dots represent posterior means and horizontal bars represent standard errors of the parameters for each trait. The vertical dashed line shows the median of the estimates across traits. Full results are shown in Table S8. (b-e) Pearson correlations of SNP-heritability between KCPS2 and KoGES (b), BBJ (c), TWB (d), and UKB (e) across the traits shown in (a), except for TWB. For heritability in TWB shown in (d), we used the heritability estimates reported by Chen and colleagues. The comparisons between KCPS2 heritability estimates and TWB heritability estimates using SBayesS and the KCPS2 LD matrix are shown in Table S6b. Data are presented as posterior means of SNP-heritability. The trait categories are indicated by different colors labeled with their trait names. For the 8 traits available in all five studies, we observed high correlations of heritability between KCPS2 and the other biobanks: KoGES (Pearson correlation r=0.99, 95% confidence interval [CI]: 0.97–1.00), BBJ (r=0.93, 95% CI: 0.64–0.99), TWB (r=0.93, 95% CI: 0.65–0.99), and UKB (r=0.97, 95% CI: 0.82–0.99).
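The width of these intervals follows from the small number of traits compared. The caption does not say how the confidence intervals were computed; the sketch below assumes the standard Fisher z-transformation, and the function name `pearson_ci` is hypothetical.

```python
import math
from statistics import NormalDist


def pearson_ci(r, n, level=0.95):
    """Confidence interval for a Pearson correlation r from n paired
    observations, using the Fisher z-transformation.
    (Assumed method; the caption does not specify one.)"""
    if n <= 3:
        raise ValueError("Fisher interval needs more than 3 observations")
    z = math.atanh(r)                 # Fisher transform of r
    se = 1.0 / math.sqrt(n - 3)       # approximate standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    lower = math.tanh(z - crit * se)  # back-transform to the r scale
    upper = math.tanh(z + crit * se)
    return lower, upper


# r = 0.93 across the 8 traits shared by all five studies.
lower, upper = pearson_ci(0.93, 8)
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```

Under this assumption, r=0.93 with 8 traits gives an interval of roughly 0.65 to 0.99, in line with the interval reported for TWB; small differences for the other cohorts are expected because the published r values are rounded.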
Figure 5 | Fine-mapping and colocalization analysis of the ALDH2 region in KCPS2.
(a) Association between ALDH2 (12q24.12) and alcohol intake in KCPS2. Colors in the Manhattan panels represent r² values relative to the lead variant rs671. In the posterior inclusion probability (PIP) panels, only fine-mapped variants in the 95% credible sets (CS) are colored. The heatmap represents significant variants (P<5.0×10⁻⁸) for the other quantitative traits. (b) Colocalization analysis in the same region between alcohol intake and the seven traits that showed PIP>0.9 for rs671. Coloc.PP4 represents the posterior probability of colocalization at the specified region. All seven traits shown here had coloc.PP4=1 with alcohol intake. Each regional plot shows associations at the locus for the most significantly associated variant, which in every case was rs671 with PIP>0.9. Full fine-mapping and colocalization analysis results are shown in Table S10.
