[Preprint]. 2024 Apr 16:2024.04.15.24305836.
doi: 10.1101/2024.04.15.24305836.

Improved multi-ancestry fine-mapping identifies cis-regulatory variants underlying molecular traits and disease risk

Zeyun Lu et al. medRxiv.

Abstract

Multi-ancestry statistical fine-mapping of cis-molecular quantitative trait loci (cis-molQTL) aims to improve the precision of distinguishing causal cis-molQTLs from tagging variants. However, existing approaches fail to reflect shared genetic architectures. To address this limitation, we present the Sum of Shared Single Effects (SuShiE) model, which leverages LD heterogeneity to improve fine-mapping precision, infer cross-ancestry effect-size correlations, and estimate ancestry-specific expression prediction weights. We apply SuShiE to mRNA expression measured in PBMCs (n=956) and LCLs (n=814) together with plasma protein levels (n=854) from individuals of diverse ancestries in the TOPMed MESA and GENOA studies. We find that SuShiE fine-maps cis-molQTLs for 16% more genes compared with baselines while prioritizing fewer variants with greater functional enrichment. SuShiE infers highly consistent cis-molQTL architectures across ancestries on average; however, we also find evidence of heterogeneity at genes with predicted loss-of-function intolerance, suggesting that environmental interactions may partially explain differences in cis-molQTL effect sizes across ancestries. Lastly, we leverage estimated cis-molQTL effect sizes to perform individual-level TWAS and PWAS on six white blood cell-related traits in AOU Biobank individuals (n=86k) and identify 44 more genes compared with baselines, further highlighting SuShiE's benefits in identifying genes relevant for complex disease risk. Overall, SuShiE provides new insights into the cis-genetic architecture of molecular traits.

Conflict of interest statement

Competing interests: L.W. provided consulting services to Pupil Bio Inc. and reviewed manuscripts for Gastroenterology Report, not related to this study, and received an honorarium. No potential conflicts of interest were disclosed by the other authors.

Figures

Fig. 1: SuShiE infers ancestry-specific effect sizes, PIPs, and credible sets by leveraging shared genetic architectures and LD heterogeneity.
A) SuShiE takes individual-level phenotypic and genotypic data as input and models the shared cis-molQTL effects as a linear combination of single effects. B) For each single shared effect, SuShiE assumes that the cis-molQTL effect sizes follow a multivariate normal prior distribution with a covariance matrix, and that the probability of each SNP being the molQTL follows a uniform prior distribution. Through inference, SuShiE outputs a credible set that includes putative causal cis-molQTLs, learns the effect-size covariance prior, and estimates the ancestry-specific effect sizes.
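
The generative picture in panels A and B can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `simulate_shared_effects`, its arguments, and the Gaussian noise model are all hypothetical. Each single effect picks one SNP uniformly at random and draws its per-ancestry effect sizes jointly from a multivariate normal distribution.

```python
import numpy as np

def simulate_shared_effects(genotypes, n_effects, effect_cov, noise_sd, rng):
    """Simulate phenotypes under a sum-of-shared-single-effects picture.

    genotypes: list of (n_k, p) arrays, one per ancestry, over the same p SNPs.
    effect_cov: (K, K) prior covariance of effect sizes across K ancestries.
    All names here are illustrative assumptions, not taken from the paper.
    """
    n_anc = len(genotypes)
    n_snps = genotypes[0].shape[1]
    effects = np.zeros((n_snps, n_anc))
    causal = []
    for _ in range(n_effects):
        # Uniform prior over which SNP carries this single effect.
        snp = int(rng.integers(n_snps))
        # Effect sizes for all ancestries are drawn jointly, so they are
        # correlated across ancestries through effect_cov.
        effects[snp] += rng.multivariate_normal(np.zeros(n_anc), effect_cov)
        causal.append(snp)
    # Each ancestry's phenotype uses its own genotypes and its own effect column.
    phenotypes = [
        x @ effects[:, k] + rng.normal(0.0, noise_sd, size=x.shape[0])
        for k, x in enumerate(genotypes)
    ]
    return effects, phenotypes, causal
```

Because the same SNP carries the effect in every ancestry while the genotype matrices differ, differences in LD between ancestries help separate the causal SNP from its tagging neighbors.
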
Fig. 2: SuShiE outperforms other methods, estimates accurate effect-size correlation, and boosts TWAS power in realistic simulations.
A-C) SuShiE outputs higher posterior inclusion probabilities (PIPs; A), smaller credible set sizes (B), and a higher frequency of cis-molQTLs in the credible sets (calibration; C) compared to SuShiE-Indep (P=2.60e-4, 1.5e-1, and 1.30e-11), Meta-SuSiE (P=9.67e-43, 9.35e-231, and 1.17e-76), and SuSiE (P=6.98e-63, 6.65e-2, and 1.58e-104). D) SuShiE accurately estimates the true effect-size correlation across ancestries using the primary effect (First credible sets; CSs) but underestimates it using the secondary effects (Second CSs) or all effects combined (All CSs), because the variance explained by the secondary effect decreases and thus requires higher statistical power. The error bar is a 95% confidence interval. E) SuShiE yields higher ancestry-specific prediction accuracy than SuSiE, LASSO, Elastic Net, and gBLUP (all P<9.57e-8) at a fixed sample size. The plots aggregate results across two ancestries. F) SuShiE yields higher TWAS power than SuSiE, LASSO, Elastic Net, and gBLUP (all P<4.34e-14) at a fixed sample size. The plots aggregate results across two ancestries. By default, the simulation assumes 2 causal cis-molQTLs, a per-ancestry training sample size of 400, a testing sample size of 200, a cis-SNP heritability of 0.05, an effect-size correlation of 0.8 across ancestries, and a proportion of cis-SNP heritability of the complex trait explained by gene expression of 1.5e-14. The error bar is a 95% confidence interval.
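
For readers unfamiliar with the quantities compared in panels A-C, the sketch below shows how PIPs and credible sets are conventionally derived in SuSiE-style models from per-effect posterior probabilities. It assumes the standard SuSiE definitions; the function names are hypothetical, and any additional filtering SuShiE may apply (for example, purity thresholds) is not described here and is omitted.

```python
import numpy as np

def pip_from_alphas(alphas):
    """Combine per-effect posterior probabilities into per-SNP PIPs.

    alphas: (L, p) array; row l holds the posterior probability that each
    SNP carries single effect l. A SNP's PIP is the probability that it
    carries at least one of the L effects.
    """
    alphas = np.asarray(alphas, dtype=float)
    return 1.0 - np.prod(1.0 - alphas, axis=0)

def credible_set(alpha, coverage=0.95):
    """Smallest set of SNP indices whose probabilities sum to >= coverage."""
    alpha = np.asarray(alpha, dtype=float)
    order = np.argsort(-alpha, kind="stable")  # most probable SNPs first
    cumulative = np.cumsum(alpha[order])
    # Index of the first position where cumulative mass reaches the coverage.
    k = int(np.searchsorted(cumulative, coverage, side="left")) + 1
    return order[: min(k, alpha.size)]
```

Under these definitions, a smaller credible set at the same coverage means the posterior mass is concentrated on fewer SNPs, which is the sense in which panel B measures fine-mapping precision.
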
Fig. 3: SuShiE reveals cis-regulatory mechanisms for mRNA and protein expression.
A) SuShiE identified cis-molQTLs for 14,590, 573, and 5,925 genes, of which 88%, 86%, and 96% contain 1–3 cis-molQTLs, for the TOPMed-MESA mRNA, TOPMed-MESA protein, and GENOA mRNA datasets, respectively. B) Posterior inclusion probabilities (PIPs) of cis-molQTLs inferred by SuShiE are mainly enriched around the TSS region of genes. We grouped SNPs into 500-bp-long bins and computed their average PIP. There are 2,000 bins covering a one-million-bp-long genomic window around the genes' TSS. C) Across all three studies, cis-molQTLs identified by SuShiE are enriched in four out of five candidate cis-regulatory elements (cCREs) from ENCODE, with the promoter (PLS) as the most enriched category. Specifically, mRNA expression from TOPMed-MESA and GENOA showed enrichment in the promoter, proximal enhancer (pELS), CTCF, and distal enhancer (dELS) categories but depletion in DNase-H3K4me3. Protein expression from TOPMed-MESA showed enrichment in PLS and pELS but non-significant enrichment in CTCF and dELS because of the low number of genes identified with pQTLs (n=573). The error bar is a 95% confidence interval.
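
The binning described in panel B can be sketched as follows for a single gene. This is an illustration only: the function name and arguments are hypothetical, the window is assumed to be centered on the TSS, and strand orientation and aggregation across genes are ignored.

```python
import numpy as np

def mean_pip_by_tss_bin(positions, pips, tss, bin_size=500, window=1_000_000):
    """Average PIP in fixed-width bins across a window centered on the TSS.

    positions: SNP base-pair positions; pips: matching PIPs; tss: TSS position.
    Returns an array of length window // bin_size (2,000 with the defaults);
    bins containing no SNPs are NaN.
    """
    positions = np.asarray(positions, dtype=np.int64)
    pips = np.asarray(pips, dtype=float)
    n_bins = window // bin_size
    # Shift coordinates so the window spans [0, window).
    offset = positions - tss + window // 2
    keep = (offset >= 0) & (offset < window)
    bin_index = offset[keep] // bin_size
    sums = np.bincount(bin_index, weights=pips[keep], minlength=n_bins)
    counts = np.bincount(bin_index, minlength=n_bins)
    means = np.full(n_bins, np.nan)
    nonempty = counts > 0
    means[nonempty] = sums[nonempty] / counts[nonempty]
    return means
```

With the default arguments, the bin holding the TSS itself is index 1,000, so a peak in the middle of the returned array corresponds to enrichment near the TSS.
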
Fig. 4: SuShiE identifies eQTL rs2528382 for URGCP with functional support.
A) Manhattan plot of cis-eQTL scans of URGCP (denoted in orange) for each ancestry (above) with SuShiE fine-mapping results (below). SuShiE was the only method to output credible sets for URGCP and prioritized a single SNP (rs2528382; denoted in red). B) Functional annotations at the URGCP locus show colocalization of active enhancer activity and chromatin accessibility with rs2528382. Shown are H3K27ac ChIP-seq peaks measured in PBMCs (intensity denoted in blue) and 0/1 accessibility annotations determined from scATAC-seq measured in PBMCs and snATAC-seq measured in naive T cells, naive B cells, cytotoxic NK (cNK) cells, and monocytes. Blue rectangles denote putative cCREs called from sc/snATAC-seq data that colocalize with rs2528382 (gray denotes no colocalization).
Fig. 5: SuShiE identifies more T/PWAS genes compared with SuSiE.
A) Scatter plot of T/PWAS t-statistics between SuShiE (y-axis) and SuSiE (x-axis) across all phenotypes and contributing cis-molQTL studies. B) Average T/PWAS chi-square statistics within low, middle, and high constraint scores (see Methods). Error bars represent 95% confidence intervals.
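
To make the t-statistics in panel A concrete, the sketch below shows one simple form an individual-level TWAS or PWAS test could take: predict the molecular trait from genotypes using fitted weights, then regress the complex trait on that prediction. It is an assumption-laden illustration rather than the authors' pipeline; the function name is hypothetical, and covariate adjustment, which a real analysis would typically include, is omitted.

```python
import numpy as np

def twas_t_statistic(genotypes, weights, trait):
    """t-statistic for regressing a trait on genetically predicted expression.

    genotypes: (n, p) array; weights: (p,) prediction weights (for example,
    ancestry-specific effect-size estimates); trait: (n,) complex trait values.
    Fits a simple linear regression with an intercept and no covariates.
    """
    predicted = np.asarray(genotypes, dtype=float) @ np.asarray(weights, dtype=float)
    trait = np.asarray(trait, dtype=float)
    g = predicted - predicted.mean()  # centering absorbs the intercept
    y = trait - trait.mean()
    sxx = g @ g
    slope = (g @ y) / sxx
    residual = y - slope * g
    dof = trait.size - 2  # intercept and slope
    standard_error = np.sqrt((residual @ residual) / dof / sxx)
    return slope / standard_error
```

Squaring this statistic gives a value that is approximately chi-square distributed with one degree of freedom under the null in large samples, which connects it to the chi-square summaries in panel B.
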
