Nat Commun. 2025 Jul 2;16(1):6098.
doi: 10.1038/s41467-025-61423-8.

Uncovering causal gene-tissue pairs and variants through a multivariate TWAS controlling for infinitesimal effects

Yihe Yang et al. Nat Commun. 2025.

Abstract

Transcriptome-wide association studies (TWAS) are commonly used to prioritize causal genes underlying associations found in genome-wide association studies (GWAS) and have been extended to identify causal genes through multivariate TWAS methods. However, recent studies have shown that widespread infinitesimal effects due to polygenicity can impair the performance of these methods. In this report, we introduce a multivariate TWAS method named tissue-gene pairs, direct causal variants, and infinitesimal effects selector (TGVIS) to identify tissue-specific causal genes and direct causal variants while accounting for infinitesimal effects. In simulations, TGVIS maintains an accurate prioritization of causal gene-tissue pairs and variants and demonstrates comparable or superior power to existing approaches, regardless of the presence of infinitesimal effects. In the real data analysis of GWAS summary data of 45 cardiometabolic traits and expression/splicing quantitative trait loci from 31 tissues, TGVIS is able to improve causal gene prioritization and identifies novel genes that were missed by conventional TWAS.

Conflict of interest statement

Competing interests: The authors declare no competing interests. Ethics approval: The study was approved by the institutional review board (IRB number: STUDY20180592) at Case Western Reserve University.

Figures

Fig. 1. Overview of TGVIS.
A A hypothetical causal diagram illustrating the relationships between variants (including xQTLs, direct causal variants, and non-causal variants), tissue-specific gene expressions, and an outcome in a cis-region, where the arrows indicate the flow of causal effects in the causal diagram. Variants may be in LD, with only a subset having cis-regulatory effects. Gene expressions or splicing events are tissue-specific and form a complex co-regulation network. Only molecular phenotypes directly connected to the outcome are considered causal. B Locus-zoom plot of the LDL-C GWAS in the PCSK9 locus. The bottom panel displays the coding regions of genes located within this locus, including PCSK9, USP24, BSND, etc. P values were calculated by χ2-test with 1 degree of freedom. C Workflow of TGVIS, consisting of three main steps. (I) Input, including GWAS summary data, eQTL summary data from multiple tissues, and LD matrix. (II) Preprocessing, including eQTL selection and pre-screening. We applied S-PrediXcan to pre-screen some noise pairs, aiming to reduce the dimension of the multivariable TWAS model to a reasonable scale. (III) Estimation, where TGVIS first selects the causal gene-tissue pairs and direct causal variants via SuSiE and then estimates the infinitesimal effect via REML. (IV) Output, including the causal effect estimate, direct causal effect estimate, and infinitesimal effect estimates. We output plots demonstrating the causal gene-tissue pairs, direct causal variants and predicted infinitesimal effects: (1) the Pratt indices and other statistics such as PIPs, estimates, SEs of causal gene-tissue pairs in the 95% credible sets, (2) the Pratt indices of the direct causal variants in the 95% credible sets, and (3) the best linear unbiased predictors of infinitesimal effects. The non-zero variance in output III in this figure suggests the non-zero contribution of infinitesimal effects. The figure was created in BioRender. Yang, Y. (2025) https://BioRender.com/tpngnr4.
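As a rough illustration of the 1-degree-of-freedom χ2 test mentioned for panel B, the sketch below converts a per-variant effect estimate and standard error into a P value. It assumes the test statistic is the squared Wald Z-score, (beta/se)^2, which the caption does not state explicitly; the function name `chisq1_pvalue` is hypothetical.

```python
import math


def chisq1_pvalue(beta: float, se: float) -> float:
    """P value of a 1-df chi-square test for a single variant.

    Assumption: the statistic is the squared Wald Z-score (beta / se) ** 2.
    """
    z = beta / se
    # For 1 df, P(chi2 > z^2) = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    # A Z-score of 1.96 corresponds to a two-sided P value of about 0.05.
    print(round(chisq1_pvalue(1.96, 1.0), 3))
```

Using `math.erfc` keeps the example free of third-party dependencies; for very large Z-scores a log-scale computation would be needed to avoid underflow.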
Fig. 2. Simulation results comparing the performances of TGVIS, TGFM, cTWAS, Grant2022, and cisIVW with xQTL sample size = 200 and replications = 500.
A The MSE of causal effect estimates under no pleiotropy, in the presence of direct causal variants, infinitesimal effects, and both. B The true negative rate of identifying all 98 non-causal gene-tissue pairs under different scenarios, i.e., no pleiotropy, in the presence of direct causal variants, infinitesimal effects, and both. That is, if a method incorrectly identifies any non-causal pair as causal, the replication is not counted as a true negative event. C Bar plots display the true positive rates of identifying all 2 causal gene-tissue pairs under different scenarios. D The average number of direct causal variants identified by the different methods. The number of true causal variants was set to 0, 2, 0, and 2 for the no-pleiotropy, direct-causal-variant, infinitesimal-effects, and direct-causal-variant plus infinitesimal-effects scenarios, respectively. E The average correlation of the true and estimated direct causal effects across simulations. F The average correlation of the true and predicted infinitesimal effects across simulations.
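The all-or-nothing rates in panels B and C can be sketched as follows. This is a minimal illustration, not the authors' simulation code: it assumes each replication yields a set of selected pair indices, that 2 of 100 pairs are causal (matching the 98 non-causal and 2 causal pairs in the caption), and the function name `strict_rates` is hypothetical.

```python
def strict_rates(selected_per_rep, causal, n_pairs):
    """Return (true negative rate, true positive rate) across replications.

    A replication counts as a true negative only if no non-causal pair is
    selected, and as a true positive only if every causal pair is selected.
    """
    causal = set(causal)
    non_causal = set(range(n_pairs)) - causal
    true_neg = 0
    true_pos = 0
    for selected in selected_per_rep:
        selected = set(selected)
        if not (selected & non_causal):  # no false discovery at all
            true_neg += 1
        if causal <= selected:  # all causal pairs recovered
            true_pos += 1
    n_rep = len(selected_per_rep)
    return true_neg / n_rep, true_pos / n_rep


if __name__ == "__main__":
    # Four toy replications with causal pairs 0 and 1 out of 100 pairs.
    reps = [{0, 1}, {0, 1, 5}, {0}, set()]
    print(strict_rates(reps, causal={0, 1}, n_pairs=100))  # (0.75, 0.5)
```

Under this definition a single false discovery fails the whole replication, so the rates are stricter than per-pair specificity and sensitivity.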
Fig. 3. Summary of the identification of causal gene-tissue pairs and direct causal variants.
A, B The number and proportion of causal and likely novel causal gene-tissue pairs identified by TGVIS and TGFM, respectively. Likely novel gene-tissue pairs are defined as those not present in the list of significant gene-tissue pairs identified by univariable S-PrediXcan (P < 0.05/20000). The proportion refers to the average number of causal and likely novel causal gene-tissue pairs per locus. C The number and proportion of direct causal variants identified by TGVIS and TGFM. D The distribution of the number of traits affected by causal gene-tissue pairs. E, F The distributions of scores for FathmmXF and Encode H3K9me3Sum annotations. Raincloud plots illustrate four classes: direct causal variants and xQTLs of causal gene-tissue pairs identified by TGVIS and TGFM. Pairwise Wilcoxon signed-rank test P values (two-sided) are displayed at the top, while medians of annotation scores are shown at the bottom. The median is shown as a black bar. The lower and upper hinges correspond to the 25th and 75th percentiles. The “sample sizes” in the test are the numbers of variants, which are 1256, 4787, 9552, 19057 for TGVIS (direct causal variant), TGVIS (xQTL of gene-tissue pairs), TGFM (direct causal variant), TGFM (xQTL of gene-tissue pairs), respectively. Source data are provided as a Source Data file. The figure was created in BioRender. Yang, Y. (2025) https://BioRender.com/b65f9a0.
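The "likely novel" definition in panels A and B amounts to a set difference against the S-PrediXcan hits at the Bonferroni-style threshold 0.05/20000 = 2.5e-6. The sketch below is illustrative only: the data layout (a dict of P values keyed by (gene, tissue)), the placeholder gene and tissue names, and the choice to treat pairs without an S-PrediXcan P value as non-significant are all assumptions.

```python
# Significance threshold stated in the caption: P < 0.05 / 20000.
SPREDIXCAN_THRESHOLD = 0.05 / 20000


def likely_novel_pairs(causal_pairs, spredixcan_pvalues):
    """Keep causal pairs that are not significant in univariable S-PrediXcan.

    Assumption: a pair missing from `spredixcan_pvalues` is treated as
    non-significant (P = 1.0).
    """
    return [
        pair
        for pair in causal_pairs
        if not spredixcan_pvalues.get(pair, 1.0) < SPREDIXCAN_THRESHOLD
    ]


if __name__ == "__main__":
    # Placeholder identifiers, not results from the paper.
    causal = [("GENE_A", "Liver"), ("GENE_B", "Adipose"), ("GENE_C", "Liver")]
    pvals = {("GENE_A", "Liver"): 1e-12, ("GENE_B", "Adipose"): 3e-4}
    print(likely_novel_pairs(causal, pvals))
```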
Fig. 4. Genetic architecture inferred from the identification of causal gene-tissue pairs and direct causal variants.
A The ratio of identified causal gene-tissue pairs per credible set by TGVIS. Different gene-tissue pairs may share the same set of xQTLs and end up in the same credible set. B The ratio of the number of causal eQTLs over the number of sQTLs per causal gene-tissue pair, indicating the distribution of eQTLs and sQTLs contributing to the gene-tissue pairs. C The distribution of eGene and sGene in credible sets identified by TGVIS and TGFM. When a credible set contains multiple gene-tissue pairs, we calculate the proportion of eGenes and sGenes. D The distribution of Pratt Index estimates for different traits, with a comparison between TGVIS and TGFM. In the boxplot, each point represents the Pratt Index of various molecular phenotypes within a single locus. The median is shown as a black bar. The lower and upper hinges correspond to the 25th and 75th percentiles. Source data are provided as a Source Data file. The figure was created in BioRender. Yang, Y. (2025) https://BioRender.com/ch89ux4.
Fig. 5. Distribution of major tissues for cardiometabolic traits.
A Heatmaps display the major tissues associated with each trait, identified by TGVIS. B Heatmaps display the major tissues associated with each trait, identified by TGFM. The major gene-tissue pairs are cataloged based on stringent criteria (CS-Pratt > 0.15 for TGVIS and PIP > 0.5 for TGFM), and the proportions of major tissues derived from significant gene-tissue pairs for each trait are quantified. Hierarchical clustering is applied to arrange the heatmaps, utilizing the Ward2 method and Euclidean distance. C Major tissues of lipid traits identified by TGVIS and TGFM. This panel shows bar plots detailing the number of causal gene-tissue pairs for various lipid traits, including HDL-C, LDL-C, TC, triglycerides, APOA1, and APOB, as identified by both TGVIS (top) and TGFM (bottom). Source data are provided as a Source Data file. The figure was created in BioRender. Yang, Y. (2025) https://BioRender.com/1s8s2iy.
Fig. 6. Evaluation of identified gene-tissue pairs.
A The colocalized proportions of causal credible sets (under two criteria) yielded by TGVIS and TGFM, respectively. B The numbers and proportions of causal cis-genes in the list of FDA-approved drug-target genes provided by Trajanoska et al., identified by TGVIS (left) and TGFM (right), respectively. C The number of significant pGenes in univariable MR analysis and the ratio of significant pGene in univariable MR analysis divided by significant eGenes/sGenes in eQTL/sQTL analysis. Source data are provided as a Source Data file. The figure was created in BioRender. Yang, Y. (2025) https://BioRender.com/ouhjfzd.
Fig. 7. Locus-zoom plots comparing the results of TGVIS and TGFM.
A PCSK9 locus-zoom plot for LDL-C GWAS. B PCSK9 locus results for TGVIS. C PCSK9 locus results for TGFM. D HMGCR locus-zoom plot for LDL-C GWAS. E HMGCR locus results for TGVIS. F HMGCR locus results for TGFM. G PHACTR1 locus-zoom plot for CAD GWAS. H PHACTR1 locus results for TGVIS. I PHACTR1 locus results for TGFM. J FTO locus-zoom plot for BMI GWAS. K FTO locus results for TGVIS. L FTO locus results for TGFM. In each panel of fine-mapping results, the upper portion displays individual PIPs of identified gene-tissue pairs and direct variants, while the lower portion shows Pratt indices of identified credible sets. For TGVIS, causality is determined by two criteria: (1) the variable is in a 95% credible set and (2) the Pratt index of that credible set is larger than 0.15. For TGFM, causality is determined by a single criterion: the individual PIP is larger than 0.5. The red diamond in a locus-zoom plot indicates the most significant SNP at the locus. PIPs were calculated by SuSiE. Source data are provided as a Source Data file. The figure was created in BioRender. Yang, Y. (2025) https://BioRender.com/jrzcdig.
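The two causality rules stated in this caption can be written as simple predicates. This is a sketch of the decision rules only, with hypothetical function and parameter names; it does not compute credible sets, Pratt indices, or PIPs, which come from the respective methods.

```python
# Thresholds as stated in the caption.
CS_PRATT_THRESHOLD = 0.15  # TGVIS: Pratt index of the credible set
PIP_THRESHOLD = 0.5        # TGFM: individual posterior inclusion probability


def tgvis_is_causal(in_95_credible_set: bool, cs_pratt: float) -> bool:
    """TGVIS rule: in a 95% credible set AND credible-set Pratt index > 0.15."""
    return in_95_credible_set and cs_pratt > CS_PRATT_THRESHOLD


def tgfm_is_causal(pip: float) -> bool:
    """TGFM rule: individual PIP > 0.5."""
    return pip > PIP_THRESHOLD


if __name__ == "__main__":
    print(tgvis_is_causal(True, 0.30))   # True
    print(tgvis_is_causal(True, 0.10))   # False: Pratt index too small
    print(tgvis_is_causal(False, 0.30))  # False: not in a 95% credible set
    print(tgfm_is_causal(0.62))          # True
```

The TGVIS rule is evaluated at the credible-set level, so variables that share xQTLs and fall in the same credible set receive the same call, whereas the TGFM rule is applied to each variable's own PIP.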
