Nat Commun. 2022 Jun 7;13(1):3258.

doi: 10.1038/s41467-022-30956-7.

Integrating 3D genomic and epigenomic data to enhance target gene discovery and drug repurposing in transcriptome-wide association studies

Chachrit Khunsriraksakul¹ ², Daniel McGuire² ³, Renan Sauteraud² ³, Fang Chen² ³, Lina Yang² ³, Lida Wang² ³, Jordan Hughey¹ ², Scott Eckert¹ ², J Dylan Weissenkampen² ³, Ganesh Shenoy⁴, Olivia Marx⁵, Laura Carrel⁶, Bibo Jiang⁷, Dajiang J Liu⁸ ⁹ ¹⁰

Affiliations

¹ Bioinformatics and Genomics Graduate Program, Pennsylvania State University College of Medicine, Hershey, PA, 17033, USA.
² Institute for Personalized Medicine, Pennsylvania State University College of Medicine, Hershey, PA, 17033, USA.
³ Department of Public Health Sciences, Pennsylvania State University College of Medicine, Hershey, PA, 17033, USA.
⁴ Department of Neurosurgery, Pennsylvania State University College of Medicine, Hershey, PA, 17033, USA.
⁵ Biomedical Science Program, Pennsylvania State University College of Medicine, Hershey, PA, 17033, USA.
⁶ Department of Biochemistry and Molecular Biology, Pennsylvania State University College of Medicine, Hershey, PA, 17033, USA.
⁷ Department of Public Health Sciences, Pennsylvania State University College of Medicine, Hershey, PA, 17033, USA. bjiang@phs.psu.edu.
⁸ Bioinformatics and Genomics Graduate Program, Pennsylvania State University College of Medicine, Hershey, PA, 17033, USA. dajiang.liu@psu.edu.
⁹ Institute for Personalized Medicine, Pennsylvania State University College of Medicine, Hershey, PA, 17033, USA. dajiang.liu@psu.edu.
¹⁰ Department of Public Health Sciences, Pennsylvania State University College of Medicine, Hershey, PA, 17033, USA. dajiang.liu@psu.edu.

PMID: 35672318
PMCID: PMC9171100
DOI: 10.1038/s41467-022-30956-7

Chachrit Khunsriraksakul et al. Nat Commun. 2022.

Abstract

Transcriptome-wide association studies (TWAS) are popular approaches to test for association between imputed gene expression levels and traits of interest. Here, we propose an integrative method, PUMICE (Prediction Using Models Informed by Chromatin conformations and Epigenomics), which integrates 3D genomic and epigenomic data with expression quantitative trait loci (eQTL) to predict gene expression more accurately. PUMICE helps define and prioritize regions that harbor cis-regulatory variants, and it outperforms competing methods. We further describe an extension of our method, PUMICE+, which jointly combines TWAS results from single- and multi-tissue models. Across 79 traits, PUMICE+ identifies 22% more independent novel genes and increases median chi-square statistics at known loci by 35% compared to the second-best method, and it achieves the narrowest credible interval size. Lastly, we perform computational drug repurposing and confirm that PUMICE+ outperforms other TWAS methods.

Conflict of interest statement

The authors declare no competing interests.

Figures

**Fig. 1. Simulation studies comparing the performance of PUMICE to other TWAS methods.**
Panels (a, b) illustrate the comparison of PUMICE to other single-tissue TWAS methods for type I error (a) and power (b). Panels (c, d) illustrate the comparison of PUMICE to a multi-tissue TWAS method (UTMOST) for type I error (c) and power (d). For UTMOST, we evaluate its performance across different combinations of the genetic correlation between causal and correlated tissues ($ρ$) and the number of correlated tissues (N_corr). Shadings represent the different training sample sizes used to train gene expression prediction models for single-tissue TWAS methods, and the $ρ$/N_corr combinations for the multi-tissue TWAS method.

**Fig. 2. Comparison of PUMICE gene expression prediction models to other TWAS methods.**
In panel (a), we compare the number of significant PUMICE models to that of other TWAS methods, including PrediXcan, FUSION, TIGAR, and UTMOST, across 48 GTEx tissues. Across all scenarios, PUMICE achieves a higher number of significant models than the other single-tissue TWAS methods. In comparison to UTMOST, PUMICE achieves a comparable number of models in tissues with small sample sizes, but a higher number of models in tissues with large sample sizes. In panel (b), we illustrate the percent increase in significant models of PUMICE over other TWAS methods. Compared to other single-tissue TWAS methods, the percent increase in models grows as the training sample size decreases. In comparison to UTMOST, the percent gain in models grows as the training sample size increases.

**Fig. 3. Examples of well-imputed genes unique to PUMICE.**
Panel (a) displays the prediction performance for the *REXO4* gene in the CMC cohort. Panel (b) shows the prediction performance for the *CASKIN2* gene in the GEUVADIS cohort. Panel (c) shows the prediction performance for the *PELO* gene in the DGN cohort. We show the selected window ($w$) and penalty factor ($ϕ$) associated with each PUMICE prediction model. Error bands represent 95% confidence intervals.

**Fig. 4. Characteristics of GTEx gene expression prediction models.**
In panel (a), we illustrate the proportions of different window sizes ($w$) among selected PUMICE models. Each boxplot is derived from the percent window composition of 48 GTEx tissues. In panel (b), we show the proportions of different values of the tuning parameter $ϕ$ among selected PUMICE models. $ϕ$ is the tuning parameter that reduces the L₁ and L₂ penalties for essential predictors that overlap with ENCODE annotations. Each boxplot is derived from the percent penalty factor composition of 48 GTEx tissues. Minimum and maximum values (excluding outliers) are represented by the lower and upper bounds of the whiskers. The median is represented by the bold line in the middle. The first and third quartiles are represented by the lower and upper bounds of the box. In panel (c), we show the distribution of the number of SNPs with non-zero weights in gene expression prediction models across different TWAS methods. The vertical line represents the median number of SNPs with non-zero weights. PUMICE models have the lowest median number of SNPs with non-zero weights (n = 13), while UTMOST models have the highest (n = 73). In panel (d), we plot the distribution of the locations of SNPs with non-zero weights (for PrediXcan, EpiXcan, PUMICE, and UTMOST) or of the top 100 SNPs with the highest weights (for FUSION and TIGAR). Variant counts are plotted against their locations relative to the 5′ gene transcription start site (TSS) and the 3′ gene transcription end site (TES) across different TWAS methods.
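
The role of $ϕ$ described in this caption can be read as a predictor-specific weight on an elastic net penalty. The expression below is only a sketch under that reading, not the formulation stated on this page: the symbols $y$, $X$, $n$, $p$, $\lambda$, $\alpha$, and the per-SNP weight $\phi_j$ are assumptions introduced for illustration, and a value $0 < \phi \le 1$ is assumed so that annotated SNPs are penalized less.

```latex
\hat{\beta} = \arg\min_{\beta} \; \frac{1}{2n} \lVert y - X\beta \rVert_2^2
  + \lambda \sum_{j=1}^{p} \phi_j \left( \alpha \lvert \beta_j \rvert + \frac{1-\alpha}{2} \beta_j^2 \right),
\qquad
\phi_j =
\begin{cases}
  \phi, & \text{SNP } j \text{ overlaps an ENCODE annotation} \\
  1,    & \text{otherwise}
\end{cases}
```

With $\phi = 1$ this reduces to an ordinary elastic net, so smaller values of $\phi$ correspond to stronger prioritization of annotated variants.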

**Fig. 5. PUMICE+ identifies the largest number of gene x trait associations and novel associations across 48 GTEx tissues using GWAS summary-level statistics of 79 traits.**
Panel (a) displays the total number of significant gene x trait associations identified by each method. Panel (b) shows the number of unique significant gene x trait associations. Gene x trait associations identified in multiple tissues are counted only once. Panel (c) shows the number of independent genes. Multiple significant genes within a 1 Mb window are counted only once. Panel (d) shows the number of independent novel genes, defined as genes that lie outside a 1 Mb window on either side of a GWAS sentinel variant. PUMICE+ identifies the highest number of gene x trait associations, unique genes, independent genes, and novel genes in comparison to all other methods. Panel (e) displays the distribution of chi-square test statistics at MAGMA-prioritized genes. The median value is given in parentheses. PUMICE+ achieves the largest median chi-square value (22.45). P-values are based on the comparison between PUMICE+ and each other method using a one-sided median test. *** denotes a significant difference with p < 0.001. One-sided p-values are provided in the source data file. Each boxplot is derived from the chi-square values across 12,546 MAGMA-prioritized genes. Minimum and maximum values (excluding outliers) are represented by the lower and upper bounds of the whiskers. The median is represented by the bold line in the middle. The first and third quartiles are represented by the lower and upper bounds of the box.
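
The rule in panel (c), counting multiple significant genes within a 1 Mb window only once, can be sketched as a greedy pass over genes ordered by significance. The caption does not say how the windows are anchored, so the greedy ordering, the `Gene` tuple layout, and the function name `independent_genes` are assumptions made for illustration; the gene names and p-values are hypothetical.

```python
from typing import List, Tuple

# (name, chromosome, position in base pairs, p-value); layout is hypothetical
Gene = Tuple[str, str, int, float]


def independent_genes(genes: List[Gene], window_bp: int = 1_000_000) -> List[str]:
    """Keep the most significant gene per window (assumed greedy rule)."""
    kept: List[Gene] = []
    # Visit genes from most to least significant.
    for gene in sorted(genes, key=lambda g: g[3]):
        _, chrom, pos, _ = gene
        # Keep the gene only if no already-kept gene on the same
        # chromosome lies within the window.
        if all(chrom != k[1] or abs(pos - k[2]) > window_bp for k in kept):
            kept.append(gene)
    return [g[0] for g in kept]


# Hypothetical significant genes: G2 is 0.5 Mb from the stronger G1.
example = [
    ("G1", "1", 1_000_000, 1e-10),
    ("G2", "1", 1_500_000, 1e-8),
    ("G3", "1", 3_000_000, 1e-7),
    ("G4", "2", 1_200_000, 1e-9),
]
kept_names = independent_genes(example)
```

Here `G2` is dropped because it sits within 1 Mb of the more significant `G1`, so three independent genes remain.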

**Fig. 6. Computational drug repurposing predictions for drugs with known indications in 23 traits.**
Panels (a–c) illustrate heatmaps of CMap scores derived from different TWAS methods for (a) immune-related traits, (b) COVID-19 traits, and (c) other traits. Due to the large number of trait-drug pairs, we display the text description of only one in every three trait-drug pairs in the plot. Numbers in parentheses are the indices of the displayed trait-drug pairs in the full list (Supplementary Data 16). Panel (d) displays the distribution of CMap scores across 23 traits. The median value is given in parentheses. PUMICE+ achieves the most negative median CMap score (−77.99), which shows that it identifies putative target genes that are most consistent with the target genes of approved drugs. P-values are based on the comparison between PUMICE+ and other methods using a one-sided median test. The label “ns” denotes not significant; * denotes significance at p < 0.05; ** denotes significance at p < 0.01; *** denotes significance at p < 0.001. One-sided P-values are provided in the source data file. Minimum and maximum values (excluding outliers) are represented by the lower and upper bounds of the whiskers. The median is represented by the bold line in the middle. The first and third quartiles are represented by the lower and upper bounds of the box. Trait abbreviations: ad = Alzheimer’s disease; ast = asthma; bip = bipolar disorder; cad = coronary artery disease; cd = Crohn’s disease; ecz = atopic dermatitis; epl = epilepsy; ibd = inflammatory bowel disease; ldl = low-density lipoprotein level; mi = myocardial infarction; ra = rheumatoid arthritis; scz = schizophrenia; t2d = type 2 diabetes; uc = ulcerative colitis; vit = vitiligo.

**Fig. 7. TWAS Manhattan plot for COVID-19-related outcomes via PUMICE+.**
Panels (a–d) illustrate the Manhattan plots for (a) COVID-A2, (b) COVID-B1, (c) COVID-B2, and (d) COVID-C2. The black horizontal line marks the genome-wide significance threshold at $2.5 \times 10^{-6}$ (Bonferroni threshold corrected for 20,000 genes). The most significant gene at each phenotype-locus pair is labelled. The two-sided P value associated with each gene is calculated from the TWAS Z-score for the gene-based association test.
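
The threshold and P value in this caption follow from simple arithmetic, sketched below. The family-wise level of 0.05 is not stated on this page but is implied by 2.5 × 10⁻⁶ × 20,000; the function names and the example Z-score are hypothetical, and the Z-score is assumed to be standard normal under the null.

```python
import math


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


def two_sided_p_from_z(z: float) -> float:
    """Two-sided P value for a standard normal Z-score.

    P(|Z| >= |z|) equals erfc(|z| / sqrt(2)) for a standard normal Z.
    """
    return math.erfc(abs(z) / math.sqrt(2.0))


# 0.05 is the assumed family-wise level; 20,000 genes comes from the caption.
threshold = bonferroni_threshold(0.05, 20_000)

example_z = 5.0  # hypothetical TWAS Z-score
p_value = two_sided_p_from_z(example_z)
is_significant = p_value < threshold
```

A gene with a Z-score of 5.0 would fall below this threshold, whereas a Z-score of 4.5 would not.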

Cited by

TWAS-GKF: a novel method for causal gene identification in transcriptome-wide association studies with knockoff inference.
Wang A, Tian P, Zhang YD. Bioinformatics. 2024 Aug 2;40(8):btae502. doi: 10.1093/bioinformatics/btae502. PMID: 39189955.
FarmGTEx TWAS-server: An Interactive Web Server for Customized TWAS Analysis.
Zhang Z, Chen Z, Teng J, Liu S, Lin Q, Wu J, Gao Y, Bai Z; FarmGTEx Consortium; Li B, Liu G, Zhang Z, Pan Y, Zhang Z, Fang L, Wang Q. Genomics Proteomics Bioinformatics. 2025 May 10;23(1):qzaf006. doi: 10.1093/gpbjnl/qzaf006. PMID: 39932890.
SUMMIT-FA: a new resource for improved transcriptome imputation using functional annotations.
Melton HJ, Zhang Z, Wu C. Hum Mol Genet. 2024 Mar 20;33(7):624-635. doi: 10.1093/hmg/ddad205. PMID: 38129112.
Integrating single cell expression quantitative trait loci summary statistics to understand complex trait risk genes.
Wang L, Khunsriraksakul C, Markus H, Chen D, Zhang F, Chen F, Zhan X, Carrel L, Liu DJ, Jiang B. Nat Commun. 2024 May 20;15(1):4260. doi: 10.1038/s41467-024-48143-1. PMID: 38769300.
Leveraging Random Effects in Cistrome-Wide Association Studies for Decoding the Genetic Determinants of Prostate Cancer.
Shao M, Tian M, Chen K, Jiang H, Zhang S, Li Z, Shen Y, Chen F, Shen B, Cao C, Gu N. Adv Sci (Weinh). 2024 Sep;11(36):e2400815. doi: 10.1002/advs.202400815. Epub 2024 Aug 5. PMID: 39099406.

