This is a preprint.

It has not yet been peer reviewed by a journal.


[Preprint]. 2023 May 20:2023.02.09.527747.

doi: 10.1101/2023.02.09.527747.

Multivariate adaptive shrinkage improves cross-population transcriptome prediction for transcriptome-wide association studies in underrepresented populations

Daniel S Araujo¹, Chris Nguyen², Xiaowei Hu³, Anna V Mikhaylova⁴, Chris Gignoux⁵, Kristin Ardlie⁶, Kent D Taylor⁷, Peter Durda⁸, Yongmei Liu⁹, George Papanicolaou¹⁰, Michael H Cho¹¹, Stephen S Rich³, Jerome I Rotter⁷; NHLBI TOPMed Consortium; Hae Kyung Im¹², Ani Manichaikul³, Heather E Wheeler¹,²

Affiliations

¹ Program in Bioinformatics, Loyola University Chicago, Chicago, IL, 60660, USA.
² Department of Biology, Loyola University Chicago, Chicago, IL, 60660, USA.
³ Center for Public Health Genomics, Department of Public Health Sciences, University of Virginia, Charlottesville, VA, 22908, USA.
⁴ Department of Biostatistics, University of Washington, Seattle, WA, 98195, USA.
⁵ Division of Biomedical Informatics and Personalized Medicine, Department of Medicine, UC Denver Anschutz Medical Campus, Aurora, CO, 80045, USA.
⁶ Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA.
⁷ The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, 90502, USA.
⁸ Laboratory for Clinical Biochemistry Research, University of Vermont, Colchester, VT, 05446, USA.
⁹ Department of Medicine, Duke University School of Medicine, Durham, NC, 27710, USA.
¹⁰ Epidemiology Branch, Division of Cardiovascular Sciences, National Heart, Lung and Blood Institute, Bethesda, MD, 20892, USA.
¹¹ Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA, 02115, USA.
¹² Section of Genetic Medicine, The University of Chicago, Chicago, IL, 60637, USA.

PMID: 36798214
PMCID: PMC9934635
DOI: 10.1101/2023.02.09.527747

Daniel S Araujo et al. bioRxiv. 2023.

Update in

An updated version of this article has been published (doi: 10.1016/j.xhgg.2023.100216).

Abstract

Transcriptome prediction models built with data from European-descent individuals are less accurate when applied to other populations because of differences in linkage disequilibrium patterns and allele frequencies. We hypothesized that methods leveraging regulatory effects shared across different conditions (in this case, across different populations) may improve cross-population transcriptome prediction. To test this hypothesis, we built transcriptome prediction models for use in transcriptome-wide association studies (TWAS) with five methods (Elastic Net, Joint-Tissue Imputation (JTI), Matrix eQTL, Multivariate Adaptive Shrinkage in R (MASHR), and Transcriptome-Integrated Genetic Association Resource (TIGAR)) and tested their out-of-sample transcriptome prediction accuracy in population-matched and cross-population scenarios. Additionally, to evaluate model applicability in TWAS, we integrated our transcriptome prediction models with publicly available multi-ethnic genome-wide association study (GWAS) summary statistics from the Population Architecture using Genomics and Epidemiology (PAGE) Study and the Pan-UK Biobank (PanUKBB). Regarding transcriptome prediction accuracy, MASHR models performed better than or the same as the other methods in both population-matched and cross-population predictions. Furthermore, in multi-ethnic TWAS, MASHR models yielded more discoveries that replicated in both PAGE and PanUKBB than any other method analyzed, including loci previously mapped in GWAS and new loci not previously found in GWAS. Overall, our study demonstrates the importance of using methods that benefit from effect size estimates across different populations in order to improve TWAS for multi-ethnic or underrepresented populations.
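In PrediXcan-style TWAS, a transcriptome prediction model is essentially a per-gene set of SNP weights, and an individual's predicted expression is the weighted sum of their genotype dosages at those SNPs. The sketch below illustrates only that step, under stated assumptions: the nested-dictionary layout, the gene and SNP names, and the choice to skip SNPs missing from the genotype data are all hypothetical and do not reflect the authors' file formats or code.

```python
from typing import Dict

# Hypothetical layouts (not the paper's formats):
#   weights[gene][snp]  -> model weight for that SNP
#   dosages[person][snp] -> allele dosage in [0, 2]
Weights = Dict[str, Dict[str, float]]
Dosages = Dict[str, Dict[str, float]]


def predict_expression(weights: Weights, dosages: Dosages) -> Dict[str, Dict[str, float]]:
    """Return predicted[person][gene] as the weighted sum of that person's dosages.

    SNPs in a gene model that are absent from a person's genotypes are skipped,
    a simplifying assumption made only for this sketch.
    """
    predicted: Dict[str, Dict[str, float]] = {}
    for person, snps in dosages.items():
        predicted[person] = {
            gene: sum(w * snps[snp] for snp, w in model.items() if snp in snps)
            for gene, model in weights.items()
        }
    return predicted


# Toy example with made-up weights and dosages.
weights = {"GENE_A": {"rs1": 0.5, "rs2": -0.2}, "GENE_B": {"rs3": 1.0}}
dosages = {"ind1": {"rs1": 2, "rs2": 1, "rs3": 0}, "ind2": {"rs1": 0, "rs2": 2}}
pred = predict_expression(weights, dosages)
```

Here ind1's GENE_A prediction is 0.5 × 2 + (−0.2) × 1 = 0.8, while ind2 has no dosage for rs3, so its GENE_B prediction falls back to 0 in this sketch.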

Keywords: genetics; genomics; human genetics; transcriptome-wide association studies.


Conflict of interest statement

All authors declare that they have no conflicts of interest.

Figures

**Figure 1. Overall study methodology.**
Using TOPMed MESA as the training dataset, we built population-based transcriptome prediction models with five methods (Elastic Net (EN), Joint-Tissue Imputation (JTI), Multivariate adaptive shrinkage (MASHR), Matrix eQTL, and Transcriptome-Integrated Genetic Association Resource (TIGAR)). We then evaluated the models' out-of-sample transcriptome prediction accuracy in the GEUVADIS dataset. Additionally, we assessed their applicability in multi-ethnic TWAS using GWAS summary statistics from the PAGE Study and PanUKBB. AFA = African American, CHN = Chinese, EUR = European, HIS = Hispanic/Latino.
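Figure 5 identifies S-PrediXcan as the tool that combines the GWAS summary statistics with the prediction models. For orientation, the gene-level S-PrediXcan statistic as it is commonly written is reproduced below; this excerpt does not state the formula, so read it as background on the method rather than a description of this study's exact settings.

$$
Z_g \approx \sum_{l \in \mathrm{Model}_g} w_{lg}\,\frac{\hat{\sigma}_l}{\hat{\sigma}_g}\,\frac{\hat{\beta}_l}{\mathrm{se}(\hat{\beta}_l)},
\qquad
\hat{\sigma}_g^2 = W_g^{\top}\,\Gamma_g\,W_g
$$

Here $w_{lg}$ is the weight of SNP $l$ in the model for gene $g$, $\hat{\beta}_l$ and $\mathrm{se}(\hat{\beta}_l)$ are the GWAS effect size and its standard error, $\hat{\sigma}_l$ is the standard deviation of SNP $l$ in a reference panel, $W_g$ is the vector of model weights, and $\Gamma_g$ is the covariance matrix of the model's SNPs in that reference panel, so $\hat{\sigma}_g$ is the standard deviation of predicted expression.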

**Figure 2. Design of the methodology implemented to make MASHR models.**
(A) We combined the effect sizes estimated with Matrix eQTL within each population dataset across genes, treating the different populations as conditions, and used them as input for MASHR. The output matrices contain adjusted effect sizes. (B) For each population, we selected the top SNP (lowest local false sign rate) per gene. Then, we concatenated the gene-top SNP pairs across populations to determine which SNPs would be included in the final models. Lastly, to make our population-based transcriptome prediction models, we used population-specific effect sizes taken from the corresponding MASHR output matrices. AFA = African American, CHN = Chinese, EUR = European, HIS = Hispanic/Latino.
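The model-assembly logic of panel B can be sketched in a few lines of Python. This is an illustrative reading of the caption, not the authors' pipeline: it assumes the MASHR outputs have already been exported as per-population dictionaries of local false sign rates (lfsr) and posterior mean effect sizes keyed by (gene, SNP) (in the R mashr package these would come from `get_lfsr()` and `get_pm()`), and it interprets "concatenated" as taking the union of the selected gene-SNP pairs. Population labels and numbers are made up.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]  # (gene, snp)


def top_snp_per_gene(lfsr: Dict[Pair, float]) -> Set[Pair]:
    """For one population, keep the (gene, snp) pair with the lowest lfsr per gene."""
    best: Dict[str, Pair] = {}
    for (gene, snp), value in lfsr.items():
        if gene not in best or value < lfsr[best[gene]]:
            best[gene] = (gene, snp)
    return set(best.values())


def build_population_models(
    lfsr_by_pop: Dict[str, Dict[Pair, float]],
    posterior_mean_by_pop: Dict[str, Dict[Pair, float]],
) -> Dict[str, Dict[str, Dict[str, float]]]:
    # Top SNP per gene within each population, then the union of those
    # gene-SNP pairs across populations (assumed meaning of "concatenated").
    selected: Set[Pair] = set()
    for lfsr in lfsr_by_pop.values():
        selected |= top_snp_per_gene(lfsr)
    # Every population model shares the selected SNPs but takes its weights
    # from that population's MASHR posterior means.
    models: Dict[str, Dict[str, Dict[str, float]]] = {}
    for pop, post in posterior_mean_by_pop.items():
        model: Dict[str, Dict[str, float]] = defaultdict(dict)
        for gene, snp in sorted(selected):
            model[gene][snp] = post[(gene, snp)]
        models[pop] = dict(model)
    return models


# Toy input: one gene, two candidate SNPs, two populations (values are made up).
lfsr_by_pop = {
    "AFA": {("G1", "rs1"): 0.01, ("G1", "rs2"): 0.20},
    "EUR": {("G1", "rs1"): 0.30, ("G1", "rs2"): 0.02},
}
posterior_mean_by_pop = {
    "AFA": {("G1", "rs1"): 0.40, ("G1", "rs2"): 0.10},
    "EUR": {("G1", "rs1"): 0.05, ("G1", "rs2"): 0.35},
}
models = build_population_models(lfsr_by_pop, posterior_mean_by_pop)
```

Because AFA's top SNP for G1 is rs1 and EUR's is rs2, both SNPs enter every population's G1 model, each with population-specific weights.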

**Figure 3. PBMC gene expression cis-heritability estimates across MESA populations.**
(A) Gene expression cis-heritability (h²) estimated for genes in each MESA population dataset in PBMC. Only genes with significant h² estimates (p-value < 0.05) are shown. Gray bars represent two standard errors (2 × S.E.). Genes are ordered on the x-axis by ascending h² and colored by the h² lower bound (h² − 2 × S.E.). (B) Number of significantly heritable genes (p-value < 0.05 and h² lower bound > 0.01) within each PBMC population dataset, by sample size. AFA = African American, CHN = Chinese, EUR = European, HIS = Hispanic/Latino.
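The heritability filter used in panel B can be written directly. In this hypothetical sketch each estimate is a small record holding h², its standard error, and a p-value; the two thresholds (p < 0.05 and h² − 2 × S.E. > 0.01) come from the caption, while the record type, gene names, and numbers are invented for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class H2Estimate:
    gene: str
    h2: float       # estimated cis-heritability
    se: float       # standard error of h2
    p_value: float  # p-value for h2


def significant_heritable_genes(
    estimates: Iterable[H2Estimate],
    p_cutoff: float = 0.05,
    lower_bound_cutoff: float = 0.01,
) -> List[str]:
    """Genes with p < p_cutoff and lower bound (h2 - 2*SE) > lower_bound_cutoff."""
    return [
        e.gene
        for e in estimates
        if e.p_value < p_cutoff and e.h2 - 2 * e.se > lower_bound_cutoff
    ]


estimates = [
    H2Estimate("G1", h2=0.20, se=0.05, p_value=0.001),  # lower bound 0.10: kept
    H2Estimate("G2", h2=0.05, se=0.03, p_value=0.010),  # lower bound -0.01: dropped
    H2Estimate("G3", h2=0.30, se=0.01, p_value=0.200),  # not significant: dropped
]
heritable = significant_heritable_genes(estimates)
```

Panel A applies only the p-value condition, so G2 would still appear there but not in the panel B count.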

**Figure 4. Comparison of MESA population transcriptome prediction models.**
(A) Number of genes in each MESA population model, by method and tissue. (B) Prediction performance (Spearman's rho) of EN, JTI, MASHR, Matrix eQTL, and TIGAR PBMC MESA population models in the Geuvadis GBR and YRI populations. Only the intersection of genes whose expression is predicted by all methods for each MESA-Geuvadis population pair is shown. MASHR performed better than or the same as all other methods (see Table S2 for all pairwise comparisons).
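Panel B compares methods only on genes that every method predicts. A minimal sketch of that comparison is below, assuming Spearman's rho is computed per gene between predicted and observed expression across the same individuals. It uses only the standard library (tied values receive average ranks); in practice `scipy.stats.spearmanr` would handle the correlation step. The input layout and example values are hypothetical.

```python
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when one vector is constant
    return cov / (vx * vy) ** 0.5


def rho_on_shared_genes(
    predicted_by_method: Dict[str, Dict[str, List[float]]],
    observed: Dict[str, List[float]],
) -> Dict[str, Dict[str, float]]:
    """Per-gene rho for each method, restricted to genes every method predicts."""
    shared = set(observed)
    for preds in predicted_by_method.values():
        shared &= set(preds)
    return {
        method: {gene: spearman_rho(preds[gene], observed[gene]) for gene in sorted(shared)}
        for method, preds in predicted_by_method.items()
    }


# Toy data: values are per-individual expression in a fixed sample order.
observed = {"G1": [1.0, 2.0, 3.0, 4.0], "G2": [2.0, 1.0, 4.0, 3.0]}
predicted = {
    "EN": {"G1": [0.1, 0.2, 0.3, 0.4]},
    "MASHR": {"G1": [0.4, 0.3, 0.2, 0.1], "G2": [0.2, 0.1, 0.4, 0.3]},
}
rhos = rho_on_shared_genes(predicted, observed)
```

G2 is dropped from the comparison because EN does not predict it, which is the same restriction the caption describes.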

**Figure 5. Number of significant S-PrediXcan gene-trait pairs in PAGE and PanUKBB GWAS summary statistics.**
(A) Total number of significant gene-trait pairs discovered by each MESA population model (union of the three tissues), by method. (B) Number of significant gene-trait pairs discovered with individual or multiple MESA population models, colored by method (union of the three tissues). Population-set intersections are indicated in color on the x-axis.
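Both counts in Figure 5 reduce to set operations over significant S-PrediXcan results. The sketch below assumes the significant gene-trait pairs for one method are already available per (population, tissue); the significance threshold, data layout, gene and trait names, and the tissue label "tissue_2" are placeholders, not details from the study. Panel A corresponds to the union over tissues for each population, and panel B to counting each pair once under the exact set of populations that detected it, similar to an UpSet plot.

```python
from collections import Counter
from typing import Dict, Set, Tuple

Hit = Tuple[str, str]  # (gene, trait) pair that passed the significance threshold


def union_over_tissues(hits: Dict[Tuple[str, str], Set[Hit]]) -> Dict[str, Set[Hit]]:
    """Collapse (population, tissue) -> hits into population -> hits (panel A)."""
    by_pop: Dict[str, Set[Hit]] = {}
    for (pop, _tissue), pairs in hits.items():
        by_pop.setdefault(pop, set()).update(pairs)
    return by_pop


def population_set_counts(by_pop: Dict[str, Set[Hit]]) -> Counter:
    """Count each pair once, under the exact set of populations that found it (panel B)."""
    found_in: Dict[Hit, Set[str]] = {}
    for pop, pairs in by_pop.items():
        for pair in pairs:
            found_in.setdefault(pair, set()).add(pop)
    return Counter(frozenset(pops) for pops in found_in.values())


# Toy hits for a single method; run once per method to color panel B by method.
hits = {
    ("AFA", "PBMC"): {("G1", "height"), ("G2", "BMI")},
    ("AFA", "tissue_2"): {("G3", "height")},
    ("EUR", "PBMC"): {("G1", "height")},
}
by_pop = union_over_tissues(hits)
counts = population_set_counts(by_pop)
```

In this toy input, AFA contributes three pairs to panel A and EUR one; for panel B, two pairs are AFA-only and one is shared by AFA and EUR.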
