Best practices for multi-ancestry, meta-analytic transcriptome-wide association studies: Lessons from the Global Biobank Meta-analysis Initiative

Arjun Bhattacharya^{1,2,3,4}, Jibril B Hirbo^{5,6,3}, Dan Zhou^{5,6}, Wei Zhou^{7,8,9}, Jie Zheng^{10}, Masahiro Kanai^{7,8,9,11,12}; Global Biobank Meta-analysis Initiative; Bogdan Pasaniuc^{1,13,14,3}, Eric R Gamazon^{5,6,15,3}, Nancy J Cox^{5,6,3}

Affiliations

¹ Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA.
² Institute of Quantitative and Computational Biosciences, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA.
³ These authors contributed equally.
⁴ Lead contact.
⁵ Department of Medicine, Division of Genetic Medicine, Vanderbilt University School of Medicine, Nashville, TN, USA.
⁶ Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA.
⁷ Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA.
⁸ Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA.
⁹ Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA.
¹⁰ MRC Integrative Epidemiology Unit (IEU), Bristol Medical School, University of Bristol, Oakfield House, Oakfield Grove, Bristol BS8 2BN, UK.
¹¹ Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
¹² Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita 565-0871, Japan.
¹³ Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA.
¹⁴ Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA.
¹⁵ MRC Epidemiology Unit, University of Cambridge, Cambridge, UK.

PMID: 36341024
PMCID: PMC9631681
DOI: 10.1016/j.xgen.2022.100180

Cell Genom. 2022 Oct 12;2(10):100180.

Abstract

The Global Biobank Meta-analysis Initiative (GBMI), through its diversity, provides a valuable opportunity to study population-wide and ancestry-specific genetic associations. However, with multiple ascertainment strategies and multi-ancestry study populations across biobanks, GBMI presents unique challenges in implementing statistical genetics methods. Transcriptome-wide association studies (TWASs) boost detection power for, and provide biological context to, genetic associations by integrating genetic variant-to-trait associations from genome-wide association studies (GWASs) with predictive models of gene expression. TWASs present challenges beyond those of GWASs, especially in a multi-biobank, meta-analytic setting. Here, we present the GBMI TWAS pipeline, outlining practical considerations for ancestry and tissue specificity, meta-analytic strategies, and open challenges at every step of the framework. We advise conducting ancestry-stratified TWASs using ancestry-specific expression models and meta-analyzing the results using inverse-variance weighting, the strategy that shows the least test statistic inflation. Our work provides a foundation for adding transcriptomic context to biobank-linked GWASs, allowing for ancestry-aware discovery to accelerate genomic medicine.
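The inverse-variance-weighted (IVW) meta-analysis recommended above can be illustrated with a minimal sketch. It assumes a standard fixed-effect IVW estimator applied to per-stratum TWAS effect sizes and standard errors for a single gene; the function name `ivw_meta` and the input values are hypothetical and are not taken from the GBMI pipeline. Cochran's Q and the Higgins-Thompson I² are included because I² is the heterogeneity statistic reported in Figure 3.

```python
import math

def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance-weighted meta-analysis for one gene.

    betas, ses: per-stratum (e.g. per-ancestry or per-biobank) TWAS effect
    sizes and their standard errors. Illustrative sketch only.
    """
    # Each stratum is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)

    # Pooled effect size and its standard error.
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)

    # Two-sided p-value from the standard normal distribution.
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))

    # Cochran's Q and the Higgins-Thompson I^2 heterogeneity statistic.
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0

    return {"beta": beta, "se": se, "z": z, "p": p, "q": q, "i2": i2}

# Hypothetical effect sizes for one gene in two ancestry strata.
result = ivw_meta(betas=[0.2, 0.4], ses=[0.1, 0.1])
print(result)
```

Because the weights are inverse variances, strata with larger samples (smaller standard errors) dominate the pooled estimate, which is worth keeping in mind when ancestry groups differ greatly in sample size.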

Conflict of interest statement

Declaration of interests: The authors declare no competing interests.

Figures

**Figure 1**
Challenges in multi-ancestry, meta-analytic TWASs. Each level of data in a TWAS introduces a set of challenges: (1) genetics data include confounding from genetic ancestry, population structure and relatedness, and complex linkage disequilibrium patterns; (2) gene expression data introduce context-specific factors, such as tissue-, cell-type-, or cell-state-specific expression; and (3) phenotypic data involve challenges in acquiring and aggregating phenotypes, properly defining controls for phenotypes, and ascertainment and selection bias from non-random sampling.

**Figure 2**
Comparison of predictive performance of expression prediction models across ancestry. (A) Adjusted R² difference (y axis) when predicting expression in the AFR imputation sample between models trained in EUR and AFR training samples, across tissues (x axis). The proportion of models with improved R² using ancestry-aligned versus ancestry-mismatched models is labeled. (B) Adjusted R² difference between ancestry-specific and ancestry-unaware models imputing into EUR (left) and AFR (right) samples.

**Figure 3**
Comparison of meta-analytic strategies for multi-biobank, multi-ancestry TWASs. (A) Per-ancestry meta-analyzed TWAS scores across EUR (x axis) versus AFR ancestry (y axis). The dotted lines indicate p < 2.5×10⁻⁶, with a 45-degree line for reference. Points are colored by the ancestry population in which the TWAS association meets p < 2.5×10⁻⁶. (B) QQ-plot of TWAS Z scores, colored by meta-analytic strategy. "Per ancestry" refers to TWAS meta-analysis across meta-analyzed ancestry-specific GWAS summary statistics. "Per bank/per ancestry" refers to TWAS meta-analysis using all biobank- and ancestry-specific GWAS summary statistics. (C) Effect sizes and Bonferroni-corrected confidence intervals (CIs) for TWAS associations across 17 individual biobanks (EUR in teal, AFR in red) and two IVW meta-analysis strategies (in yellow) for five representative genes. The Higgins-Thompson I² statistic is provided.

**Figure 4**
GReX-PheWAS for categorizing phenome-wide associations for TAF7 genetically regulated expression in UKBB. (A) –log₁₀ Benjamini-Hochberg FDR-adjusted p values of gene-trait associations (GTAs) (y axis) across nine phenotype groups (x axis). Dotted line shows FDR-adjusted p = 0.05. (B) Miami plot of TWAS Z scores (y axis) across phenotypes (x axis), colored by phecode group. Dotted line shows Benjamini-Hochberg FDR-corrected significance, and phenotypes are labeled if the association passes Bonferroni correction.
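The Benjamini-Hochberg adjustment referred to in Figure 4 can be sketched as follows. This is a generic implementation of the standard step-up procedure, not the authors' code; the function name `benjamini_hochberg` and the example p values are hypothetical.

```python
import math

def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg FDR-adjusted p values, in the input order."""
    m = len(pvals)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p values for one gene tested against four phenotypes.
pvals = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(pvals)

# The -log10 transform used on the y axis of panel A.
neg_log10 = [-math.log10(p) for p in adj]
significant = [p <= 0.05 for p in adj]
print(adj, neg_log10, significant)
```

An adjusted p value at or below 0.05 corresponds to a point above the dotted line in panel A.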
