[Preprint]. 2023 May 20:2023.02.09.527747.
doi: 10.1101/2023.02.09.527747.

Multivariate adaptive shrinkage improves cross-population transcriptome prediction for transcriptome-wide association studies in underrepresented populations

Daniel S Araujo et al. bioRxiv.

Update in

  • This article has been published with doi: 10.1016/j.xhgg.2023.100216

Abstract

Transcriptome prediction models built with data from European-descent individuals are less accurate when applied to different populations because of differences in linkage disequilibrium patterns and allele frequencies. We hypothesized that methods that leverage shared regulatory effects across different conditions (in this case, across different populations) may improve cross-population transcriptome prediction. To test this hypothesis, we built transcriptome prediction models for use in transcriptome-wide association studies (TWAS) using different methods (Elastic Net, Joint-Tissue Imputation (JTI), Matrix eQTL, Multivariate Adaptive Shrinkage in R (MASHR), and Transcriptome-Integrated Genetic Association Resource (TIGAR)) and tested their out-of-sample transcriptome prediction accuracy in population-matched and cross-population scenarios. Additionally, to evaluate model applicability in TWAS, we integrated publicly available multi-ethnic genome-wide association study (GWAS) summary statistics from the Population Architecture using Genomics and Epidemiology Study (PAGE) and Pan-UK Biobank (PanUKBB) with our transcriptome prediction models. In terms of transcriptome prediction accuracy, MASHR models performed as well as or better than the other methods in both population-matched and cross-population transcriptome predictions. Furthermore, in multi-ethnic TWAS, MASHR models yielded more discoveries that replicate in both PAGE and PanUKBB than any other method analyzed, including loci previously mapped in GWAS and new loci not previously found in GWAS. Overall, our study demonstrates the importance of using methods that benefit from different populations' effect size estimates in order to improve TWAS for multi-ethnic or underrepresented populations.

Keywords: genetics; genomics; human genetics; transcriptome-wide association studies.

Conflict of interest statement

All authors declare that they have no conflicts of interest.

Figures

Figure 1: Overall study methodology.
Using TOPMed MESA as a training dataset, we built population-based transcriptome prediction models using five different methods (Elastic Net (EN), Joint-Tissue Imputation (JTI), Multivariate adaptive shrinkage (MASHR), Matrix eQTL, and Transcriptome-Integrated Genetic Association Resource (TIGAR)). With these transcriptome models, we evaluated their out-of-sample transcriptome prediction accuracy using the GEUVADIS dataset. Additionally, we assessed their applicability in multi-ethnic TWAS using GWAS summary statistics from the PAGE Study and PanUKBB. AFA = African American, CHN = Chinese, EUR = European, HIS = Hispanic/Latino.
Figure 2: Design of the methodology implemented to make MASHR models.
(A) We combined the effect sizes estimated with Matrix eQTL within each population dataset across genes, treating the different populations as conditions, and used them as input for MASHR. The output matrices contain adjusted effect sizes. (B) For each population, we selected the top SNP (lowest local false sign rate) per gene. Then, we concatenated the gene-top SNP pairs across populations to determine which SNPs would end up in the final models. Lastly, to make our population-based transcriptome prediction models, we used population-specific effect sizes, taken from the corresponding MASHR output matrices. AFA = African American, CHN = Chinese, EUR = European, HIS = Hispanic/Latino.
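The SNP selection in panel (B) can be illustrated with a minimal Python sketch. It is not the authors' code: the record layout, the function names (`top_snp_per_gene`, `build_models`), and all numeric values are hypothetical, and the MASHR adjustment itself is assumed to have already produced the local false sign rates and adjusted effect sizes.

```python
from collections import defaultdict

# Hypothetical toy records: (gene, snp, population, lfsr, adjusted_effect_size).
# Every value here is invented for illustration only.
records = [
    ("GENE1", "rs1", "AFA", 0.01, 0.30),
    ("GENE1", "rs2", "AFA", 0.20, 0.05),
    ("GENE1", "rs1", "EUR", 0.15, 0.10),
    ("GENE1", "rs2", "EUR", 0.02, 0.25),
]


def top_snp_per_gene(records):
    """Return, for each (population, gene), the SNP with the lowest lfsr."""
    best = {}
    for gene, snp, pop, lfsr, _beta in records:
        key = (pop, gene)
        if key not in best or lfsr < best[key][1]:
            best[key] = (snp, lfsr)
    return {key: snp for key, (snp, _lfsr) in best.items()}


def build_models(records):
    """Pool top SNPs across populations, then keep each population's own effect sizes."""
    snps_by_gene = defaultdict(set)
    for (_pop, gene), snp in top_snp_per_gene(records).items():
        snps_by_gene[gene].add(snp)

    models = defaultdict(dict)
    for gene, snp, pop, _lfsr, beta in records:
        if snp in snps_by_gene[gene]:
            models[pop][(gene, snp)] = beta
    return dict(models)


models = build_models(records)
for pop, weights in sorted(models.items()):
    print(pop, weights)
```

In this toy input the top SNP differs between the two populations, so both SNPs enter both models, each with the effect size estimated for that population.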
Figure 3: PBMC gene expression cis-heritability estimates across MESA populations.
(A) Gene expression cis-heritability (h²) estimated for different genes across different MESA population datasets in PBMC. Only genes with significant estimated h² (p-value < 0.05) are shown. Gray bars represent the standard errors (2*S.E.). Genes are ordered on the x-axis in ascending h² order, and colored according to the h² lower bound (h² - 2*S.E.). (B) Number of significantly heritable genes (p-value < 0.05 and h² lower bound > 0.01) within each PBMC population dataset, by sample size. AFA = African American, CHN = Chinese, EUR = European, HIS = Hispanic/Latino.
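The filter described in panel (B) is simple arithmetic, sketched below in Python. The thresholds come from the caption; the function names, gene labels, and estimates are hypothetical and serve only to show how the two criteria combine.

```python
def h2_lower_bound(h2, se):
    """Lower bound used in the caption: h2 minus two standard errors."""
    return h2 - 2 * se


def is_heritable(h2, se, p_value, p_cutoff=0.05, bound_cutoff=0.01):
    """Apply both caption criteria: p-value < 0.05 and h2 lower bound > 0.01."""
    return p_value < p_cutoff and h2_lower_bound(h2, se) > bound_cutoff


# Hypothetical estimates: gene -> (h2, standard error, p-value).
estimates = {
    "GENE1": (0.30, 0.05, 0.001),  # passes both criteria
    "GENE2": (0.10, 0.05, 0.040),  # significant, but lower bound is 0
    "GENE3": (0.40, 0.10, 0.200),  # lower bound is high, but not significant
}

heritable = sorted(
    gene for gene, (h2, se, p) in estimates.items() if is_heritable(h2, se, p)
)
print(heritable)
```

Only the first toy gene satisfies both conditions, which shows why a gene with a significant p-value can still be excluded when its standard error is large relative to h².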
Figure 4: Comparison of MESA population transcriptome prediction models.
(A) The number of genes in each MESA population model, by method and tissue. (B) Prediction performance (Spearman’s rho) of EN, JTI, MASHR, Matrix eQTL, and TIGAR PBMC MESA population models in GEUVADIS GBR and YRI populations. Only the intersection of genes with expression predicted by all methods for each MESA-GEUVADIS population pair is shown. MASHR performed better than or the same as all other methods (see Table S2 for all pairwise comparisons).
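The comparison in panel (B) can be sketched as follows, assuming per-gene predicted and observed expression vectors are available. This is an illustrative standard-library implementation of Spearman's rho (Pearson correlation of average ranks), not the authors' pipeline; the dictionary layout, gene names, and values are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical observed and predicted expression for four individuals.
observed = {"GENE1": [1.0, 2.0, 3.0, 4.0]}
predicted = {
    "EN": {"GENE1": [0.2, 0.1, 0.3, 0.4], "GENE2": [0.5, 0.6, 0.7, 0.8]},
    "MASHR": {"GENE1": [0.1, 0.2, 0.3, 0.4], "GENE3": [0.9, 0.8, 0.7, 0.6]},
}

# Compare methods only on genes that every method predicts.
shared = set.intersection(*(set(genes) for genes in predicted.values()))
rho = {
    method: {gene: spearman_rho(genes[gene], observed[gene]) for gene in shared}
    for method, genes in predicted.items()
}
print(sorted(shared), rho)
```

Restricting the comparison to the shared genes keeps a method from looking better or worse simply because it predicts a different gene set.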
Figure 5: Number of significant S-PrediXcan gene-trait pairs in PAGE and PanUKBB GWAS summary statistics.
(A) Total number of significant gene-trait pairs discovered by each MESA population model (considering the union of the three tissues), by method. (B) Number of significant gene-trait pairs discovered with individual or multiple MESA populations colored by method (considering the union of the three tissues). Population set intersections are indicated on the x-axis in color.
