Pac Symp Biocomput. 2018:23:448-459.

Evaluation of PrediXcan for prioritizing GWAS associations and predicting gene expression

Binglan Li et al. Pac Symp Biocomput. 2018.

Abstract

Genome-wide association studies (GWAS) have been successful in facilitating the understanding of the genetic architecture behind human diseases, but this approach faces many challenges. To identify disease-related loci with modest to weak effect sizes, GWAS requires very large sample sizes, which can be computationally burdensome. In addition, the interpretation of discovered associations remains difficult. PrediXcan was developed to help address these issues. With built-in SNP-expression models, PrediXcan is able to predict the expression of genes that are regulated by putative expression quantitative trait loci (eQTLs), and these predicted expression levels can then be used to perform gene-based association studies. This approach reduces the multiple testing burden from millions of variants down to several thousand genes. Most importantly, the identified associations can reveal the genes that are under the regulation of eQTLs and consequently involved in disease pathogenesis. In this study, two of the most practical functions of PrediXcan were tested: 1) predicting gene expression, and 2) prioritizing GWAS results. We tested the prediction accuracy of PrediXcan by comparing the predicted and observed gene expression levels, and also examined several potentially influential factors and a filtering criterion with the aim of improving PrediXcan performance. For GWAS prioritization, predicted gene expression levels were used to obtain gene-trait associations, and background regions of significant associations were examined to decrease the likelihood of false positives. Our results showed that 1) PrediXcan predicted gene expression levels accurately for some but not all genes; 2) including more putative eQTLs in the prediction did not improve the prediction accuracy; and 3) integrating predicted gene expression levels from the two PrediXcan whole blood models did not eliminate false positives. Still, PrediXcan was able to prioritize GWAS associations that were below the genome-wide significance threshold in GWAS, while retaining GWAS significant results. This study suggests several ways to consider PrediXcan's performance that will be of value to eQTL and complex human disease research.
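
As a rough illustration of the two quantities the abstract relies on, the sketch below computes a predicted expression level as a weighted sum of allele dosages and then scores it against observed expression with a squared Pearson correlation. The function names, the toy dosages, and the weights are all hypothetical and are not taken from the DGN or GTEx models; the additive weighted-sum form and the use of squared Pearson correlation as R² are assumptions made for the example.

```python
import numpy as np


def predict_expression(dosages, weights):
    """Predicted expression per individual: weighted sum of allele dosages.

    dosages: (n_individuals, n_variants) array of allele counts (0, 1, 2).
    weights: (n_variants,) array of per-variant effect weights.
    """
    return np.asarray(dosages, dtype=float) @ np.asarray(weights, dtype=float)


def prediction_r2(predicted, observed):
    """Squared Pearson correlation between predicted and observed expression."""
    r = np.corrcoef(predicted, observed)[0, 1]
    return r ** 2


# Toy data (hypothetical): 5 individuals, 3 model variants for one gene.
dosages = np.array([
    [0, 1, 2],
    [1, 1, 0],
    [2, 0, 1],
    [0, 2, 2],
    [1, 0, 0],
])
weights = np.array([0.5, -0.2, 0.1])  # made-up weights, for illustration only

predicted = predict_expression(dosages, weights)

# Fabricated "observed" values that are an exact linear function of the
# prediction, so the squared correlation should be 1 up to rounding.
observed = 2.0 * predicted + 1.0
r2 = prediction_r2(predicted, observed)
```

With real data, `observed` would come from measured expression (for example RPKM values), and R² would be computed per gene across individuals.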

Figures

Fig. 1
Prediction performance of the DGN (A) and GTEx (B) whole blood tissue models on the YRI cohort. DGN and GTEx whole blood tissue models were applied to the genotypic data from the YRI cohort. Prediction accuracy (R² of predicted versus observed gene expression levels; green) was compared to the narrow-sense heritability (h²) estimates (black).
Fig. 2
Examples of well-predicted genes. These plots show the top four performing genes based on PrediXcan's prediction accuracy. Predicted gene expression levels were generated using the DGN whole blood model. Observed expression levels (in RPKM) for the YRI cohort were provided by the 1000 Genomes Project.
Fig. 3
Performance of prediction directionality of PrediXcan models, DGN (top) and GTEx (bottom), on the YRI cohort. Directionality was computed between predicted and observed gene expression levels.
Fig. 4
Prediction accuracy has a weak relationship to the model properties. R² was computed between observed expression levels and those predicted by the GTEx whole blood model. Several genotype-expression model properties were explored, including the number (A) and the percentage (B) of model variants used for prediction, and the number of used model variants adjusted for gene length (C). None of them explained the unsatisfactory prediction, nor could any be used as a filtering criterion.
Fig. 5
Prediction similarity between the two models is at most a weak indicator of prediction accuracy. Prediction similarity was measured by the Pearson correlation of predicted expression levels between the DGN and the GTEx models. (A) Distribution of prediction similarity. (B) Relationship between prediction similarity and prediction accuracy. Prediction accuracy increases only slightly, if at all, as prediction similarity increases from the lowest to the highest.
Fig. 6
PrediXcan is able to prioritize GWAS associations. ACTG A5202 imputed genotypic data after quality control were used as input for PrediXcan with the GTEx whole blood model, followed by a phenome-wide TWAS. Variants within 1 Mb upstream or downstream of PrediXcan-TWAS significant genes were used to carry out PheWAS. The figures compare the p-values of PrediXcan-TWAS associations (green line; grey shaded areas represent the size of genes) with those of PheWAS associations (black dots; blue and red lines denote the suggestive and genome-wide significant p-value thresholds, respectively). (A) PrediXcan-TWAS was able to replicate PheWAS results. (B) PrediXcan was able to prioritize non-significant PheWAS results.
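
The background-region step described for Fig. 6 (collecting variants within 1 Mb upstream or downstream of a significant gene) can be sketched as a simple coordinate filter. This is a minimal sketch, not the authors' pipeline: the `Gene` class, the `variants_near_gene` function, the variant IDs, and the coordinates are hypothetical, and it assumes that variants inside the gene body are kept alongside those in the flanking windows.

```python
from dataclasses import dataclass

WINDOW_BP = 1_000_000  # 1 Mb flank on each side of the gene


@dataclass(frozen=True)
class Gene:
    chrom: str
    start: int  # gene start coordinate (bp)
    end: int    # gene end coordinate (bp)


def variants_near_gene(gene, variants, window=WINDOW_BP):
    """Return IDs of variants on the gene's chromosome that fall between
    (start - window) and (end + window), inclusive.

    variants: iterable of (variant_id, chrom, position) tuples.
    Assumption: variants within the gene body are included as well.
    """
    lo = max(0, gene.start - window)
    hi = gene.end + window
    return [
        vid
        for vid, chrom, pos in variants
        if chrom == gene.chrom and lo <= pos <= hi
    ]


# Toy example with made-up coordinates and IDs.
gene = Gene(chrom="1", start=5_000_000, end=5_050_000)
variants = [
    ("var_a", "1", 3_999_999),  # just outside the upstream window
    ("var_b", "1", 4_000_000),  # exactly on the upstream boundary
    ("var_c", "1", 5_020_000),  # inside the gene body
    ("var_d", "1", 6_050_001),  # just outside the downstream window
    ("var_e", "2", 5_020_000),  # same position, different chromosome
]
selected = variants_near_gene(gene, variants)
```

The selected variants would then be tested individually against each phenotype, and their p-values compared with the gene-level association.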
