Genetics. 2017 Dec;207(4):1301-1312.
doi: 10.1534/genetics.117.300435. Epub 2017 Oct 26.

Efficient Prioritization of Multiple Causal eQTL Variants via Sparse Polygenic Modeling

Naoki Nariai et al. Genetics. 2017 Dec.

Abstract

Expression quantitative trait loci (eQTL) studies have typically used single-variant association analysis to identify genetic variants correlated with gene expression. However, this approach has several drawbacks: causal variants cannot be distinguished from nonfunctional variants in strong linkage disequilibrium, combined effects from multiple causal variants cannot be captured, and low-frequency (<5% MAF) eQTL variants are difficult to identify. While these issues could possibly be overcome by using sparse polygenic models, which associate multiple genetic variants with gene expression simultaneously, the predictive performance of these models for eQTL studies has not been evaluated. Here, we assessed the ability of three sparse polygenic models (Lasso, Elastic Net, and BSLMM) to identify causal variants, and compared their efficacy to single-variant association analysis and a fine-mapping model. Using simulated data, we determined that, while these methods performed similarly when there was one causal SNP present at a gene, BSLMM substantially outperformed single-variant association analysis for prioritizing causal eQTL variants when multiple causal eQTL variants were present (1.6- to 5.2-fold higher recall at 20% precision), and identified up to 2.3-fold more low-frequency variants as the top eQTL SNP. Analysis of real RNA-seq and whole-genome sequencing data of 131 iPSC samples showed that the eQTL SNPs identified by BSLMM had a higher functional enrichment in DNase I hypersensitive sites (DHSs) and were more often low-frequency than those identified with single-variant association analysis. Our study showed that BSLMM is a more effective approach than single-variant association analysis for prioritizing multiple causal eQTL variants at a single gene.
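
To make the "recall at 20% precision" comparison concrete, the following is a minimal sketch of how a precision-recall curve can be built from per-gene SNP rankings by varying the number of top-ranked SNPs kept per gene. It is an illustration only: the function name `pr_curve_by_rank`, the toy SNP labels, and the choice to pool counts across genes are assumptions, not the authors' published evaluation code.

```python
def pr_curve_by_rank(ranked_snps_per_gene, causal_per_gene, max_k=20):
    """Return [(k, precision, recall)] for k = 1..max_k.

    ranked_snps_per_gene: one list per gene, SNP ids ordered best-first.
    causal_per_gene: one set per gene holding the true causal SNP ids.
    Counts are pooled over all genes (an assumption of this sketch).
    """
    total_causal = sum(len(causal) for causal in causal_per_gene)
    curve = []
    for k in range(1, max_k + 1):
        selected = 0
        true_pos = 0
        for ranked, causal in zip(ranked_snps_per_gene, causal_per_gene):
            top = ranked[:k]  # keep the k highest-ranked SNPs at this gene
            selected += len(top)
            true_pos += sum(1 for snp in top if snp in causal)
        precision = true_pos / selected if selected else 0.0
        recall = true_pos / total_causal if total_causal else 0.0
        curve.append((k, precision, recall))
    return curve


# Toy example with two genes and hypothetical SNP ids.
ranked = [["s1", "s2", "s3"], ["s7", "s4", "s9"]]
causal = [{"s1", "s3"}, {"s4"}]
curve = pr_curve_by_rank(ranked, causal, max_k=3)
for k, precision, recall in curve:
    print(k, round(precision, 3), round(recall, 3))
```

Given such a curve for each method, recall at a fixed precision can be read off by taking the highest recall among points whose precision is at least the chosen level.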

Keywords: causal variants; eQTLs; sparse polygenic models.

Figures

Figure 1
Prediction performance for identifying causal eQTL variants from simulation data of 503 samples with 60% heritability. PR curves parametrized by the number of highest ranked eQTL SNPs (ranging from 1 to 20) at 1000 randomly selected genes. (A) One causal eQTL variant per gene. (B) Two causal eQTL variants per gene. (C) Five causal eQTL variants per gene. (D) Ten causal eQTL variants per gene.
Figure 2
eQTL variant discovery from 131 iPSC samples with BSLMM, Elastic Net, Lasso, and single-variant association analysis. MAF spectrum of candidate eQTL SNPs identified with BSLMM, Elastic Net, Lasso, and single-variant association analysis for (A) genes with only one eQTL, and (B) genes with more than one independent eQTL. Enrichment of the identified eQTL SNPs, with varying ranked thresholds (from 1 to 20 per gene), in DHSs for (C) genes with only one eQTL, and (D) genes with more than one independent eQTL. Deleteriousness of the identified eQTL variants measured by CADD score for (E) genes with only one eQTL, and (F) genes with more than one independent eQTL.
Figure 3
Identification of genes with heritable expression levels. Genes ranked based on the significance level of the highest ranked eQTL SNP. The x-axis shows the ranking of genes, and the y-axis shows the narrow-sense heritability estimated with BSLMM. Genes with more than one independent eQTL (orange squares) tend to have higher heritability than those with only one eQTL (black circles).
Figure 4
eQTL variants identified as associated with OCT4 expression. Variants are color-coded based on the strength of LD with the most highly associated eQTL (purple diamond). (A) BSLMM ranked eQTL SNPs with varying effect sizes as candidate eQTL variants, including chr6:31139490 and chr6:31133509. (B) Single-variant association analysis identified a SNP located at chr6:31132649 as the most significantly associated eQTL SNP, whereas the eQTL SNP located at chr6:31139490 was identified as the sixth most significantly associated variant. (C) Genomic regions annotated with H1-hESC OCT4 and NANOG binding sites, iPSC histone marks (H3K4me3, H3K4me1, and H3K27ac), and iPSC DHSs. (D) Genomic coordinates of OCT4 and surrounding genes in hg19.
Figure 5
eQTL variants identified as associated with CXCL5 expression. Variants are color-coded based on the strength of LD with the most highly associated eQTL (purple diamond). (A) BSLMM prioritized six eQTL SNPs, including chr4:74863997 and chr4:74864687, which are in a DHS. (B) Single-variant association analysis identified the eQTL SNP located at chr4:74857970 as the most significantly associated variant. (C) Genomic regions annotated with iPSC histone marks (H3K4me3 and H3K4me1) and iPSC DHSs. (D) Genomic coordinates of CXCL5 and surrounding genes in hg19.
Figure 6
Comparison of eQTL variant discovery from WGS with simulated SNP array data. MAF spectrum of candidate eQTL SNPs identified with BSLMM or single-variant association analysis, from either WGS or synthetic SNP array data, for (A) genes with only one eQTL, and (B) genes with more than one independent eQTL. Enrichment of ranked eQTL variants in DHSs for (C) genes with only one eQTL, and (D) genes with more than one independent eQTL. Deleteriousness of the identified eQTL variants measured by CADD score for (E) genes with only one eQTL, and (F) genes with more than one independent eQTL.
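
The simulation setting in Figure 1 (503 samples, 60% heritability, a fixed number of causal variants per gene) can be illustrated with a generic additive model in which the noise variance is chosen so that the genetic component explains the target share of expression variance. This is a sketch under stated assumptions: the function `simulate_expression`, the 50-SNP window, the 0.3 allele frequency, and the effect sizes are all hypothetical, and the authors' actual simulation procedure is not described in the abstract.

```python
import random
import statistics


def simulate_expression(genotypes, causal_idx, effects, h2, rng):
    """Simulate expression as additive genetic value plus Gaussian noise.

    genotypes: per-sample lists of allele counts (0, 1, or 2).
    causal_idx: column indices of the causal SNPs.
    effects: effect size for each causal SNP.
    h2: target heritability, var_g / (var_g + var_e).
    Returns (expression values, genetic variance, noise variance).
    """
    genetic = [sum(g[j] * b for j, b in zip(causal_idx, effects))
               for g in genotypes]
    var_g = statistics.pvariance(genetic)
    if var_g == 0:
        raise ValueError("causal SNPs show no variation in this sample")
    # Solve h2 = var_g / (var_g + var_e) for the noise variance.
    var_e = var_g * (1.0 - h2) / h2
    sd_e = var_e ** 0.5
    expression = [gv + rng.gauss(0.0, sd_e) for gv in genetic]
    return expression, var_g, var_e


rng = random.Random(0)
# Hypothetical genotype matrix: 503 samples, 50 SNPs, allele frequency 0.3.
genotypes = [[(rng.random() < 0.3) + (rng.random() < 0.3) for _ in range(50)]
             for _ in range(503)]
expr, var_g, var_e = simulate_expression(
    genotypes, causal_idx=[3, 17], effects=[0.8, -0.5], h2=0.6, rng=rng)
print(round(var_g / (var_g + var_e), 3))
```

The realized heritability in any one simulated sample will deviate slightly from the target because the noise draw is random; only the expected variance ratio is fixed.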
