Genome Res. 2017 Nov;27(11):1872-1884.
doi: 10.1101/gr.216747.116. Epub 2017 Oct 11.

Quantifying the regulatory effect size of cis-acting genetic variation using allelic fold change

Pejman Mohammadi et al. Genome Res. 2017 Nov.

Abstract

Mapping cis-acting expression quantitative trait loci (cis-eQTL) has become a popular approach for characterizing proximal genetic regulatory variants. In this paper, we describe and characterize log allelic fold change (aFC), the magnitude of expression change associated with a given genetic variant, as a biologically interpretable unit for quantifying the effect size of cis-eQTLs and a mathematically convenient approach for systematic modeling of cis-regulation. This measure is mathematically independent from expression level and allele frequency, additive, applicable to multiallelic variants, and generalizable to multiple independent variants. We provide efficient tools and guidelines for estimating aFC from both eQTL and allelic expression data sets and apply it to Genotype Tissue Expression (GTEx) data. We show that aFC estimates independently derived from eQTL and allelic expression data are highly consistent, and identify technical and biological correlates of eQTL effect size. We generalize aFC to analyze genes with two eQTLs in GTEx and show that in nearly all cases the two eQTLs act independently in regulating gene expression. In summary, aFC is a solid measure of cis-regulatory effect size that allows quantitative interpretation of cellular regulatory events from population data, and it is a valuable approach for investigating novel aspects of eQTL data sets.
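The definition in the abstract can be illustrated with a minimal sketch. It assumes that aFC is the ratio of expression from the haplotype carrying the alternative allele to expression from the reference haplotype. The function name `log2_afc` and the numbers are hypothetical and are not taken from the authors' tools.

```python
import math

def log2_afc(expr_alt: float, expr_ref: float) -> float:
    """log2 ratio of alternative-allele to reference-allele expression."""
    if expr_alt <= 0 or expr_ref <= 0:
        raise ValueError("allelic expression must be positive")
    return math.log2(expr_alt / expr_ref)

# Hypothetical allelic expression values for one heterozygous individual.
low = log2_afc(expr_alt=30.0, expr_ref=60.0)
# The same allelic ratio at a 100-fold higher expression level.
high = log2_afc(expr_alt=3000.0, expr_ref=6000.0)
print(low, high)  # -1.0 -1.0
```

Because only the ratio enters the calculation, scaling both alleles by the same factor leaves the value unchanged, which is one way to read the claim that the measure is independent of expression level.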

Figures

Figure 1.
(A) Schematic representation of cis-regulatory eQTL model in Equations 1 and 2. (B) Example of allelic expression associated with each of the alleles of a cis-eQTL (eVariant Chr 5: 96252589 T/C; eGene ERAP2) in GTEx adipose subcutaneous. Each dot corresponds to allelic imbalance in one individual heterozygous for the eVariant, measured using reads that overlap heterozygous SNPs (aeSNP) in the eGene. Phasing between the aeSNP and the eQTL SNP is utilized to associate the measured allelic expression with each of the eQTL alleles. (C,D) eGene expression for the same example eQTL. The green dashed line connects the median expression of the two homozygous classes. Expression is linear with number of alternative alleles (C), but the linearity is lost after log transformation (D).
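The contrast between panels C and D can be sketched numerically. The snippet assumes an additive model in which total expression is the sum of two haplotypes and a haplotype carrying the alternative allele is expressed at 2^(log2 aFC) times the reference haplotype. The function name and the chosen effect size are illustrative assumptions, not values from the figure.

```python
import math

def total_expression(n_alt: int, log2_afc: float, e_ref: float = 1.0) -> float:
    """Total expression for a genotype with n_alt alternative alleles (0, 1, or 2)."""
    k = 2.0 ** log2_afc  # fold change of the alternative allele
    return e_ref * ((2 - n_alt) + n_alt * k)

# Hypothetical eQTL with log2 aFC = 1 (alternative allele expressed twofold higher).
raw = [total_expression(g, log2_afc=1.0) for g in (0, 1, 2)]
logged = [math.log2(x) for x in raw]

raw_steps = [raw[1] - raw[0], raw[2] - raw[1]]
log_steps = [logged[1] - logged[0], logged[2] - logged[1]]
print(raw_steps)  # equal steps: linear in the number of alternative alleles
print(log_steps)  # unequal steps: linearity is lost after log transformation
```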
Figure 2.
Comparison of the aFC estimation methods using simulated data. We simulated 10,000 eQTLs with noise (40% coefficient of variation), and uniformly selected log2 aFC (range: [−5,5]), and reference allele frequency (range: [0,1]). (A) True aFC used in simulation versus identified values using linear model (M1), nonlinear model (M2), and the nonlinear model approximation (M3). At this level of noise, M2 performed the best, with M1 and M3 having RMSDs of 164% and 110% of M2. (B) Quality of the effect size estimates as a function of allele frequency and the true effect size, evaluated by average error relative to the true log2 aFC. All three estimates, and particularly M1, deteriorate when the lower expressed allele is the minor allele. (C,D) Schematic representation of the nonlinear model approximation method (Box 3) based on four different candidate estimates (C), and the selected estimate with minimum residual variance for each simulated eQTL as a function of reference allele frequency and the true aFC (D).
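A simulation of this kind could be sketched as follows. This is not the authors' code: the multiplicative Gaussian noise model, the unit reference expression, and both function names are assumptions made for illustration. The estimator fits a straight line to expression against alternative allele count and converts slope and intercept to log2 aFC under the additive model. Whether this matches M1 exactly is also an assumption.

```python
import math
import random

def simulate_eqtl(log2_afc, ref_af, n, cv, rng):
    """Simulate alt-allele counts and total expression for n individuals.

    Assumptions (not from the source): reference-haplotype expression is 1 and
    noise is multiplicative Gaussian with coefficient of variation cv.
    """
    k = 2.0 ** log2_afc
    genotypes, expression = [], []
    for _ in range(n):
        # Each haplotype carries the alternative allele with probability 1 - ref_af.
        g = sum(rng.random() >= ref_af for _ in range(2))
        clean = (2 - g) + g * k
        genotypes.append(g)
        expression.append(clean * max(0.0, rng.gauss(1.0, cv)))
    return genotypes, expression

def estimate_log2_afc_linear(genotypes, expression):
    """Least-squares line through (alt-allele count, expression), mapped to log2 aFC."""
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(genotypes, expression))
    slope = sxy / sxx
    intercept = mean_e - slope * mean_g
    if intercept <= 0:
        return float("nan")
    # Under the additive model: intercept = 2*e_ref and slope = e_ref*(2**s - 1).
    ratio = 2.0 * slope / intercept + 1.0
    return math.log2(ratio) if ratio > 0 else float("nan")

rng = random.Random(0)
genotypes, expression = simulate_eqtl(log2_afc=1.5, ref_af=0.5, n=500, cv=0.4, rng=rng)
noisy_estimate = estimate_log2_afc_linear(genotypes, expression)
print(noisy_estimate)
```

The estimate returns NaN when the fitted line implies a non-positive allelic ratio, which can happen under heavy noise. A real analysis would need a more careful treatment of such cases.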
Figure 3.
Comparison of the methods for estimating aFC using GTEx data. (A) aFC as estimated from ASE data versus estimates from eQTL data using the linear model (M1), the nonlinear model (M2), and the nonlinear model approximation (M3) for all top eQTLs in adipose subcutaneous. All three estimates are ∼75% correlated with estimates from ASE data. (B) Quality of the eQTL estimates as a function of allele frequency and the aFC estimate from allelic expression data, evaluated by average relative error between aFC from ASE data and from eQTL estimates. (C) Concordance between the estimates from allelic expression and eQTL data as evaluated by RMSD between the most accurate method, M2, and the other two methods. Each dot represents one tissue in GTEx. (D) Concordance between the estimates from ASE and eQTL data as evaluated by RMSD, comparing M3 to M3 applied after quantile normalization within each genotype group. Each dot represents one tissue in GTEx.
Figure 4.
aFC compared with linear regression slope. (A–C) Slope of linear regression from 10,000 simulated eQTLs generated similarly to data shown in Figure 2. The true aFC value is compared with regression slopes from raw (A), z-scored (B), and log2 transformed (C) data. The color code represents median eGene expression (A) and reference allele frequency (B,C) with alternative color-coding for the same plots in Supplemental Figure S4. (D–F) Regression slope compared with aFC values estimated using GTEx eQTL data from adipose subcutaneous.
Figure 5.
Empirical properties of the aFC distributions in GTEx data. All aFC values are calculated with the nonlinear approximation method (M3). (A) Distribution of absolute log2 aFC across tissues as a function of sample size. Each point represents a tissue in GTEx data, and 90%, 50%, and 10% quantiles of absolute aFC across a tissue are shown. (B,C) Correlation of log2 aFC estimates (B), and the ratio of the estimates (C) derived from eQTL and ASE data. Each point corresponds to one GTEx tissue. (D) Difference between the aFC estimates from allelic expression (sASE) and eQTL (seQTL) as a function of absolute average aFC (|sASE + seQTL|/2), with H and L referring to higher and lower expressed alleles of each eQTL in adipose subcutaneous, respectively. Estimated effect sizes from ASE data tend to be smaller for weak eQTLs and larger for strong eQTLs compared with those derived from eQTL data. (E–H) Distribution of absolute log2 aFCs calculated from GTEx adipose subcutaneous as a function of minor allele frequency (E), gene expression level (F), number of tissues where the gene is expressed >0.1 RPKM in 10 or more individuals (G), and logistic-transformed RVIS, a measure of each gene's tolerance to variation in the coding region (H) (Petrovski et al. 2013). Red line shows fit by robust locally weighted scatterplot smoothing.
Figure 6.
Joint analysis of aFCs for GTEx eGenes with two eQTLs. (A) An example of relative expression of eGene ZC3H3 and the model fits for different genotype groups of its two eQTLs (eVariant1: Chr 8: 144633728 A/G; eVariant2: Chr 8: 144556836 G/A) in GTEx adipose subcutaneous. The effect sizes of the first and second eQTLs are −0.77 and −0.14, respectively, as measured by log2 aFC. Each dot represents observed expression in one individual, scaled relative to the expression at the all-reference genotype. The blue bars show model fits from the two-eQTL model based on the regulatory independence assumption. Reference and alternative alleles are denoted by 0 and 1, respectively, and haplotypes are separated by a “|” sign (e.g., 10|11 corresponds to the case in which one haplotype carries the alternative and reference alleles of eVariant1 and eVariant2, respectively, and the other haplotype carries the alternative allele of both eVariants). (B) Expression of the second haplotype relative to the first haplotype, observed in ASE data. The red bars show expected haplotype expression ratios based on the model in panel A, learned on the eQTL data. (C) aFC between two haplotypes as predicted from eQTL data compared with median aFC observed in ASE data for all eGenes with two eQTLs in adipose subcutaneous. Each dot represents one randomly selected genotype for one eGene. Red line indicates the robust linear fit (y = 0.9x + 0.002). (D) Predicted and observed median aFC for all eGenes with two eQTLs calculated from eQTL and ASE data, respectively, in each tissue with more than 200 eGenes with two eQTLs. (E) cis-Regulatory effect size associated with co-occurrence of the alternative alleles of the two eQTLs, as predicted under the regulatory independence model or learned using the relaxed model. (F) Percentage of the two eQTLs that are not well described using the independent regulatory assumption across all tissues with more than 200 eGenes with two eQTLs.
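The genotype notation and the independence assumption in panel A can be made concrete with a short sketch. It uses the two log2 aFC values quoted in the caption and assumes that independent effects add on the log2 scale within a haplotype and that total expression is the sum of the two haplotypes. All function names are hypothetical.

```python
import math

# log2 aFC of eVariant1 and eVariant2, as quoted in the Figure 6A caption.
S1, S2 = -0.77, -0.14

def haplotype_log2_expr(hap: str) -> float:
    """log2 expression of one haplotype relative to the all-reference haplotype.

    hap is a two-character string such as "10": alternative allele (1) at
    eVariant1, reference allele (0) at eVariant2.
    """
    a1, a2 = int(hap[0]), int(hap[1])
    return S1 * a1 + S2 * a2  # independent effects add on the log scale

def relative_expression(genotype: str) -> float:
    """Total expression relative to the all-reference genotype 00|00."""
    hap1, hap2 = genotype.split("|")
    total = 2.0 ** haplotype_log2_expr(hap1) + 2.0 ** haplotype_log2_expr(hap2)
    return total / 2.0

def haplotype_log2_ratio(genotype: str) -> float:
    """Predicted log2 expression of the second haplotype relative to the first."""
    hap1, hap2 = genotype.split("|")
    return haplotype_log2_expr(hap2) - haplotype_log2_expr(hap1)

print(relative_expression("00|00"))   # 1.0 by construction
print(relative_expression("10|11"))
print(haplotype_log2_ratio("10|11"))  # about -0.14: the haplotypes differ only at eVariant2
```

The first function corresponds to the kind of model fit shown as blue bars in panel A, and the haplotype ratio corresponds to the kind of prediction compared against ASE data in panels B and C.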

References

    1. Albert FW, Kruglyak L. 2015. The role of regulatory variation in complex traits and disease. Nat Rev Genet 16: 197–212.
    2. Anders S, Huber W. 2010. Differential expression analysis for sequence count data. Genome Biol 11: R106.
    3. Arnold CD, Gerlach D, Stelzer C, Boryń ŁM, Rath M, Stark A. 2013. Genome-wide quantitative enhancer activity maps identified by STARR-seq. Science 339: 1074–1077.
    4. Battle A, Mostafavi S, Zhu X, Potash JB, Weissman MM, McCormick C, Haudenschild CD, Beckman KB, Shi J, Mei R, et al. 2014. Characterizing the genetic basis of transcriptome diversity through RNA-sequencing of 922 individuals. Genome Res 24: 14–24.
    5. Battle A, Khan Z, Wang SH, Mitrano A, Ford MJ, Pritchard JK, Gilad Y. 2015. Impact of regulatory variation from RNA to protein. Science 347: 664–667.
