2022 May 8;23(1):169.
doi: 10.1186/s12859-022-04706-x.

An optimal variant to gene distance window derived from an empirical definition of cis and trans protein QTLs

Eric B Fauman et al. BMC Bioinformatics.

Abstract

Background: A genome-wide association study (GWAS) correlates variation in the genotype with variation in the phenotype across a cohort, but the causal gene mediating that impact is often unclear. When the phenotype is protein abundance, a reasonable hypothesis is that the gene encoding that protein is the causal gene. However, as variants impacting protein levels can occur thousands or even millions of base pairs from the gene encoding the protein, it is unclear at what distance this simple hypothesis breaks down.

Results: By assuming that cis-pQTLs are distance dependent while trans-pQTLs are distance independent, we arrive at a simple, empirical distance cutoff separating cis- and trans-pQTLs. Analyzing a recent large-scale pQTL study (Pietzner et al., Science 374:eabj1541, 2021), we estimate a distance cutoff of 944 kilobase pairs (95% confidence interval: 767 to 1,161 kilobase pairs) separating the cis and trans regimes.
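The idea of a cutoff between a distance-dependent and a distance-independent regime can be illustrated with a minimal sketch. It assumes a two-component mixture: a Weibull density for cis-pQTL distances and a flat (uniform) density for trans-pQTL distances, and it takes the cutoff to be the distance at which the two weighted densities are equal. The uniform trans density, the crossover definition, the function names, and every parameter value below are illustrative assumptions; the abstract does not state the fitted parameters or the exact procedure behind the 944 kilobase pair estimate.

```python
import math

def weibull_pdf(d, shape, scale):
    """Weibull probability density at distance d > 0 (base pairs)."""
    z = d / scale
    return (shape / scale) * z ** (shape - 1.0) * math.exp(-(z ** shape))

def uniform_pdf(d, span):
    """Distance-independent density: uniform on (0, span] (an assumption)."""
    return 1.0 / span if 0.0 < d <= span else 0.0

def crossover_distance(shape, scale, cis_fraction, span, lo=1.0):
    """Distance where the weighted cis and trans densities are equal.

    Assumes shape < 1, so the Weibull density decreases monotonically
    and crosses the flat trans density exactly once.
    """
    def diff(d):
        cis = cis_fraction * weibull_pdf(d, shape, scale)
        trans = (1.0 - cis_fraction) * uniform_pdf(d, span)
        return cis - trans

    hi = span
    if diff(lo) <= 0 or diff(hi) >= 0:
        raise ValueError("no crossover inside the search interval")
    for _ in range(200):
        mid = math.sqrt(lo * hi)  # bisect on a log scale
        if diff(mid) > 0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

# Hypothetical parameters, chosen only to make the example run.
cutoff = crossover_distance(shape=0.3, scale=5e4, cis_fraction=0.8, span=1e8)
print(f"cis and trans densities cross near {cutoff:,.0f} bp")
```

Below the returned distance the Weibull (cis) component contributes most of the density; above it the flat (trans) component does. With parameters fitted to real data, the same crossover logic would yield a data-driven cutoff.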

Conclusions: We demonstrate that this simple model can be applied to other molecular GWAS traits. Since much of biology is built on molecular traits like protein, transcript and metabolite abundance, we posit that the mathematical models for cis and trans distance distributions derived here will also apply to more complex phenotypes and traits.

Keywords: Cis-eQTL; Cis-pQTL; GWAS; Trans-eQTL; Trans-pQTL; Weibull; eQTL; metQTL; pQTL.

Conflict of interest statement

The authors declare that they are full time employees of Pfizer, Inc.

Figures

Fig. 1
Histogram of log base 10 of the distance from the lead SNP for GWAS of protein abundance to the transcription start site (TSS) of the cognate gene for that protein, for 2051 unique proteins from the study of Pietzner et al. Four bins are used for each log unit. Solid red line represents the best-fit Weibull distribution curve fit to all data points below 10^5.75. Solid blue line represents the best-fit random distribution curve fit to all pQTLs with a distance above 10^6 base pairs. Dashed purple line represents the best combined model, starting from the parameters estimated for the initial Weibull curve and adding a Weibull fraction parameter to combine the Weibull curve and the trans model curve.
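The binning described in this caption (four bins per log10 unit) can be sketched as follows. The function name and the handling of zero distances are assumptions made for illustration, not details taken from the paper.

```python
import math
from collections import Counter

def log10_bin_counts(distances_bp, bins_per_decade=4):
    """Count distances in log10 bins; keys are the lower bin edges in log10 units."""
    counts = Counter()
    for d in distances_bp:
        if d <= 0:
            continue  # log10 is undefined at 0; such points are skipped here (assumption)
        index = math.floor(math.log10(d) * bins_per_decade)
        counts[index / bins_per_decade] += 1
    return dict(sorted(counts.items()))

# Hypothetical lead-SNP-to-TSS distances in base pairs.
print(log10_bin_counts([1200, 1500, 2000, 2_000_000]))
# → {3.0: 2, 3.25: 1, 6.25: 1}
```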
Fig. 2
Histogram of distance from the lead SNP for GWAS of protein abundance to the transcription start site (TSS) of the cognate gene for that protein, for 1604 unique proteins where the distance is less than 500 kb (bin size = 10 kb), with the curve fit to our global model, which includes a Weibull curve and our trans model. The Weibull model dominates at distances less than 1,000,000 base pairs.
Fig. 3
A post-hoc rationale for the Weibull distribution. According to the ABC model [18] of gene activation and models of chromatin compaction [14, 15], the chance that a particular enhancer (E1-E4) is in contact with the promoter of a particular gene (“Gene”) is proportional to distance^−γ (that is, distance to the power −γ, where γ has a value of about 1) from the enhancer to the promoter. In a scenario where all enhancers are equally active, a particular gene will be most strongly influenced by the closest enhancer (E2 in this figure). A Weibull model, as observed empirically in this analysis, can result from such a “superposition” of power-law distributions [17].
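For reference, the two functional forms named in this caption can be written out explicitly. The power-law contact probability follows the caption; the Weibull density is given in its standard two-parameter form, where the symbols k (shape) and λ (scale) are conventional notation and not necessarily the parameterization used in the paper.

```latex
% Contact probability between an enhancer and a promoter separated by distance d
P_{\text{contact}}(d) \propto d^{-\gamma}, \qquad \gamma \approx 1

% Standard two-parameter Weibull density and survival function
f(d; k, \lambda) = \frac{k}{\lambda} \left( \frac{d}{\lambda} \right)^{k-1}
                   \exp\!\left[ -\left( \frac{d}{\lambda} \right)^{k} \right],
\qquad
S(d; k, \lambda) = \exp\!\left[ -\left( \frac{d}{\lambda} \right)^{k} \right],
\qquad d \ge 0
```

For k < 1 the density decreases monotonically with d, which is the distance-dependent behavior expected of a cis component.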
Fig. 4
Histogram of distance from the lead SNP for GWAS of protein abundance to the transcription start site (TSS) of the cognate gene for that protein, for 349 unique proteins where the distance is greater than 10 megabases (bin size = 10 megabases), with the curve fit to our global model, which includes a Weibull curve and our trans model. The trans model dominates at distances past 1 megabase.

References

    1. Folkersen L, Fauman E, Sabater-Lleal M, Strawbridge RJ, Frånberg M, Sennblad B, et al. Mapping of 79 loci for 83 plasma protein biomarkers in cardiovascular disease. PLoS Genet. 2017;13:e1006706. doi: 10.1371/journal.pgen.1006706.
    2. Folkersen L, Gustafsson S, Wang Q, Hansen DH, Hedman ÅK, Schork A, et al. Genomic and drug target evaluation of 90 cardiovascular proteins in 30,931 individuals. Nat Metab. 2020;2:1135–1148. doi: 10.1038/s42255-020-00287-2.
    3. Ferkingstad E, Sulem P, Atlason BA, Sveinbjornsson G, Magnusson MI, Styrmisdottir EL, et al. Large-scale integration of the plasma proteome with genetics and disease. Nat Genet. 2021;53:1712–1721. doi: 10.1038/s41588-021-00978-w.
    4. Pietzner M, Wheeler E, Carrasco-Zanini J, Cortes A, Koprulu M, Wörheide MA, et al. Mapping the proteo-genomic convergence of human diseases. Science. 2021;374:eabj1541. doi: 10.1126/science.abj1541.
    5. Võsa U, Claringbould A, Westra H-J, Bonder MJ, Deelen P, Zeng B, et al. Large-scale cis- and trans-eQTL analyses identify thousands of genetic loci and polygenic scores that regulate blood gene expression. Nat Genet. 2021;53:1300–1310. doi: 10.1038/s41588-021-00913-z.