[Preprint]. 2023 Oct 9:2023.10.06.561245.
doi: 10.1101/2023.10.06.561245.

Allele biased transcription factor binding across human brain regions gives mechanistic insight into eQTLs

Belle A Moyers et al. bioRxiv.

Abstract

Transcription factors (TFs) influence gene expression by facilitating or disrupting the formation of transcription initiation machinery at particular genomic loci. Because the genomic localization of TFs is driven in part by TF recognition of DNA sequence, variation in TF binding sites can disrupt TF-DNA associations and affect gene regulation. To identify variants that affect TF binding in human brain tissues, we quantified allele bias for 93 TFs assayed by ChIP-seq in multiple structural brain regions from two donors. Using graph genomes constructed from phased genomic sequence data, we compared ChIP-seq signal between alleles at heterozygous variants within each tissue sample from each donor. Comparing results from different brain regions within a donor, and from the same regions between donors, provided measures of allele bias reproducibility. We identified thousands of DNA variants that show reproducible bias in ChIP-seq for at least one TF. Alleles that are rarer in the general population were more likely than common alleles to exhibit large biases, and more frequently led to reduced TF binding. Combining ChIP-seq with RNA-seq, we identified TF-allele interaction biases, accompanied by RNA bias on a phased allele, linked to 6,709 eQTL variants identified in GTEx data, 3,309 of which were found in neural contexts. Our results provide insights into the effects of both common and rare variation on gene regulation in the brain. These findings can facilitate mechanistic understanding of cis-regulatory variation associated with biological traits, including disease.

Figures

Figure 1.
Personalized graph genomes improve read mapping for detection of allele-biased binding. A. Workflow for detection of allele-biased binding. Whole-genome sequencing and ChIP-seq of 93 TFs, POLR2, and 5 histone marks were performed on post-mortem brain samples from 2 donors. ChIP-seq reads were mapped to personalized graph genomes to identify allele bias, and the results were compared within and across donors. B. Personalized genomes reduce reference allele bias, increasing confidence in the detection of allele-biased binding. Density plots show the reference allele frequency (x-axis) of significant (p ≤ 0.05, binomial test) allele bias when reads are aligned with bowtie to the linear reference (red) compared with the vg toolkit aligned to a personalized graph genome (blue). Allele bias is more balanced between the reference and alternate alleles with personalized graph genomes. C. There is significant disagreement between the methods in the number and identity of variants found to prefer the reference or alternate allele. Heatmap showing the number of TF-biased allele interactions called nonsignificant, significant for reference, and significant for alternate by bowtie and by vg.
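Panel B calls allele bias with a binomial test on the reads supporting each allele at a heterozygous site. The sketch below is a minimal illustration, assuming a two-sided exact test against an expected 50:50 split and reads already assigned to alleles by the aligner; the function names are hypothetical and this is not the authors' pipeline. In practice `scipy.stats.binomtest(k, n, p=0.5)` should give the same p-value.

```python
from math import comb


def allele_bias_pvalue(ref_reads: int, alt_reads: int) -> float:
    """Two-sided exact binomial test of ref vs. alt reads against a 50:50 null.

    Hypothetical helper for illustration; assumes reads are already split by allele.
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no allele-informative reads: no evidence of bias
    k = min(ref_reads, alt_reads)
    # P(X <= k) under Binomial(n, 0.5); the null is symmetric, so the
    # two-sided p-value is twice the smaller tail, capped at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)


def reference_allele_frequency(ref_reads: int, alt_reads: int) -> float:
    """Fraction of allele-informative reads carrying the reference allele (panel B x-axis)."""
    return ref_reads / (ref_reads + alt_reads)


# Toy counts (not data from the study): keep sites with p <= 0.05, as in panel B.
sites = [(8, 2), (12, 0), (5, 5), (1, 14)]
significant_ref_freqs = [
    reference_allele_frequency(r, a)
    for r, a in sites
    if allele_bias_pvalue(r, a) <= 0.05
]
```

Plotting the density of such reference allele frequencies for each aligner is what panel B compares; an aligner with less reference bias should give a distribution more balanced around 0.5.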
Figure 2.
Reproducibility and concordance of TF-allele bias within and between donors. A. Between-donor reproducibility. The fraction of TF-allele bias cases that were reproducible in the comparable TF-allele interaction in the same tissue across donors (y-axis), as a function of the minimum p-value cutoff used for significance (x-axis). Reproducibility was defined as 1 − 2 × (fraction of inconsistent directional effects identified). B. Within-donor reproducibility. The fraction of TF-allele bias cases that were reproducible when comparing the same TF-allele interaction across different tissue contexts within the same donor. Reproducibility was defined as in 2A. C. Correlation of −log(p-value) of the effects of a variant across tissues, for variants with a p-value ≤ 0.001 in at least one tissue, for factors with ChIP-seq datasets in all 9 tissues. Lower panels show dot plots of variant effects; upper panels show Pearson correlation coefficients between each pair of tissues. The diagonal labels each tissue. Abbreviations: dorsolateral prefrontal cortex (DLPFC), frontal pole (FP), occipital lobe (OL), cerebellum (CB), anterior cingulate (AnCg), subgenual cingulate (SCg), dorsomedial prefrontal cortex (DMPFC), amygdala (Amy), and hippocampus (HC).
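The reproducibility score in panels A and B rescales directional agreement so that 1 means every direction agrees and 0 is what random directions would give: a coin-flip direction disagrees half the time, and 1 − 2 × 0.5 = 0. The sketch below computes it at one cutoff, assuming each case is a TF-allele interaction tested in the first context and compared with the direction observed in the second; the tuple layout and the function name are hypothetical.

```python
import math
from typing import Iterable, Tuple

# Hypothetical layout: (p_value_in_context_1, direction_in_context_1, direction_in_context_2),
# where direction is +1 for reference-preferring and -1 for alternate-preferring bias.
Case = Tuple[float, int, int]


def reproducibility(cases: Iterable[Case], cutoff: float) -> float:
    """1 - 2 * (fraction of inconsistent directions) among cases significant at cutoff."""
    selected = [(d1, d2) for p, d1, d2 in cases if p <= cutoff]
    if not selected:
        return math.nan  # nothing passes the cutoff, so the score is undefined
    inconsistent = sum(1 for d1, d2 in selected if d1 != d2)
    return 1 - 2 * inconsistent / len(selected)
```

Evaluating `reproducibility` over a grid of cutoffs traces a curve of the kind plotted against the p-value cutoff on the x-axis.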
Figure 3.
Genetic and genomic properties of variants displaying TF-allele bias. A. Stacked barplots showing the fraction of regions that overlap a particular cCRE type for all variant haplotypes (top), all TF peaks (second from top), haplotypes found significant for at least one TF (second from bottom), and haplotypes found significant for at least one TF while also overlapping a TF peak (bottom) (y-axis). Cumulative fraction is shown on the x-axis. Barplots are colored by cCRE type: PLS (promoter-like signal) red, pELS (proximal enhancer-like signal) orange, dELS (distal enhancer-like signal) yellow, CA-CTCF (chromatin-accessible CTCF signal) pink, CA-TF (chromatin-accessible, TF signal) blue, TF (TF signal) blue-green, CA (chromatin-accessible) green, and non-cCRE regions grey. B. Variants that are either very rare or very common in the population show highly significant allele bias. For varying ranges of derived allele frequency (x-axis), the fraction of significant variants found at or below a given significance threshold is shown (y-axis). C. For very low-frequency derived alleles, a volcano-like plot relates the ChIP-seq preference for the ancestral allele (x-axis, log((ancestral ChIP-seq reads + 1) / (derived ChIP-seq reads + 1))) to the significance (y-axis, −log10(p-value) from a binomial test) for each significantly biased variant. Points are colored by derived allele frequency (DAF), with the rarest derived alleles in black and more common ones, up to DAF = 0.001, in yellow. For very rare alleles, the preference for the ancestral allele is stronger and the bias is more significant. D. For variants that weaken or strengthen a JASPAR motif for one of our assayed TFs (i.e., a motif was found in both allelic sequences but its score changed), the difference in FIMO score between the ancestral and derived alleles (y-axis) versus log(ancestral reads / derived reads) for the relevant TF (x-axis). Spearman's ρ = 0.658, p ≤ 2.2 × 10⁻¹⁶.
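Each point in panel C pairs a pseudocounted log ratio of ancestral to derived reads with a −log10 binomial p-value. The sketch below computes both coordinates for one variant, assuming a natural log on the x-axis (the caption does not state the base) and a two-sided exact binomial test against 50:50 as the form of the caption's binomial test; `volcano_point` and `binomial_two_sided_p` are hypothetical names.

```python
import math
from math import comb


def binomial_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided binomial p-value for k of n reads under a 50:50 null."""
    if n == 0:
        return 1.0
    low = min(k, n - k)
    # Symmetric null: double the smaller tail, capped at 1.
    tail = sum(comb(n, i) for i in range(low + 1)) / 2**n
    return min(1.0, 2 * tail)


def volcano_point(ancestral_reads: int, derived_reads: int) -> tuple:
    """Return (x, y) = (log((anc + 1) / (der + 1)), -log10(p)) for one variant."""
    # +1 pseudocounts keep the ratio finite when one allele has no reads.
    x = math.log((ancestral_reads + 1) / (derived_reads + 1))
    p = binomial_two_sided_p(ancestral_reads, ancestral_reads + derived_reads)
    # Floor p so very deep sites whose p-value underflows to 0.0 still get a finite y.
    y = -math.log10(max(p, 1e-300))
    return x, y
```

Positive x values mean more reads on the ancestral allele, which is the direction the caption reports as dominant for very rare derived alleles.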
Figure 4.
Allele-biased binding is consistent with, and offers insight into, the mechanisms of GTEx eQTLs. A. For GTEx eQTLs found in a neural context that are present in our data with significantly biased ChIP-seq signal and phased, significantly biased RNA-seq reads in the appropriate genic region, a violin plot showing the distribution of log(RNA bias) (y-axis) versus the binned GTEx eQTL slope (x-axis). Spearman's ρ = 0.43, p ≤ 2.2 × 10⁻¹⁶. B. Genomic track for the RPS14 gene showing the location of the GTEx eQTL chr5_150449748_G_A_b38 in the promoter. Green genes are on the reverse strand; blue genes are on the forward strand. The asterisk denotes the position of the eQTL. Tick marks denote heterozygous variants in the same phase as our heterozygous eQTL. C. Stacked barplots showing the fraction of reads supporting the reference or alternate allele (y-axis) of the eQTL for RNA (left) or ChIP-seq reads for biased TFs (right). D. DNA sequence surrounding the eQTL in B for the reference (top) and alternate (bottom) alleles, with the eQTL variant highlighted in red. Between them is the MAZ motif MA1522.1 from JASPAR, highlighting the alternate allele's destruction of the canonical motif.
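Figures 3D and 4A both summarize agreement with Spearman's ρ, which is the Pearson correlation computed on ranks and therefore measures monotonic rather than strictly linear agreement. A minimal pure-Python sketch follows, assuming tied values (such as binned eQTL slopes) receive their average rank; the function names are hypothetical, and in practice `scipy.stats.spearmanr` returns ρ together with its p-value.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns NaN if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return math.nan
    return cov / math.sqrt(vx * vy)


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

For panel 4A, `x` would hold log(RNA bias) per eQTL and `y` the GTEx slope (binned or not); for panel 3D, the FIMO score difference and the log read ratio.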
Figure 5.
Allele-biased binding allows fine-mapping of eQTL variants. A. Genomic track showing the region surrounding the GTEx eQTL chr22_32474782_C_T_b38, found in heterozygous form in both donors. Green genes are on the reverse strand; blue genes are on the forward strand. The asterisk denotes the position of the eQTL. Tick marks denote heterozygous variants in the same phase as our heterozygous eQTL. Loops from Hi-C data in iCell GlutaNeurons are shown above the gene tracks, indicating 3D interactions. B. Left: barplots depicting the fraction of reads supporting the haplotype carrying the reference (blue) or alternate (red) allele of the eQTL in donor 1 and donor 2 for each of the FBXO7 and SYN3 genes. Right: barplots depicting the fraction of reads mapping to the reference or alternate allele of the eQTL for significantly biased cohesin factors in donor 1 and donor 2. C. DNA sequence surrounding the eQTL in A for the reference (top) and alternate (bottom) alleles, with the eQTL variant highlighted in red. Between them is the CTCF motif MA0139.1 from JASPAR, highlighting the variant site.