Bioinformatics. 2016 Feb 15;32(4):490-6.
doi: 10.1093/bioinformatics/btv565. Epub 2015 Oct 17.

GERV: a statistical method for generative evaluation of regulatory variants for transcription factor binding

Haoyang Zeng et al. Bioinformatics.

Abstract

Motivation: The majority of disease-associated variants identified in genome-wide association studies reside in noncoding regions of the genome with regulatory roles. Thus, being able to interpret the functional consequences of a variant is essential for identifying causal variants in genome-wide association studies.

Results: We present GERV (generative evaluation of regulatory variants), a novel computational method for predicting regulatory variants that affect transcription factor binding. GERV learns a k-mer-based generative model of transcription factor binding from ChIP-seq and DNase-seq data, and scores variants by computing the change of predicted ChIP-seq reads between the reference and alternate allele. The k-mers learned by GERV capture more sequence determinants of transcription factor binding than a motif-based approach alone, including both a transcription factor's canonical motif and associated co-factor motifs. We show that GERV outperforms existing methods in predicting single-nucleotide polymorphisms associated with allele-specific binding. GERV correctly predicts a validated causal variant among linked single-nucleotide polymorphisms and prioritizes the variants previously reported to modulate the binding of FOXA1 in breast cancer cell lines. Thus, GERV provides a powerful approach for functionally annotating and prioritizing causal variants for experimental follow-up analysis.

Availability and implementation: The implementation of GERV and related data are available at http://gerv.csail.mit.edu/.

Figures

Fig. 1.
The schematic of GERV. The spatial effects of all the k-mers and the DNase-seq covariates are learned from the reference genome sequence and the ChIP-seq and DNase-seq datasets. Then the spatial effects (purple, cyan and green) of the k-mers underlying the reference (blue) and alternate (red) allele for a variant are aggregated with DNase-seq covariates by log-linear combination to yield a spatial prediction of local ChIP-seq reads for the two alleles. GERV scores the variant by the ℓ2-norm of the predicted change of reads.
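
The scoring idea in this caption can be illustrated with a minimal sketch. This is not the GERV implementation: the function names (predict_reads, score_variant), the centering of each k-mer's spatial effect on its start position, the single additive covariate term, and the toy effect values are all assumptions made for illustration. Only the overall shape (per-k-mer spatial effects summed in log space, exponentiated to a read prediction, and compared between alleles by an ℓ2-norm) follows the description above.

```python
import numpy as np

def predict_reads(seq, effects, k, bias=0.0, covariate=None):
    """Toy log-linear read prediction (hypothetical, not the GERV code).

    effects maps a k-mer string to an odd-length array of spatial effects,
    assumed here to be centered on the k-mer's start position.
    covariate is an optional per-position log-scale term standing in for
    the DNase-seq covariates.
    """
    n = len(seq)
    log_rate = np.full(n, bias, dtype=float)
    if covariate is not None:
        log_rate += np.asarray(covariate, dtype=float)
    for start in range(n - k + 1):
        effect = effects.get(seq[start:start + k])
        if effect is None:
            continue  # k-mer with no learned effect contributes nothing
        half = len(effect) // 2
        for offset, value in enumerate(effect):
            pos = start + offset - half
            if 0 <= pos < n:
                log_rate[pos] += value
    # Log-linear model: effects add in log space, prediction is the exponent.
    return np.exp(log_rate)

def score_variant(ref_seq, alt_seq, effects, k, **kwargs):
    """Score a substitution by the l2-norm of the predicted change in reads."""
    ref = predict_reads(ref_seq, effects, k, **kwargs)
    alt = predict_reads(alt_seq, effects, k, **kwargs)
    return float(np.linalg.norm(alt - ref))

# Toy example: the alternate allele destroys the only k-mer with an effect.
toy_effects = {"ACG": np.array([0.1, 0.5, 0.1])}
toy_score = score_variant("TTACGTT", "TTAAGTT", toy_effects, k=3)
print(toy_score)
```

In this toy setup the reference and alternate sequences have the same length, so the sketch covers single-nucleotide substitutions only.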
Fig. 2.
(A) Example held-out genomic region on chromosome 14 showing GERV-predicted NF-κB reads (black), actual NF-κB ChIP-seq reads (red) and rabbit IgG control ChIP-seq reads (green). (B) Comparison of GERV-predicted (x-axis) and observed (y-axis) NF-κB ChIP-seq reads in binned regions of held-out chromosomes 14–22. The coefficient and r² of a linear regression on predicted and actual z-scores are shown. (C) ROC curve for discriminating NF-κB peaks from negative control sets using GERV and gapped-kmer SVM (gkmSVM).
Fig. 3.
ROC curve (first row) and PRC (second row) for discriminating ASB SNPs from the second type of negative variant set (10 times the size of the positive set) using GERV (red), GERV without covariates (yellow), deltaSVM (blue) and sTRAP (green). The gray dashed line in the ROC curves indicates random chance. In each figure, 95% confidence intervals of the true-positive rate (for ROC) or precision (for PRC) are plotted. The performance of sTRAP on JUND is not measurable because the JUND motif is not included in its built-in motif database.
Fig. 4.
(A) GERV correctly predicted the effect of validated causal SNP rs4784227 on FOXA1 binding, while deltaSVM failed. (B) The 29 variants previously reported to modulate FOXA1 binding had significantly higher (Mann–Whitney U test P = 0.0011) GERV scores than the rest of the AVS.
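
The comparison in panel (B) is a one-sided two-sample rank test. A minimal sketch of how such a test might be run is shown below, assuming SciPy is available. The score values are synthetic placeholders chosen for illustration, not data from the paper, and the variable names are hypothetical.

```python
from scipy.stats import mannwhitneyu

# Synthetic placeholder scores (NOT values from the paper).
reported_scores = [0.9, 1.1, 1.4, 1.8, 2.0, 2.3]            # hypothetical "reported" variants
other_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]     # hypothetical remaining variants

# One-sided test: are the reported variants' scores stochastically larger?
result = mannwhitneyu(reported_scores, other_scores, alternative="greater")
print(result.pvalue)
```

The one-sided alternative matches the claim that the reported variants score higher; a two-sided test would be the more conservative choice if no direction were hypothesized in advance.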


